2024-09-13 17:02:02,338 INFO [train.py:1266] (0/2) Training started
2024-09-13 17:02:02,348 INFO [train.py:1276] (0/2) Device: cuda:0
2024-09-13 17:02:02,364 INFO [train.py:1307] (0/2) Using dtype=torch.float16
2024-09-13 17:02:02,365 INFO [train.py:1308] (0/2) Use AMP=True
2024-09-13 17:02:02,365 INFO [train.py:1310] (0/2) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'ignore_id': -1, 'label_smoothing': 0.1, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '44a9d5682af9fd3ef77074777e15278ec6d390eb', 'k2-git-date': 'Wed Sep 27 11:22:55 2023', 'lhotse-version': '1.17.0.dev+git.ccfc5b2c.dirty', 'torch-version': '1.10.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'cr-ctc', 'icefall-git-sha1': 'a6eead6c-clean', 'icefall-git-date': 'Mon Sep 9 10:10:08 2024', 'icefall-path': '/star-zw/workspace/zipformer/icefall_cr_ctc', 'k2-path': '/star-zw/workspace/k2/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-zw/workspace/lhotse/lhotse/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-3-0904151514-6f47fc7cf9-27nzd', 'IP address': '10.30.28.42'}, 'world_size': 2, 'master_port': 12348, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.04, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 1.0, 'cr_loss_scale': 0.2, 'time_mask_ratio': 2.5, 'cr_loss_masked_scale': 1.0, 'attention_decoder_loss_scale': 0.8, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'use_bf16': False, 'num_encoder_layers': '2,2,2,2,2,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,768,768,768,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,256,256,256,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,192,192,192,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'attention_decoder_dim': 512, 'attention_decoder_num_layers': 6, 'attention_decoder_attention_dim': 512, 'attention_decoder_num_heads': 8, 'attention_decoder_feedforward_dim': 2048, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': False, 'use_ctc': True, 'use_attention_decoder': False, 'use_cr_ctc': True, 'full_libri': True, 'mini_libri': False, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 850, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'sos_id': 1, 'eos_id': 1, 'vocab_size': 500, 'dtype': torch.float16, 'use_autocast': True}
2024-09-13 17:02:02,365 INFO [train.py:1312] (0/2) About to create model
2024-09-13 17:02:03,058 INFO [train.py:1316] (0/2) Number of model parameters: 22118279
2024-09-13 17:02:03,058 INFO [train.py:752] (0/2) num_frame_masks: 25, max_frames_mask_fraction: 0.375
2024-09-13 17:02:11,818 INFO [train.py:1338] (0/2) Using DDP
2024-09-13 17:02:12,229 INFO [asr_datamodule.py:436] (0/2) About to get the shuffled train-clean-100, train-clean-360 and train-other-500 cuts
2024-09-13 17:02:13,759 INFO [asr_datamodule.py:232] (0/2) Enable MUSAN
2024-09-13 17:02:13,759 INFO [asr_datamodule.py:233] (0/2) About to get Musan cuts
2024-09-13 17:02:15,613 INFO [asr_datamodule.py:279] (0/2) Disable SpecAugment
2024-09-13 17:02:15,613 INFO [asr_datamodule.py:281] (0/2) About to create train dataset
2024-09-13 17:02:15,613 INFO [asr_datamodule.py:308] (0/2) Using DynamicBucketingSampler.
2024-09-13 17:02:39,746 INFO [asr_datamodule.py:325] (0/2) About to create train dataloader
2024-09-13 17:02:39,747 INFO [asr_datamodule.py:453] (0/2) About to get dev-clean cuts
2024-09-13 17:02:39,752 INFO [asr_datamodule.py:460] (0/2) About to get dev-other cuts
2024-09-13 17:02:39,753 INFO [asr_datamodule.py:356] (0/2) About to create dev dataset
2024-09-13 17:02:39,949 INFO [asr_datamodule.py:373] (0/2) About to create dev dataloader
2024-09-13 17:02:39,950 INFO [train.py:1545] (0/2) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
2024-09-13 17:06:53,820 INFO [train.py:1576] (0/2) Maximum memory allocated so far is 17162MB
2024-09-13 17:06:55,756 INFO [train.py:1576] (0/2) Maximum memory allocated so far is 18352MB
2024-09-13 17:06:58,028 INFO [train.py:1576] (0/2) Maximum memory allocated so far is 18352MB
2024-09-13 17:07:00,101 INFO [train.py:1576] (0/2) Maximum memory allocated so far is 18352MB
2024-09-13 17:07:02,844 INFO [train.py:1576] (0/2) Maximum memory allocated so far is 18352MB
2024-09-13 17:07:05,084 INFO [train.py:1576] (0/2) Maximum memory allocated so far is 18352MB
2024-09-13 17:07:57,903 INFO [train.py:1198] (0/2) Epoch 1, batch 0, loss[loss=4.942, ctc_loss=4.743, cr_loss=0.9961, over 20955.00 frames. ], tot_loss[loss=4.942, ctc_loss=4.743, cr_loss=0.9961, over 20955.00 frames. ], batch size: 49, lr: 2.00e-02, grad_scale: 2.0
2024-09-13 17:07:57,904 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-13 17:08:16,721 INFO [train.py:1230] (0/2) Epoch 1, validation: loss=4.672, ctc_loss=4.672, cr_loss=3.997e-15, over 944034.00 frames.
2024-09-13 17:08:16,722 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 18352MB
2024-09-13 17:08:20,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=0.0, ans=0.5
2024-09-13 17:08:28,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=0.0, ans=0.5
2024-09-13 17:08:36,735 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.661e+03 5.991e+03 6.509e+03 6.684e+03 1.271e+04, threshold=2.604e+04, percent-clipped=0.0
2024-09-13 17:08:41,656 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=106.91 vs. limit=5.0
2024-09-13 17:08:54,100 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.533e+03 4.882e+03 6.179e+03 1.114e+04 2.062e+04, threshold=2.472e+04, percent-clipped=0.0
2024-09-13 17:09:02,012 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=163.06 vs. limit=4.011333333333333
2024-09-13 17:09:18,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.23 vs. limit=7.531875
2024-09-13 17:09:29,889 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.053e+02 1.776e+03 3.339e+03 6.179e+03 2.062e+04, threshold=1.336e+04, percent-clipped=0.0
2024-09-13 17:09:36,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=113.33333333333333, ans=0.4946875
2024-09-13 17:09:43,461 INFO [train.py:1198] (0/2) Epoch 1, batch 50, loss[loss=1.148, ctc_loss=1.114, cr_loss=0.1693, over 20939.00 frames. ], tot_loss[loss=1.764, ctc_loss=1.7, cr_loss=0.3181, over 930752.04 frames. ], batch size: 48, lr: 2.20e-02, grad_scale: 0.5
2024-09-13 17:09:47,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141.66666666666666, ans=0.2985833333333333
2024-09-13 17:09:57,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=141.66666666666666, ans=0.493359375
2024-09-13 17:10:06,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=170.0, ans=0.2983
2024-09-13 17:10:42,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=226.66666666666666, ans=0.489375
2024-09-13 17:10:42,750 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=178.40 vs. limit=7.585
2024-09-13 17:10:46,833 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=29.91 vs. limit=7.585
2024-09-13 17:11:03,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=255.0, ans=0.09840625
2024-09-13 17:11:04,021 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.03 vs. limit=7.69125
2024-09-13 17:11:05,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255.0, ans=0.29745
2024-09-13 17:11:10,334 INFO [train.py:1198] (0/2) Epoch 1, batch 100, loss[loss=1.309, ctc_loss=1.284, cr_loss=0.1273, over 20649.00 frames. ], tot_loss[loss=1.453, ctc_loss=1.411, cr_loss=0.2077, over 1635323.05 frames. ], batch size: 71, lr: 2.40e-02, grad_scale: 1.0
2024-09-13 17:11:11,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.76 vs. limit=4.113333333333333
2024-09-13 17:11:13,600 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.479e+02 5.147e+02 9.858e+02 2.357e+03 2.062e+04, threshold=1.972e+03, percent-clipped=0.0
2024-09-13 17:11:24,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.92 vs. limit=7.7125
2024-09-13 17:11:26,623 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=7.73375
2024-09-13 17:11:44,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=60.65 vs. limit=7.6275
2024-09-13 17:12:03,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=32.74 vs. limit=7.638125
2024-09-13 17:12:14,253 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=47.85 vs. limit=7.638125
2024-09-13 17:12:19,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=4.158666666666667
2024-09-13 17:12:22,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=396.6666666666667, ans=0.2960333333333333
2024-09-13 17:12:37,102 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.28 vs. limit=3.06375
2024-09-13 17:12:37,985 INFO [train.py:1198] (0/2) Epoch 1, batch 150, loss[loss=1.249, ctc_loss=1.233, cr_loss=0.07922, over 20715.00 frames. ], tot_loss[loss=1.349, ctc_loss=1.317, cr_loss=0.161, over 2171203.39 frames. ], batch size: 71, lr: 2.60e-02, grad_scale: 1.0
2024-09-13 17:12:40,323 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=83.13 vs. limit=7.81875
2024-09-13 17:12:42,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=425.0, ans=7.659375
2024-09-13 17:13:27,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=85.52 vs. limit=7.680625
2024-09-13 17:13:34,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.22 vs. limit=7.69125
2024-09-13 17:13:34,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.20 vs. limit=3.0765
2024-09-13 17:13:37,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=510.0, ans=0.47609375
2024-09-13 17:13:43,626 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=4.204
2024-09-13 17:14:05,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=538.3333333333334, ans=0.474765625
2024-09-13 17:14:09,308 INFO [train.py:1198] (0/2) Epoch 1, batch 200, loss[loss=1.249, ctc_loss=1.231, cr_loss=0.09186, over 20831.00 frames. ], tot_loss[loss=1.299, ctc_loss=1.272, cr_loss=0.1376, over 2602250.51 frames. ], batch size: 59, lr: 2.80e-02, grad_scale: 2.0
2024-09-13 17:14:12,948 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.501e+02 3.089e+02 4.430e+02 6.680e+02 6.501e+03, threshold=8.860e+02, percent-clipped=1.0
2024-09-13 17:14:27,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=7.925
2024-09-13 17:14:41,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=7.94625
2024-09-13 17:14:48,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=34.09 vs. limit=7.73375
2024-09-13 17:14:52,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=623.3333333333334, ans=0.47078125
2024-09-13 17:14:55,366 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=50.54 vs. limit=7.73375
2024-09-13 17:14:58,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=69.94 vs. limit=7.73375
2024-09-13 17:14:59,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=155.25 vs. limit=7.73375
2024-09-13 17:15:40,587 INFO [train.py:1198] (0/2) Epoch 1, batch 250, loss[loss=1.31, ctc_loss=1.288, cr_loss=0.1141, over 20953.00 frames. ], tot_loss[loss=1.266, ctc_loss=1.241, cr_loss=0.1248, over 2926464.69 frames. ], batch size: 64, lr: 3.00e-02, grad_scale: 2.0
2024-09-13 17:15:41,405 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.28 vs. limit=5.177083333333333
2024-09-13 17:15:48,115 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=55.84 vs. limit=7.765625
2024-09-13 17:15:52,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=708.3333333333334, ans=0.29291666666666666
2024-09-13 17:16:01,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=29.05 vs. limit=8.0525
2024-09-13 17:16:08,124 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=102.96 vs. limit=8.0525
2024-09-13 17:16:10,198 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.02 vs. limit=5.368333333333333
2024-09-13 17:16:18,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=765.0, ans=0.464140625
2024-09-13 17:16:24,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.14 vs. limit=7.786875
2024-09-13 17:16:44,797 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=53.98 vs. limit=5.3966666666666665
2024-09-13 17:16:47,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=821.6666666666666, ans=5.205416666666666
2024-09-13 17:17:06,347 INFO [train.py:1198] (0/2) Epoch 1, batch 300, loss[loss=1.237, ctc_loss=1.211, cr_loss=0.1292, over 21044.00 frames. ], tot_loss[loss=1.247, ctc_loss=1.223, cr_loss=0.1218, over 3187190.82 frames. ], batch size: 62, lr: 3.20e-02, grad_scale: 4.0
2024-09-13 17:17:08,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=850.0, ans=7.81875
2024-09-13 17:17:09,938 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.667e+02 2.646e+02 3.598e+02 4.970e+02 1.314e+03, threshold=7.197e+02, percent-clipped=2.0
2024-09-13 17:17:10,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=850.0, ans=0.09468750000000001
2024-09-13 17:17:14,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.80 vs. limit=7.81875
2024-09-13 17:17:25,995 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=91.22 vs. limit=5.439166666666667
2024-09-13 17:17:26,065 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=72.33 vs. limit=7.829375
2024-09-13 17:17:32,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=878.3333333333334, ans=0.1670625
2024-09-13 17:17:34,597 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.296e+01
2024-09-13 17:17:43,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=46.45 vs. limit=5.453333333333333
2024-09-13 17:17:46,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=906.6666666666666, ans=0.166
2024-09-13 17:18:00,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=935.0, ans=0.0789625
2024-09-13 17:18:12,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=935.0, ans=0.09415625000000001
2024-09-13 17:18:17,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=65.17 vs. limit=7.86125
2024-09-13 17:18:33,241 INFO [train.py:1198] (0/2) Epoch 1, batch 350, loss[loss=1.26, ctc_loss=1.218, cr_loss=0.2084, over 20673.00 frames. ], tot_loss[loss=1.23, ctc_loss=1.203, cr_loss=0.134, over 3386134.13 frames. ], batch size: 66, lr: 3.40e-02, grad_scale: 4.0
2024-09-13 17:18:52,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1020.0, ans=0.4521875
2024-09-13 17:19:02,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.07 vs. limit=5.51
2024-09-13 17:19:04,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1020.0, ans=0.4521875
2024-09-13 17:19:13,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1048.3333333333333, ans=8.286249999999999
2024-09-13 17:19:37,909 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=26.64 vs. limit=7.9037500000000005
2024-09-13 17:19:44,775 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.32 vs. limit=8.32875
2024-09-13 17:20:01,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1133.3333333333333, ans=0.04645833333333334
2024-09-13 17:20:02,744 INFO [train.py:1198] (0/2) Epoch 1, batch 400, loss[loss=1.227, ctc_loss=1.189, cr_loss=0.1872, over 21019.00 frames. ], tot_loss[loss=1.217, ctc_loss=1.187, cr_loss=0.1493, over 3541156.91 frames. ], batch size: 62, lr: 3.60e-02, grad_scale: 8.0
2024-09-13 17:20:07,832 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.611e+02 3.604e+02 4.772e+02 6.857e+02 1.281e+03, threshold=9.545e+02, percent-clipped=18.0
2024-09-13 17:20:27,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=8.37125
2024-09-13 17:20:34,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.54 vs. limit=7.935625
2024-09-13 17:20:43,182 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=8.3925
2024-09-13 17:21:02,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=7.956875
2024-09-13 17:21:07,777 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.97 vs. limit=5.609166666666667
2024-09-13 17:21:11,182 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=26.13 vs. limit=7.956875
2024-09-13 17:21:14,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=7.9675
2024-09-13 17:21:32,601 INFO [train.py:1198] (0/2) Epoch 1, batch 450, loss[loss=1.181, ctc_loss=1.136, cr_loss=0.2264, over 20944.00 frames. ], tot_loss[loss=1.203, ctc_loss=1.171, cr_loss=0.1634, over 3658097.09 frames. ], batch size: 60, lr: 3.80e-02, grad_scale: 4.0
2024-09-13 17:21:34,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1275.0, ans=0.440234375
2024-09-13 17:21:34,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.82 vs. limit=7.978125
2024-09-13 17:22:24,630 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=8.01
2024-09-13 17:22:24,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=8.52
2024-09-13 17:22:40,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1388.3333333333333, ans=0.1479375
2024-09-13 17:22:41,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1388.3333333333333, ans=0.8514083333333333
2024-09-13 17:22:58,378 INFO [train.py:1198] (0/2) Epoch 1, batch 500, loss[loss=1.058, ctc_loss=1.01, cr_loss=0.2411, over 20962.00 frames. ], tot_loss[loss=1.196, ctc_loss=1.159, cr_loss=0.1806, over 3751518.98 frames. ], batch size: 48, lr: 4.00e-02, grad_scale: 8.0
2024-09-13 17:23:03,392 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.539e+02 3.291e+02 4.274e+02 6.351e+02 1.371e+03, threshold=8.549e+02, percent-clipped=4.0
2024-09-13 17:23:07,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1416.6666666666667, ans=0.28583333333333333
2024-09-13 17:23:07,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=8.03125
2024-09-13 17:23:12,548 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.93 vs. limit=3.2125
2024-09-13 17:23:16,012 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=8.041875
2024-09-13 17:23:20,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1445.0, ans=0.5
2024-09-13 17:23:37,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=8.605
2024-09-13 17:24:03,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=8.063125
2024-09-13 17:24:08,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1530.0, ans=0.065575
2024-09-13 17:24:16,997 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=8.6475
2024-09-13 17:24:22,869 INFO [train.py:1198] (0/2) Epoch 1, batch 550, loss[loss=1.193, ctc_loss=1.138, cr_loss=0.2776, over 20980.00 frames. ], tot_loss[loss=1.187, ctc_loss=1.147, cr_loss=0.201, over 3825060.86 frames. ], batch size: 64, lr: 3.99e-02, grad_scale: 4.0
2024-09-13 17:24:25,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=25.52 vs. limit=8.084375
2024-09-13 17:24:25,461 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.74 vs. limit=5.779166666666667
2024-09-13 17:24:36,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.68 vs. limit=5.389583333333333
2024-09-13 17:25:09,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=8.105625
2024-09-13 17:25:10,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1615.0, ans=0.1394375
2024-09-13 17:25:18,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.11 vs. limit=8.11625
2024-09-13 17:25:30,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1643.3333333333333, ans=0.28356666666666663
2024-09-13 17:25:39,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.95 vs. limit=8.126875
2024-09-13 17:25:50,658 INFO [train.py:1198] (0/2) Epoch 1, batch 600, loss[loss=1.12, ctc_loss=1.051, cr_loss=0.3435, over 20361.00 frames. ], tot_loss[loss=1.166, ctc_loss=1.121, cr_loss=0.224, over 3885861.95 frames. ], batch size: 74, lr: 3.99e-02, grad_scale: 8.0
2024-09-13 17:25:57,438 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 3.241e+02 4.356e+02 6.431e+02 1.093e+03, threshold=8.713e+02, percent-clipped=4.0
2024-09-13 17:25:59,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1700.0, ans=0.06175
2024-09-13 17:26:13,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.04 vs. limit=8.79625
2024-09-13 17:26:18,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1728.3333333333333, ans=0.5
2024-09-13 17:26:21,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=4.691333333333334
2024-09-13 17:26:22,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=8.148125
2024-09-13 17:26:40,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1785.0, ans=0.8375250000000001
2024-09-13 17:26:41,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1785.0, ans=0.28215
2024-09-13 17:26:57,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.74 vs. limit=5.8925
2024-09-13 17:27:01,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=1813.3333333333333, ans=8.86
2024-09-13 17:27:07,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1813.3333333333333, ans=0.415
2024-09-13 17:27:09,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.45 vs. limit=8.18
2024-09-13 17:27:11,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.58 vs. limit=8.86
2024-09-13 17:27:12,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=5.453333333333333
2024-09-13 17:27:18,496 INFO [train.py:1198] (0/2) Epoch 1, batch 650, loss[loss=1.109, ctc_loss=1.037, cr_loss=0.3597, over 20655.00 frames. ], tot_loss[loss=1.143, ctc_loss=1.094, cr_loss=0.2473, over 3933372.39 frames. ], batch size: 68, lr: 3.99e-02, grad_scale: 8.0
2024-09-13 17:27:37,806 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.63 vs. limit=8.20125
2024-09-13 17:27:45,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1870.0, ans=0.41234375
2024-09-13 17:28:19,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1926.6666666666667, ans=0.12775
2024-09-13 17:28:23,397 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=8.2225
2024-09-13 17:28:41,446 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.17 vs. limit=8.9875
2024-09-13 17:28:42,294 INFO [train.py:1198] (0/2) Epoch 1, batch 700, loss[loss=1.027, ctc_loss=0.957, cr_loss=0.3493, over 20681.00 frames. ], tot_loss[loss=1.113, ctc_loss=1.059, cr_loss=0.2715, over 3961710.07 frames. ], batch size: 68, lr: 3.99e-02, grad_scale: 8.0
2024-09-13 17:28:49,209 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.650e+02 3.567e+02 5.343e+02 1.045e+03, threshold=7.135e+02, percent-clipped=3.0
2024-09-13 17:29:18,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2040.0, ans=0.2796
2024-09-13 17:29:19,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=9.03
2024-09-13 17:29:32,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2068.3333333333335, ans=0.8276083333333334
2024-09-13 17:29:35,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2068.3333333333335, ans=0.403046875
2024-09-13 17:29:54,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2096.6666666666665, ans=0.052825000000000004
2024-09-13 17:30:00,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2096.6666666666665, ans=0.8266166666666667
2024-09-13 17:30:05,029 INFO [train.py:1198] (0/2) Epoch 1, batch 750, loss[loss=1.005, ctc_loss=0.9332, cr_loss=0.3575, over 20673.00 frames. ], tot_loss[loss=1.077, ctc_loss=1.019, cr_loss=0.2937, over 3993349.76 frames. ], batch size: 68, lr: 3.99e-02, grad_scale: 8.0
2024-09-13 17:30:10,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2125.0, ans=0.08671875000000001
2024-09-13 17:30:15,991 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=8.296875
2024-09-13 17:30:45,931 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.12 vs. limit=8.318125
2024-09-13 17:30:49,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.11 vs. limit=8.318125
2024-09-13 17:30:54,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=9.1575
2024-09-13 17:30:57,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.65 vs. limit=8.32875
2024-09-13 17:31:09,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2210.0, ans=0.11712499999999999
2024-09-13 17:31:19,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=8.339375
2024-09-13 17:31:27,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2266.6666666666665, ans=0.115
2024-09-13 17:31:28,485 INFO [train.py:1198] (0/2) Epoch 1, batch 800, loss[loss=0.8639, ctc_loss=0.7848, cr_loss=0.3955, over 20873.00 frames. ], tot_loss[loss=1.041, ctc_loss=0.9786, cr_loss=0.3125, over 4018860.56 frames. ], batch size: 57, lr: 3.99e-02, grad_scale: 16.0
2024-09-13 17:31:29,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=4.906666666666666
2024-09-13 17:31:38,254 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.948e+02 4.044e+02 5.901e+02 1.230e+03, threshold=8.087e+02, percent-clipped=16.0
2024-09-13 17:32:09,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.74 vs.
limit=5.5808333333333335 2024-09-13 17:32:13,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2323.3333333333335, ans=0.2095833333333333 2024-09-13 17:32:46,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2380.0, ans=0.8167 2024-09-13 17:32:57,277 INFO [train.py:1198] (0/2) Epoch 1, batch 850, loss[loss=0.9102, ctc_loss=0.8309, cr_loss=0.3963, over 18188.00 frames. ], tot_loss[loss=1.005, ctc_loss=0.9395, cr_loss=0.3269, over 4032199.47 frames. ], batch size: 108, lr: 3.99e-02, grad_scale: 16.0 2024-09-13 17:33:03,115 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=9.30625 2024-09-13 17:34:16,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2521.6666666666665, ans=0.381796875 2024-09-13 17:34:18,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2521.6666666666665, ans=0.381796875 2024-09-13 17:34:21,202 INFO [train.py:1198] (0/2) Epoch 1, batch 900, loss[loss=0.8785, ctc_loss=0.7933, cr_loss=0.4256, over 19978.00 frames. ], tot_loss[loss=0.9681, ctc_loss=0.9004, cr_loss=0.3385, over 4048815.89 frames. 
], batch size: 80, lr: 3.99e-02, grad_scale: 8.0 2024-09-13 17:34:24,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2550.0, ans=0.27449999999999997 2024-09-13 17:34:29,293 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.284e+02 3.662e+02 5.387e+02 8.238e+02 1.960e+03, threshold=1.077e+03, percent-clipped=27.0 2024-09-13 17:34:44,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2578.3333333333335, ans=0.379140625 2024-09-13 17:35:26,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=8.49875 2024-09-13 17:35:29,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=5.065333333333333 2024-09-13 17:35:43,691 INFO [train.py:1198] (0/2) Epoch 1, batch 950, loss[loss=0.7859, ctc_loss=0.7112, cr_loss=0.3738, over 20982.00 frames. ], tot_loss[loss=0.9328, ctc_loss=0.8633, cr_loss=0.3479, over 4055385.96 frames. ], batch size: 55, lr: 3.98e-02, grad_scale: 8.0 2024-09-13 17:35:57,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=9.51875 2024-09-13 17:36:19,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2748.3333333333335, ans=0.37117187500000004 2024-09-13 17:36:28,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.30 vs. 
limit=6.3741666666666665 2024-09-13 17:36:29,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2748.3333333333335, ans=0.035 2024-09-13 17:36:54,566 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.80 vs. limit=5.70125 2024-09-13 17:37:06,973 INFO [train.py:1198] (0/2) Epoch 1, batch 1000, loss[loss=0.7342, ctc_loss=0.65, cr_loss=0.4212, over 19053.00 frames. ], tot_loss[loss=0.8978, ctc_loss=0.8263, cr_loss=0.3576, over 4067198.63 frames. ], batch size: 42, lr: 3.98e-02, grad_scale: 8.0 2024-09-13 17:37:15,309 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.516e+02 4.793e+02 6.874e+02 9.512e+02 2.241e+03, threshold=1.375e+03, percent-clipped=19.0 2024-09-13 17:37:31,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2861.6666666666665, ans=0.365859375 2024-09-13 17:37:34,251 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.52 vs. limit=8.573125000000001 2024-09-13 17:37:53,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2890.0, ans=9.6675 2024-09-13 17:38:31,794 INFO [train.py:1198] (0/2) Epoch 1, batch 1050, loss[loss=0.6919, ctc_loss=0.6144, cr_loss=0.3871, over 21064.00 frames. ], tot_loss[loss=0.8647, ctc_loss=0.7912, cr_loss=0.3674, over 4081222.10 frames. 
], batch size: 53, lr: 3.98e-02, grad_scale: 8.0 2024-09-13 17:38:52,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3003.3333333333335, ans=0.08737499999999998 2024-09-13 17:39:15,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=8.636875 2024-09-13 17:39:25,467 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.29 vs. limit=3.459 2024-09-13 17:39:56,720 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=8.66875 2024-09-13 17:39:57,639 INFO [train.py:1198] (0/2) Epoch 1, batch 1100, loss[loss=0.7377, ctc_loss=0.6492, cr_loss=0.4425, over 20665.00 frames. ], tot_loss[loss=0.8355, ctc_loss=0.7597, cr_loss=0.379, over 4095340.24 frames. ], batch size: 71, lr: 3.98e-02, grad_scale: 8.0 2024-09-13 17:40:05,376 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.321e+02 3.634e+02 5.107e+02 7.438e+02 1.793e+03, threshold=1.021e+03, percent-clipped=5.0 2024-09-13 17:41:00,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3201.6666666666665, ans=0.349921875 2024-09-13 17:41:19,645 INFO [train.py:1198] (0/2) Epoch 1, batch 1150, loss[loss=0.699, ctc_loss=0.6161, cr_loss=0.4145, over 21004.00 frames. ], tot_loss[loss=0.8085, ctc_loss=0.7307, cr_loss=0.389, over 4095132.91 frames. 
], batch size: 55, lr: 3.98e-02, grad_scale: 8.0 2024-09-13 17:41:21,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3258.3333333333335, ans=0.26741666666666664 2024-09-13 17:41:58,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3315.0, ans=0.21685 2024-09-13 17:42:00,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3315.0, ans=0.26685 2024-09-13 17:42:09,415 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.92 vs. limit=5.835833333333333 2024-09-13 17:42:34,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3371.6666666666665, ans=0.34195312499999997 2024-09-13 17:42:41,272 INFO [train.py:1198] (0/2) Epoch 1, batch 1200, loss[loss=0.669, ctc_loss=0.583, cr_loss=0.4301, over 20790.00 frames. ], tot_loss[loss=0.786, ctc_loss=0.7062, cr_loss=0.3993, over 4074982.01 frames. ], batch size: 56, lr: 3.97e-02, grad_scale: 16.0 2024-09-13 17:42:50,983 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.657e+02 3.801e+02 5.290e+02 7.416e+02 2.091e+03, threshold=1.058e+03, percent-clipped=7.0 2024-09-13 17:43:39,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.45 vs. 
limit=3.5227500000000003 2024-09-13 17:43:41,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3485.0, ans=0.06931249999999997 2024-09-13 17:43:52,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3513.3333333333335, ans=0.26486666666666664 2024-09-13 17:43:52,790 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.59 vs. limit=6.756666666666667 2024-09-13 17:44:06,202 INFO [train.py:1198] (0/2) Epoch 1, batch 1250, loss[loss=0.6583, ctc_loss=0.5761, cr_loss=0.4108, over 20883.00 frames. ], tot_loss[loss=0.7604, ctc_loss=0.6785, cr_loss=0.4093, over 4085637.99 frames. ], batch size: 57, lr: 3.97e-02, grad_scale: 8.0 2024-09-13 17:44:22,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3570.0, ans=0.33265625 2024-09-13 17:45:23,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3655.0, ans=0.26345 2024-09-13 17:45:30,179 INFO [train.py:1198] (0/2) Epoch 1, batch 1300, loss[loss=0.6705, ctc_loss=0.5799, cr_loss=0.4528, over 20967.00 frames. ], tot_loss[loss=0.7359, ctc_loss=0.6527, cr_loss=0.4157, over 4088586.57 frames. 
], batch size: 58, lr: 3.97e-02, grad_scale: 8.0 2024-09-13 17:45:35,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3683.3333333333335, ans=0.26316666666666666 2024-09-13 17:45:39,889 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.613e+02 3.656e+02 4.890e+02 6.590e+02 1.194e+03, threshold=9.780e+02, percent-clipped=2.0 2024-09-13 17:45:46,893 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=8.891875 2024-09-13 17:46:13,321 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.71 vs. limit=6.87 2024-09-13 17:46:30,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3768.3333333333335, ans=0.323359375 2024-09-13 17:46:52,047 INFO [train.py:1198] (0/2) Epoch 1, batch 1350, loss[loss=0.6123, ctc_loss=0.533, cr_loss=0.3964, over 21078.00 frames. ], tot_loss[loss=0.7123, ctc_loss=0.6289, cr_loss=0.4171, over 4088678.34 frames. ], batch size: 59, lr: 3.97e-02, grad_scale: 8.0 2024-09-13 17:47:03,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3825.0, ans=0.320703125 2024-09-13 17:47:13,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3853.3333333333335, ans=0.013299999999999992 2024-09-13 17:47:24,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3881.6666666666665, ans=0.318046875 2024-09-13 17:47:28,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.15 vs. 
limit=8.955625 2024-09-13 17:48:12,907 INFO [train.py:1198] (0/2) Epoch 1, batch 1400, loss[loss=0.6197, ctc_loss=0.5349, cr_loss=0.4241, over 21079.00 frames. ], tot_loss[loss=0.6887, ctc_loss=0.6052, cr_loss=0.4173, over 4083393.27 frames. ], batch size: 59, lr: 3.97e-02, grad_scale: 8.0 2024-09-13 17:48:16,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3966.6666666666665, ans=0.3140625 2024-09-13 17:48:22,681 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.578e+02 3.553e+02 4.759e+02 7.377e+02 1.558e+03, threshold=9.517e+02, percent-clipped=7.0 2024-09-13 17:48:31,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3995.0, ans=0.312734375 2024-09-13 17:48:40,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.44 vs. limit=8.998125 2024-09-13 17:48:41,333 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 17:48:46,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4023.3333333333335, ans=0.7591833333333333 2024-09-13 17:48:51,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. 
limit=9.00875 2024-09-13 17:48:54,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4023.3333333333335, ans=0.7591833333333333 2024-09-13 17:48:57,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4023.3333333333335, ans=0.20976666666666666 2024-09-13 17:49:08,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=4051.6666666666665, ans=0.2 2024-09-13 17:49:18,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.21 vs. limit=6.02 2024-09-13 17:49:33,871 INFO [train.py:1198] (0/2) Epoch 1, batch 1450, loss[loss=0.5808, ctc_loss=0.4949, cr_loss=0.4296, over 20867.00 frames. ], tot_loss[loss=0.6702, ctc_loss=0.5863, cr_loss=0.4194, over 4085851.19 frames. ], batch size: 57, lr: 3.96e-02, grad_scale: 8.0 2024-09-13 17:50:37,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4193.333333333333, ans=0.3034375 2024-09-13 17:50:45,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4221.666666666667, ans=0.302109375 2024-09-13 17:51:00,783 INFO [train.py:1198] (0/2) Epoch 1, batch 1500, loss[loss=0.506, ctc_loss=0.4343, cr_loss=0.3581, over 20974.00 frames. ], tot_loss[loss=0.6509, ctc_loss=0.5671, cr_loss=0.4193, over 4090910.72 frames. 
], batch size: 49, lr: 3.96e-02, grad_scale: 8.0 2024-09-13 17:51:10,524 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.402e+02 3.618e+02 4.755e+02 6.931e+02 1.256e+03, threshold=9.511e+02, percent-clipped=8.0 2024-09-13 17:51:14,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4250.0, ans=0.2575 2024-09-13 17:51:33,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4306.666666666667, ans=0.25693333333333335 2024-09-13 17:51:34,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4306.666666666667, ans=0.298125 2024-09-13 17:51:49,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4335.0, ans=0.04860416666666667 2024-09-13 17:51:53,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4335.0, ans=0.29679687499999996 2024-09-13 17:52:05,119 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.13 vs. limit=6.090833333333333 2024-09-13 17:52:06,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.97 vs. limit=6.090833333333333 2024-09-13 17:52:09,994 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.83 vs. limit=9.13625 2024-09-13 17:52:21,674 INFO [train.py:1198] (0/2) Epoch 1, batch 1550, loss[loss=0.5827, ctc_loss=0.4967, cr_loss=0.4297, over 20814.00 frames. ], tot_loss[loss=0.6304, ctc_loss=0.5466, cr_loss=0.4189, over 4103997.30 frames. 
], batch size: 59, lr: 3.96e-02, grad_scale: 8.0 2024-09-13 17:52:36,352 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=9.1575 2024-09-13 17:52:37,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4420.0, ans=0.29281250000000003 2024-09-13 17:52:37,570 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.94 vs. limit=5.768 2024-09-13 17:52:44,005 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=5.768 2024-09-13 17:52:49,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4420.0, ans=0.29281250000000003 2024-09-13 17:53:40,770 INFO [train.py:1198] (0/2) Epoch 1, batch 1600, loss[loss=0.5858, ctc_loss=0.4964, cr_loss=0.4474, over 20686.00 frames. ], tot_loss[loss=0.6163, ctc_loss=0.5322, cr_loss=0.4206, over 4102045.69 frames. 
], batch size: 68, lr: 3.96e-02, grad_scale: 16.0 2024-09-13 17:53:50,401 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.444e+02 3.359e+02 4.265e+02 5.483e+02 1.620e+03, threshold=8.530e+02, percent-clipped=7.0 2024-09-13 17:53:57,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4561.666666666667, ans=0.04765972222222222 2024-09-13 17:54:11,739 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 17:54:21,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4590.0, ans=0.26885000000000003 2024-09-13 17:54:47,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4646.666666666667, ans=0.7373666666666667 2024-09-13 17:55:00,300 INFO [train.py:1198] (0/2) Epoch 1, batch 1650, loss[loss=0.6266, ctc_loss=0.5412, cr_loss=0.4268, over 20852.00 frames. ], tot_loss[loss=0.6056, ctc_loss=0.5212, cr_loss=0.422, over 4079845.45 frames. ], batch size: 65, lr: 3.95e-02, grad_scale: 16.0 2024-09-13 17:55:11,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4675.0, ans=0.270125 2024-09-13 17:56:05,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.15 vs. limit=9.295625 2024-09-13 17:56:09,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.00 vs. 
limit=9.295625 2024-09-13 17:56:21,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4816.666666666667, ans=0.27421875 2024-09-13 17:56:22,797 INFO [train.py:1198] (0/2) Epoch 1, batch 1700, loss[loss=0.5284, ctc_loss=0.449, cr_loss=0.3972, over 20985.00 frames. ], tot_loss[loss=0.5917, ctc_loss=0.5069, cr_loss=0.4237, over 4084489.91 frames. ], batch size: 55, lr: 3.95e-02, grad_scale: 16.0 2024-09-13 17:56:29,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4816.666666666667, ans=0.27421875 2024-09-13 17:56:32,452 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.493e+02 3.320e+02 4.432e+02 5.892e+02 9.131e+02, threshold=8.864e+02, percent-clipped=5.0 2024-09-13 17:56:39,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4845.0, ans=0.272890625 2024-09-13 17:56:42,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4845.0, ans=0.04647916666666667 2024-09-13 17:57:28,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4930.0, ans=0.26890625 2024-09-13 17:57:33,469 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.42 vs. limit=6.2325 2024-09-13 17:57:45,914 INFO [train.py:1198] (0/2) Epoch 1, batch 1750, loss[loss=0.6005, ctc_loss=0.5141, cr_loss=0.4317, over 20296.00 frames. ], tot_loss[loss=0.5829, ctc_loss=0.498, cr_loss=0.4247, over 4069309.50 frames. 
], batch size: 74, lr: 3.95e-02, grad_scale: 16.0 2024-09-13 17:57:49,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4958.333333333333, ans=0.267578125 2024-09-13 17:58:22,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=11.26125 2024-09-13 17:58:39,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5043.333333333333, ans=0.26359374999999996 2024-09-13 17:59:05,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5100.0, ans=0.0 2024-09-13 17:59:06,812 INFO [train.py:1198] (0/2) Epoch 1, batch 1800, loss[loss=0.4717, ctc_loss=0.3891, cr_loss=0.4129, over 20935.00 frames. ], tot_loss[loss=0.5707, ctc_loss=0.4855, cr_loss=0.4257, over 4075799.04 frames. ], batch size: 48, lr: 3.94e-02, grad_scale: 16.0 2024-09-13 17:59:15,947 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 3.324e+02 4.368e+02 6.109e+02 1.538e+03, threshold=8.735e+02, percent-clipped=7.0 2024-09-13 17:59:18,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5100.0, ans=0.0 2024-09-13 17:59:43,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5156.666666666667, ans=0.25828125 2024-09-13 18:00:26,414 INFO [train.py:1198] (0/2) Epoch 1, batch 1850, loss[loss=0.5698, ctc_loss=0.4844, cr_loss=0.4269, over 19366.00 frames. ], tot_loss[loss=0.561, ctc_loss=0.4755, cr_loss=0.4275, over 4095495.66 frames. 
], batch size: 90, lr: 3.94e-02, grad_scale: 16.0 2024-09-13 18:00:41,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5270.0, ans=0.25296874999999996 2024-09-13 18:01:01,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5298.333333333333, ans=0.044590277777777784 2024-09-13 18:01:45,272 INFO [train.py:1198] (0/2) Epoch 1, batch 1900, loss[loss=0.462, ctc_loss=0.3842, cr_loss=0.3889, over 20952.00 frames. ], tot_loss[loss=0.5542, ctc_loss=0.4684, cr_loss=0.4291, over 4079504.40 frames. ], batch size: 50, lr: 3.94e-02, grad_scale: 16.0 2024-09-13 18:01:48,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5383.333333333333, ans=0.24765625000000002 2024-09-13 18:01:53,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=5383.333333333333, ans=0.025 2024-09-13 18:01:58,057 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.446e+02 3.212e+02 4.078e+02 5.502e+02 9.178e+02, threshold=8.156e+02, percent-clipped=1.0 2024-09-13 18:02:24,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=11.58 2024-09-13 18:02:30,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5440.0, ans=0.245 2024-09-13 18:02:39,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5468.333333333333, ans=0.24367187499999998 2024-09-13 18:03:10,665 INFO [train.py:1198] (0/2) Epoch 1, batch 1950, loss[loss=0.4719, ctc_loss=0.393, cr_loss=0.3945, over 20303.00 frames. 
], tot_loss[loss=0.5466, ctc_loss=0.4605, cr_loss=0.4304, over 4089664.96 frames. ], batch size: 45, lr: 3.94e-02, grad_scale: 16.0
2024-09-13 18:03:49,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5581.666666666667, ans=0.009656159420289856
2024-09-13 18:04:08,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5610.0, ans=0.23703125000000003
2024-09-13 18:04:14,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=5638.333333333333, ans=0.025
2024-09-13 18:04:20,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5638.333333333333, ans=0.24361666666666665
2024-09-13 18:04:30,032 INFO [train.py:1198] (0/2) Epoch 1, batch 2000, loss[loss=0.5616, ctc_loss=0.477, cr_loss=0.4232, over 20272.00 frames. ], tot_loss[loss=0.5402, ctc_loss=0.4539, cr_loss=0.4312, over 4079159.01 frames. ], batch size: 74, lr: 3.93e-02, grad_scale: 32.0
2024-09-13 18:04:39,335 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.265e+02 3.144e+02 3.828e+02 5.126e+02 9.280e+02, threshold=7.656e+02, percent-clipped=2.0
2024-09-13 18:04:41,680 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=11.75
2024-09-13 18:05:41,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.79 vs. limit=7.890000000000001
2024-09-13 18:05:46,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=9.6675
2024-09-13 18:05:48,728 INFO [train.py:1198] (0/2) Epoch 1, batch 2050, loss[loss=0.5605, ctc_loss=0.4721, cr_loss=0.4422, over 18458.00 frames. ], tot_loss[loss=0.5339, ctc_loss=0.4476, cr_loss=0.4318, over 4072712.58 frames. ], batch size: 108, lr: 3.93e-02, grad_scale: 16.0
2024-09-13 18:06:25,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5865.0, ans=0.009594565217391304
2024-09-13 18:06:36,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5893.333333333333, ans=0.00958840579710145
2024-09-13 18:06:40,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5893.333333333333, ans=0.04211111111111111
2024-09-13 18:07:07,434 INFO [train.py:1198] (0/2) Epoch 1, batch 2100, loss[loss=0.5097, ctc_loss=0.4258, cr_loss=0.4196, over 20918.00 frames. ], tot_loss[loss=0.5281, ctc_loss=0.4414, cr_loss=0.4335, over 4086039.03 frames. ], batch size: 60, lr: 3.93e-02, grad_scale: 16.0
2024-09-13 18:07:15,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5950.0, ans=0.22109374999999998
2024-09-13 18:07:18,649 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.330e+02 3.500e+02 4.627e+02 5.729e+02 1.082e+03, threshold=9.254e+02, percent-clipped=7.0
2024-09-13 18:07:40,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.32 vs. limit=6.501666666666667
2024-09-13 18:08:19,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.45 vs. limit=8.031666666666666
2024-09-13 18:08:29,522 INFO [train.py:1198] (0/2) Epoch 1, batch 2150, loss[loss=0.5329, ctc_loss=0.4421, cr_loss=0.4543, over 20961.00 frames. ], tot_loss[loss=0.5215, ctc_loss=0.4349, cr_loss=0.4334, over 4089873.25 frames. ], batch size: 64, lr: 3.92e-02, grad_scale: 16.0
2024-09-13 18:08:59,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=6120.0, ans=0.04116666666666667
2024-09-13 18:09:10,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=6148.333333333333, ans=0.211796875
2024-09-13 18:09:12,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=6148.333333333333, ans=0.04104861111111111
2024-09-13 18:09:14,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=9.805625
2024-09-13 18:09:16,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=12.11125
2024-09-13 18:09:33,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=6205.0, ans=0.20914062500000002
2024-09-13 18:09:50,803 INFO [train.py:1198] (0/2) Epoch 1, batch 2200, loss[loss=0.516, ctc_loss=0.4306, cr_loss=0.4273, over 21011.00 frames. ], tot_loss[loss=0.5159, ctc_loss=0.429, cr_loss=0.4342, over 4103064.78 frames. ], batch size: 61, lr: 3.92e-02, grad_scale: 16.0
2024-09-13 18:10:01,802 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.284e+02 3.094e+02 3.918e+02 5.156e+02 9.762e+02, threshold=7.835e+02, percent-clipped=1.0
2024-09-13 18:10:13,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=6261.666666666667, ans=0.206484375
2024-09-13 18:10:44,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=6318.333333333333, ans=0.20382812500000003
2024-09-13 18:11:09,024 INFO [train.py:1198] (0/2) Epoch 1, batch 2250, loss[loss=0.5105, ctc_loss=0.4199, cr_loss=0.4532, over 20930.00 frames. ], tot_loss[loss=0.5135, ctc_loss=0.4262, cr_loss=0.4366, over 4110856.94 frames. ], batch size: 60, lr: 3.91e-02, grad_scale: 16.0
2024-09-13 18:11:10,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=6375.0, ans=0.676875
2024-09-13 18:11:44,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=6431.666666666667, ans=6.572666666666667
2024-09-13 18:11:53,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=6431.666666666667, ans=0.19851562499999997
2024-09-13 18:12:26,881 INFO [train.py:1198] (0/2) Epoch 1, batch 2300, loss[loss=0.4745, ctc_loss=0.3883, cr_loss=0.4312, over 21081.00 frames. ], tot_loss[loss=0.5098, ctc_loss=0.4225, cr_loss=0.4368, over 4105450.29 frames. ], batch size: 53, lr: 3.91e-02, grad_scale: 16.0
2024-09-13 18:12:37,759 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.210e+02 3.017e+02 3.614e+02 5.021e+02 8.740e+02, threshold=7.228e+02, percent-clipped=1.0
2024-09-13 18:12:51,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=6545.0, ans=0.009446739130434784
2024-09-13 18:12:52,411 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=9.954375
2024-09-13 18:12:55,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6545.0, ans=0.23454999999999998
2024-09-13 18:13:04,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=6573.333333333333, ans=0.0
2024-09-13 18:13:12,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=6601.666666666667, ans=0.190546875
2024-09-13 18:13:47,109 INFO [train.py:1198] (0/2) Epoch 1, batch 2350, loss[loss=0.4854, ctc_loss=0.397, cr_loss=0.4422, over 20987.00 frames. ], tot_loss[loss=0.5046, ctc_loss=0.4172, cr_loss=0.4366, over 4108536.80 frames. ], batch size: 55, lr: 3.91e-02, grad_scale: 16.0
2024-09-13 18:15:07,812 INFO [train.py:1198] (0/2) Epoch 1, batch 2400, loss[loss=0.5972, ctc_loss=0.5027, cr_loss=0.4724, over 14630.00 frames. ], tot_loss[loss=0.5039, ctc_loss=0.4163, cr_loss=0.4383, over 4095330.38 frames. ], batch size: 149, lr: 3.90e-02, grad_scale: 32.0
2024-09-13 18:15:18,477 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.319e+02 3.072e+02 3.665e+02 4.911e+02 8.026e+02, threshold=7.330e+02, percent-clipped=1.0
2024-09-13 18:15:36,337 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=10.060625
2024-09-13 18:15:40,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=6856.666666666667, ans=0.025
2024-09-13 18:16:24,716 INFO [train.py:1198] (0/2) Epoch 1, batch 2450, loss[loss=0.5855, ctc_loss=0.4903, cr_loss=0.4758, over 14206.00 frames. ], tot_loss[loss=0.5014, ctc_loss=0.4135, cr_loss=0.4395, over 4086568.91 frames. ], batch size: 149, lr: 3.90e-02, grad_scale: 32.0
2024-09-13 18:16:31,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=6941.666666666667, ans=9.338541666666668
2024-09-13 18:17:04,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=6998.333333333333, ans=0.03750694444444445
2024-09-13 18:17:42,461 INFO [train.py:1198] (0/2) Epoch 1, batch 2500, loss[loss=0.4704, ctc_loss=0.3794, cr_loss=0.4549, over 20831.00 frames. ], tot_loss[loss=0.4983, ctc_loss=0.4104, cr_loss=0.4396, over 4089477.33 frames. ], batch size: 59, lr: 3.90e-02, grad_scale: 32.0
2024-09-13 18:17:53,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.02 vs. limit=10.15625
2024-09-13 18:17:53,490 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.380e+02 3.001e+02 3.886e+02 5.271e+02 1.281e+03, threshold=7.771e+02, percent-clipped=14.0
2024-09-13 18:17:56,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=7111.666666666667, ans=0.6510916666666666
2024-09-13 18:18:33,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=7168.333333333333, ans=0.16398437500000002
2024-09-13 18:18:44,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=10.19875
2024-09-13 18:18:52,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=7196.666666666667, ans=0.04949747468305833
2024-09-13 18:18:59,481 INFO [train.py:1198] (0/2) Epoch 1, batch 2550, loss[loss=0.4792, ctc_loss=0.389, cr_loss=0.4511, over 20952.00 frames. ], tot_loss[loss=0.4931, ctc_loss=0.4054, cr_loss=0.4384, over 4098426.46 frames. ], batch size: 55, lr: 3.89e-02, grad_scale: 32.0
2024-09-13 18:19:26,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=10.22
2024-09-13 18:19:57,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=7310.0, ans=0.0
2024-09-13 18:20:08,641 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=10.251875
2024-09-13 18:20:18,634 INFO [train.py:1198] (0/2) Epoch 1, batch 2600, loss[loss=0.514, ctc_loss=0.4183, cr_loss=0.4788, over 21064.00 frames. ], tot_loss[loss=0.4923, ctc_loss=0.4044, cr_loss=0.4393, over 4094352.39 frames. ], batch size: 59, lr: 3.89e-02, grad_scale: 32.0
2024-09-13 18:20:28,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=7366.666666666667, ans=0.15468749999999998
2024-09-13 18:20:29,396 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.919e+02 3.619e+02 4.799e+02 9.059e+02, threshold=7.237e+02, percent-clipped=3.0
2024-09-13 18:21:30,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=7480.0, ans=0.8248
2024-09-13 18:21:37,848 INFO [train.py:1198] (0/2) Epoch 1, batch 2650, loss[loss=0.4616, ctc_loss=0.3772, cr_loss=0.4219, over 20961.00 frames. ], tot_loss[loss=0.4884, ctc_loss=0.4007, cr_loss=0.4385, over 4095504.56 frames. ], batch size: 58, lr: 3.88e-02, grad_scale: 32.0
2024-09-13 18:21:47,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=7508.333333333333, ans=0.14804687500000002
2024-09-13 18:22:05,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=7536.666666666667, ans=0.04949747468305833
2024-09-13 18:22:10,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=4.13475
2024-09-13 18:22:15,123 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.23 vs. limit=8.7825
2024-09-13 18:22:41,354 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=7.048666666666667
2024-09-13 18:22:44,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=7621.666666666667, ans=0.14273437500000002
2024-09-13 18:22:54,215 INFO [train.py:1198] (0/2) Epoch 1, batch 2700, loss[loss=0.5168, ctc_loss=0.423, cr_loss=0.4688, over 21007.00 frames. ], tot_loss[loss=0.4868, ctc_loss=0.3989, cr_loss=0.4395, over 4095785.30 frames. ], batch size: 63, lr: 3.88e-02, grad_scale: 32.0
2024-09-13 18:23:04,891 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.346e+02 2.928e+02 3.699e+02 4.745e+02 9.041e+02, threshold=7.397e+02, percent-clipped=3.0
2024-09-13 18:23:32,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=7706.666666666667, ans=0.13874999999999998
2024-09-13 18:23:36,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=7706.666666666667, ans=0.13874999999999998
2024-09-13 18:23:59,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=7763.333333333333, ans=0.22236666666666666
2024-09-13 18:24:10,416 INFO [train.py:1198] (0/2) Epoch 1, batch 2750, loss[loss=0.4306, ctc_loss=0.3478, cr_loss=0.4142, over 21046.00 frames. ], tot_loss[loss=0.4825, ctc_loss=0.395, cr_loss=0.438, over 4087856.67 frames. ], batch size: 53, lr: 3.88e-02, grad_scale: 32.0
2024-09-13 18:24:20,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=13.34375
2024-09-13 18:25:23,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7905.0, ans=0.22094999999999998
2024-09-13 18:25:29,558 INFO [train.py:1198] (0/2) Epoch 1, batch 2800, loss[loss=0.4907, ctc_loss=0.4011, cr_loss=0.4482, over 21017.00 frames. ], tot_loss[loss=0.4803, ctc_loss=0.3928, cr_loss=0.4377, over 4083739.48 frames. ], batch size: 61, lr: 3.87e-02, grad_scale: 32.0
2024-09-13 18:25:35,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=7933.333333333333, ans=0.128125
2024-09-13 18:25:40,317 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.138e+02 2.868e+02 3.680e+02 5.040e+02 8.412e+02, threshold=7.361e+02, percent-clipped=5.0
2024-09-13 18:26:33,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8046.666666666667, ans=0.21953333333333333
2024-09-13 18:26:48,200 INFO [train.py:1198] (0/2) Epoch 1, batch 2850, loss[loss=0.5264, ctc_loss=0.4284, cr_loss=0.4896, over 21021.00 frames. ], tot_loss[loss=0.4791, ctc_loss=0.3912, cr_loss=0.4394, over 4085834.85 frames. ], batch size: 63, lr: 3.87e-02, grad_scale: 32.0
2024-09-13 18:26:55,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=8075.0, ans=0.025
2024-09-13 18:27:03,449 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-13 18:28:00,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=8188.333333333333, ans=0.6134083333333333
2024-09-13 18:28:04,323 INFO [train.py:1198] (0/2) Epoch 1, batch 2900, loss[loss=0.5004, ctc_loss=0.4066, cr_loss=0.4691, over 20639.00 frames. ], tot_loss[loss=0.4768, ctc_loss=0.3888, cr_loss=0.4399, over 4088456.02 frames. ], batch size: 71, lr: 3.86e-02, grad_scale: 32.0
2024-09-13 18:28:09,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=8216.666666666666, ans=0.125
2024-09-13 18:28:12,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=8216.666666666666, ans=0.125
2024-09-13 18:28:14,832 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.332e+02 3.038e+02 4.095e+02 5.397e+02 9.687e+02, threshold=8.190e+02, percent-clipped=4.0
2024-09-13 18:28:24,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=8245.0, ans=0.125
2024-09-13 18:28:59,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=8301.666666666666, ans=0.125
2024-09-13 18:29:12,747 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 18:29:20,163 INFO [train.py:1198] (0/2) Epoch 1, batch 2950, loss[loss=0.4453, ctc_loss=0.3571, cr_loss=0.441, over 21098.00 frames. ], tot_loss[loss=0.4738, ctc_loss=0.3858, cr_loss=0.4398, over 4097982.18 frames. ], batch size: 59, lr: 3.86e-02, grad_scale: 32.0
2024-09-13 18:29:26,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=8358.333333333334, ans=0.125
2024-09-13 18:29:53,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=8415.0, ans=0.03160416666666667
2024-09-13 18:30:23,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=8471.666666666666, ans=0.125
2024-09-13 18:30:32,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=8471.666666666666, ans=0.125
2024-09-13 18:30:35,547 INFO [train.py:1198] (0/2) Epoch 1, batch 3000, loss[loss=0.5159, ctc_loss=0.4155, cr_loss=0.5018, over 21025.00 frames. ], tot_loss[loss=0.4698, ctc_loss=0.382, cr_loss=0.4389, over 4103791.51 frames. ], batch size: 61, lr: 3.85e-02, grad_scale: 32.0
2024-09-13 18:30:35,549 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-13 18:30:48,742 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9465, 4.6436, 4.8044, 4.5497], device='cuda:0')
2024-09-13 18:30:56,770 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1678, 4.2062, 4.1293, 3.9554], device='cuda:0')
2024-09-13 18:30:57,551 INFO [train.py:1230] (0/2) Epoch 1, validation: loss=0.1613, ctc_loss=0.1613, cr_loss=9.026e-15, over 944034.00 frames.
2024-09-13 18:30:57,552 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-13 18:31:05,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=8500.0, ans=0.03125
2024-09-13 18:31:08,239 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.089e+02 2.778e+02 3.342e+02 4.333e+02 9.090e+02, threshold=6.683e+02, percent-clipped=2.0
2024-09-13 18:31:37,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=8556.666666666666, ans=0.6005166666666667
2024-09-13 18:32:08,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=8613.333333333334, ans=0.125
2024-09-13 18:32:08,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=10.73
2024-09-13 18:32:15,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=8641.666666666666, ans=0.125
2024-09-13 18:32:16,940 INFO [train.py:1198] (0/2) Epoch 1, batch 3050, loss[loss=0.4619, ctc_loss=0.3765, cr_loss=0.427, over 20948.00 frames. ], tot_loss[loss=0.4675, ctc_loss=0.3798, cr_loss=0.4388, over 4096954.60 frames. ], batch size: 60, lr: 3.85e-02, grad_scale: 32.0
2024-09-13 18:32:44,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=4.3004999999999995
2024-09-13 18:33:22,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=10.783125
2024-09-13 18:33:32,812 INFO [train.py:1198] (0/2) Epoch 1, batch 3100, loss[loss=0.4342, ctc_loss=0.3513, cr_loss=0.4143, over 20979.00 frames. ], tot_loss[loss=0.4669, ctc_loss=0.379, cr_loss=0.4394, over 4090151.68 frames. ], batch size: 55, lr: 3.85e-02, grad_scale: 32.0
2024-09-13 18:33:43,719 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.264e+02 3.029e+02 3.778e+02 4.657e+02 9.135e+02, threshold=7.556e+02, percent-clipped=6.0
2024-09-13 18:34:11,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=8840.0, ans=0.008947826086956523
2024-09-13 18:34:38,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8896.666666666666, ans=0.21103333333333335
2024-09-13 18:34:48,952 INFO [train.py:1198] (0/2) Epoch 1, batch 3150, loss[loss=0.4786, ctc_loss=0.385, cr_loss=0.4682, over 20977.00 frames. ], tot_loss[loss=0.4642, ctc_loss=0.3765, cr_loss=0.4389, over 4087265.28 frames. ], batch size: 58, lr: 3.84e-02, grad_scale: 32.0
2024-09-13 18:34:53,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=8925.0, ans=0.02947916666666667
2024-09-13 18:35:15,019 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 18:35:17,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=8981.666666666666, ans=0.5856416666666667
2024-09-13 18:35:36,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=9010.0, ans=0.125
2024-09-13 18:35:36,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=9010.0, ans=0.125
2024-09-13 18:35:39,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=9010.0, ans=0.125
2024-09-13 18:35:53,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.12 vs. limit=9.519166666666667
2024-09-13 18:35:59,814 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=10.889375
2024-09-13 18:36:05,213 INFO [train.py:1198] (0/2) Epoch 1, batch 3200, loss[loss=0.4239, ctc_loss=0.3361, cr_loss=0.4391, over 21073.00 frames. ], tot_loss[loss=0.4635, ctc_loss=0.3756, cr_loss=0.4394, over 4085543.53 frames. ], batch size: 59, lr: 3.84e-02, grad_scale: 32.0
2024-09-13 18:36:15,965 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.188e+02 2.896e+02 3.667e+02 4.425e+02 9.257e+02, threshold=7.334e+02, percent-clipped=1.0
2024-09-13 18:36:29,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9095.0, ans=0.20905
2024-09-13 18:36:48,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=9123.333333333334, ans=0.05
2024-09-13 18:36:54,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=9151.666666666666, ans=0.008880072463768117
2024-09-13 18:37:24,564 INFO [train.py:1198] (0/2) Epoch 1, batch 3250, loss[loss=0.4471, ctc_loss=0.3601, cr_loss=0.4349, over 20841.00 frames. ], tot_loss[loss=0.4634, ctc_loss=0.3755, cr_loss=0.4396, over 4062352.95 frames. ], batch size: 59, lr: 3.83e-02, grad_scale: 32.0
2024-09-13 18:37:26,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=7.683333333333334
2024-09-13 18:37:37,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=9208.333333333334, ans=0.008867753623188406
2024-09-13 18:37:47,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=9236.666666666666, ans=0.02818055555555556
2024-09-13 18:38:21,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=9293.333333333334, ans=0.125
2024-09-13 18:38:29,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=9321.666666666666, ans=0.0
2024-09-13 18:38:29,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=10.995625
2024-09-13 18:38:42,162 INFO [train.py:1198] (0/2) Epoch 1, batch 3300, loss[loss=0.55, ctc_loss=0.4681, cr_loss=0.4098, over 13918.00 frames. ], tot_loss[loss=0.4605, ctc_loss=0.3725, cr_loss=0.4397, over 4074327.03 frames. ], batch size: 150, lr: 3.83e-02, grad_scale: 32.0
2024-09-13 18:38:52,781 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.254e+02 2.738e+02 3.079e+02 3.871e+02 7.046e+02, threshold=6.159e+02, percent-clipped=0.0
2024-09-13 18:39:02,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=9378.333333333334, ans=0.008830797101449276
2024-09-13 18:39:32,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=9435.0, ans=0.125
2024-09-13 18:39:53,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9463.333333333334, ans=0.20536666666666664
2024-09-13 18:39:57,822 INFO [train.py:1198] (0/2) Epoch 1, batch 3350, loss[loss=0.3857, ctc_loss=0.3079, cr_loss=0.3892, over 20354.00 frames. ], tot_loss[loss=0.4585, ctc_loss=0.3706, cr_loss=0.4395, over 4078528.36 frames. ], batch size: 45, lr: 3.82e-02, grad_scale: 32.0
2024-09-13 18:40:18,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=9520.0, ans=0.0088
2024-09-13 18:40:36,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=9548.333333333334, ans=0.5658083333333334
2024-09-13 18:40:51,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=9576.666666666666, ans=0.025
2024-09-13 18:40:52,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=9576.666666666666, ans=0.5648166666666667
2024-09-13 18:41:12,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=9633.333333333334, ans=0.02652777777777778
2024-09-13 18:41:13,483 INFO [train.py:1198] (0/2) Epoch 1, batch 3400, loss[loss=0.4661, ctc_loss=0.3769, cr_loss=0.4456, over 20964.00 frames. ], tot_loss[loss=0.459, ctc_loss=0.3709, cr_loss=0.4407, over 4076414.56 frames. ], batch size: 58, lr: 3.82e-02, grad_scale: 32.0
2024-09-13 18:41:23,934 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.247e+02 2.740e+02 3.694e+02 4.563e+02 9.142e+02, threshold=7.388e+02, percent-clipped=10.0
2024-09-13 18:42:31,734 INFO [train.py:1198] (0/2) Epoch 1, batch 3450, loss[loss=0.4633, ctc_loss=0.3768, cr_loss=0.4324, over 20651.00 frames. ], tot_loss[loss=0.4565, ctc_loss=0.3684, cr_loss=0.4406, over 4088738.72 frames. ], batch size: 68, lr: 3.81e-02, grad_scale: 32.0
2024-09-13 18:42:32,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=9775.0, ans=0.125
2024-09-13 18:42:53,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=9803.333333333334, ans=0.008738405797101449
2024-09-13 18:43:14,744 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.66 vs. limit=11.186875
2024-09-13 18:43:17,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=9860.0, ans=0.125
2024-09-13 18:43:50,186 INFO [train.py:1198] (0/2) Epoch 1, batch 3500, loss[loss=0.459, ctc_loss=0.372, cr_loss=0.4349, over 20299.00 frames. ], tot_loss[loss=0.4555, ctc_loss=0.3673, cr_loss=0.441, over 4086354.98 frames. ], batch size: 74, lr: 3.81e-02, grad_scale: 32.0
2024-09-13 18:44:00,651 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.299e+02 2.941e+02 3.448e+02 4.907e+02 9.728e+02, threshold=6.896e+02, percent-clipped=8.0
2024-09-13 18:44:02,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=9916.666666666666, ans=0.125
2024-09-13 18:44:02,862 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.20 vs. limit=9.958333333333332
2024-09-13 18:44:03,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=9945.0, ans=0.02522916666666667
2024-09-13 18:44:45,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=10001.666666666666, ans=0.125
2024-09-13 18:44:54,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=10030.0, ans=0.024875
2024-09-13 18:45:05,989 INFO [train.py:1198] (0/2) Epoch 1, batch 3550, loss[loss=0.4263, ctc_loss=0.342, cr_loss=0.4218, over 21021.00 frames. ], tot_loss[loss=0.4546, ctc_loss=0.3663, cr_loss=0.4419, over 4085828.34 frames. ], batch size: 61, lr: 3.80e-02, grad_scale: 32.0
2024-09-13 18:45:06,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=10058.333333333334, ans=0.125
2024-09-13 18:45:14,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.92 vs. limit=15.04375
2024-09-13 18:45:31,373 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=11.2825
2024-09-13 18:45:49,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.42 vs. limit=11.293125
2024-09-13 18:45:59,812 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=11.30375
2024-09-13 18:46:22,263 INFO [train.py:1198] (0/2) Epoch 1, batch 3600, loss[loss=0.4341, ctc_loss=0.3486, cr_loss=0.4274, over 21027.00 frames. ], tot_loss[loss=0.4539, ctc_loss=0.3655, cr_loss=0.4421, over 4092736.51 frames. ], batch size: 61, lr: 3.80e-02, grad_scale: 32.0
2024-09-13 18:46:22,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10200.0, ans=0.198
2024-09-13 18:46:32,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.266e+02 2.835e+02 3.634e+02 4.353e+02 8.135e+02, threshold=7.267e+02, percent-clipped=1.0
2024-09-13 18:46:34,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=10200.0, ans=0.543
2024-09-13 18:46:46,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=10228.333333333334, ans=0.0
2024-09-13 18:47:14,381 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 18:47:21,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=10313.333333333334, ans=0.05
2024-09-13 18:47:24,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=10313.333333333334, ans=0.125
2024-09-13 18:47:38,068 INFO [train.py:1198] (0/2) Epoch 1, batch 3650, loss[loss=0.4195, ctc_loss=0.333, cr_loss=0.4323, over 20868.00 frames. ], tot_loss[loss=0.4545, ctc_loss=0.366, cr_loss=0.4424, over 4083146.04 frames. ], batch size: 57, lr: 3.79e-02, grad_scale: 32.0
2024-09-13 18:47:44,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10341.666666666666, ans=0.19658333333333333
2024-09-13 18:47:56,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=10370.0, ans=0.53705
2024-09-13 18:48:02,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=10370.0, ans=0.023458333333333335
2024-09-13 18:48:05,690 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.78 vs. limit=15.2775
2024-09-13 18:48:42,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=10455.0, ans=0.008596739130434784
2024-09-13 18:48:56,666 INFO [train.py:1198] (0/2) Epoch 1, batch 3700, loss[loss=0.4805, ctc_loss=0.3944, cr_loss=0.4306, over 14302.00 frames. ], tot_loss[loss=0.4527, ctc_loss=0.3641, cr_loss=0.4433, over 4091212.78 frames. ], batch size: 149, lr: 3.79e-02, grad_scale: 32.0
2024-09-13 18:49:10,260 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.323e+02 2.808e+02 3.446e+02 4.369e+02 7.538e+02, threshold=6.891e+02, percent-clipped=1.0
2024-09-13 18:49:43,161 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.91 vs. limit=15.405000000000001
2024-09-13 18:50:16,000 INFO [train.py:1198] (0/2) Epoch 1, batch 3750, loss[loss=0.3638, ctc_loss=0.2818, cr_loss=0.4099, over 21009.00 frames. ], tot_loss[loss=0.4507, ctc_loss=0.3622, cr_loss=0.4428, over 4085199.37 frames. ], batch size: 48, lr: 3.78e-02, grad_scale: 32.0
2024-09-13 18:50:16,644 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=8.25
2024-09-13 18:50:25,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=10625.0, ans=0.008559782608695653
2024-09-13 18:50:39,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=10653.333333333334, ans=0.5271333333333335
2024-09-13 18:50:44,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=10653.333333333334, ans=0.022277777777777775
2024-09-13 18:50:44,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=10653.333333333334, ans=0.125
2024-09-13 18:50:45,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=10681.666666666666, ans=10.0
2024-09-13 18:51:09,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=10710.0, ans=0.022041666666666668
2024-09-13 18:51:14,531 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=11.51625
2024-09-13 18:51:17,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=10738.333333333334, ans=0.02192361111111111
2024-09-13 18:51:32,155 INFO [train.py:1198] (0/2) Epoch 1, batch 3800, loss[loss=0.4245, ctc_loss=0.3362, cr_loss=0.4418, over 21055.00 frames. ], tot_loss[loss=0.4471, ctc_loss=0.3588, cr_loss=0.4416, over 4093977.95 frames. ], batch size: 53, lr: 3.78e-02, grad_scale: 32.0
2024-09-13 18:51:37,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=10766.666666666666, ans=0.02180555555555556
2024-09-13 18:51:41,711 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.89 vs. limit=15.575
2024-09-13 18:51:41,981 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=11.5375
2024-09-13 18:51:42,517 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.065e+02 2.832e+02 3.344e+02 4.065e+02 7.707e+02, threshold=6.689e+02, percent-clipped=3.0
2024-09-13 18:51:52,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=10795.0, ans=0.125
2024-09-13 18:52:47,782 INFO [train.py:1198] (0/2) Epoch 1, batch 3850, loss[loss=0.4035, ctc_loss=0.3201, cr_loss=0.4173, over 20790.00 frames. ], tot_loss[loss=0.4449, ctc_loss=0.3567, cr_loss=0.4407, over 4088935.75 frames. ], batch size: 53, lr: 3.77e-02, grad_scale: 32.0
2024-09-13 18:52:49,936 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs.
limit=11.590625 2024-09-13 18:53:07,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=10936.666666666666, ans=0.021097222222222226 2024-09-13 18:53:13,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=10936.666666666666, ans=0.0 2024-09-13 18:53:19,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=10965.0, ans=0.035 2024-09-13 18:53:20,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.81 vs. limit=11.611875000000001 2024-09-13 18:53:58,261 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=11.633125 2024-09-13 18:54:06,673 INFO [train.py:1198] (0/2) Epoch 1, batch 3900, loss[loss=0.4363, ctc_loss=0.3514, cr_loss=0.4244, over 21070.00 frames. ], tot_loss[loss=0.4441, ctc_loss=0.3556, cr_loss=0.4422, over 4102071.24 frames. ], batch size: 53, lr: 3.77e-02, grad_scale: 32.0 2024-09-13 18:54:16,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=11050.0, ans=0.5132500000000001 2024-09-13 18:54:16,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11050.0, ans=0.1895 2024-09-13 18:54:16,672 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=19.13 vs. 
limit=11.64375 2024-09-13 18:54:17,389 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.325e+02 2.905e+02 3.463e+02 4.744e+02 8.766e+02, threshold=6.927e+02, percent-clipped=3.0 2024-09-13 18:54:35,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=11106.666666666666, ans=0.125 2024-09-13 18:54:53,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=11135.0, ans=0.025 2024-09-13 18:54:56,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.85125 2024-09-13 18:54:57,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=11135.0, ans=0.025 2024-09-13 18:55:16,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=11163.333333333334, ans=0.125 2024-09-13 18:55:17,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.72 vs. limit=10.581666666666667 2024-09-13 18:55:25,376 INFO [train.py:1198] (0/2) Epoch 1, batch 3950, loss[loss=0.4493, ctc_loss=0.3574, cr_loss=0.4596, over 21081.00 frames. ], tot_loss[loss=0.442, ctc_loss=0.3538, cr_loss=0.441, over 4091864.87 frames. ], batch size: 59, lr: 3.76e-02, grad_scale: 32.0 2024-09-13 18:55:55,256 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.47 vs. limit=7.812083333333334 2024-09-13 18:56:16,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.38 vs. 
limit=11.72875 2024-09-13 18:56:40,199 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-4000.pt 2024-09-13 18:56:42,533 INFO [train.py:1198] (0/2) Epoch 1, batch 4000, loss[loss=0.4365, ctc_loss=0.3438, cr_loss=0.463, over 20926.00 frames. ], tot_loss[loss=0.4395, ctc_loss=0.3515, cr_loss=0.4402, over 4103111.44 frames. ], batch size: 60, lr: 3.76e-02, grad_scale: 32.0 2024-09-13 18:56:53,352 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.193e+02 2.621e+02 3.193e+02 4.122e+02 6.110e+02, threshold=6.387e+02, percent-clipped=0.0 2024-09-13 18:57:18,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=11390.0, ans=0.125 2024-09-13 18:57:26,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=11418.333333333334, ans=0.125 2024-09-13 18:57:35,052 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.08 vs. limit=11.781875 2024-09-13 18:57:38,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11418.333333333334, ans=0.18581666666666666 2024-09-13 18:57:42,358 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=11.7925 2024-09-13 18:57:58,254 INFO [train.py:1198] (0/2) Epoch 1, batch 4050, loss[loss=0.3725, ctc_loss=0.2982, cr_loss=0.3717, over 20969.00 frames. ], tot_loss[loss=0.4394, ctc_loss=0.3513, cr_loss=0.4405, over 4098165.02 frames. 
], batch size: 49, lr: 3.75e-02, grad_scale: 64.0 2024-09-13 18:58:09,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=11475.0, ans=0.01885416666666667 2024-09-13 18:58:22,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=11503.333333333334, ans=0.125 2024-09-13 18:58:27,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=11531.666666666666, ans=0.01861805555555556 2024-09-13 18:58:27,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=11531.666666666666, ans=0.125 2024-09-13 18:59:04,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11588.333333333334, ans=0.18411666666666665 2024-09-13 18:59:14,034 INFO [train.py:1198] (0/2) Epoch 1, batch 4100, loss[loss=0.4431, ctc_loss=0.3525, cr_loss=0.453, over 21037.00 frames. ], tot_loss[loss=0.4387, ctc_loss=0.3507, cr_loss=0.4398, over 4090243.60 frames. ], batch size: 62, lr: 3.75e-02, grad_scale: 64.0 2024-09-13 18:59:24,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11616.666666666666, ans=0.18383333333333335 2024-09-13 18:59:27,110 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.32 vs. limit=4.7425 2024-09-13 18:59:27,333 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.290e+02 2.785e+02 3.445e+02 4.527e+02 6.726e+02, threshold=6.889e+02, percent-clipped=4.0 2024-09-13 18:59:32,534 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.70 vs. 
limit=10.8225 2024-09-13 18:59:44,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11645.0, ans=0.18355 2024-09-13 18:59:47,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=11673.333333333334, ans=0.025 2024-09-13 18:59:56,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=11673.333333333334, ans=0.018027777777777775 2024-09-13 19:00:35,236 INFO [train.py:1198] (0/2) Epoch 1, batch 4150, loss[loss=0.3766, ctc_loss=0.2958, cr_loss=0.4042, over 20979.00 frames. ], tot_loss[loss=0.4353, ctc_loss=0.3477, cr_loss=0.4383, over 4101558.70 frames. ], batch size: 52, lr: 3.74e-02, grad_scale: 64.0 2024-09-13 19:00:49,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=4.768 2024-09-13 19:01:08,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=11815.0, ans=0.125 2024-09-13 19:01:16,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=11815.0, ans=0.125 2024-09-13 19:01:38,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11871.666666666666, ans=0.18128333333333335 2024-09-13 19:01:50,540 INFO [train.py:1198] (0/2) Epoch 1, batch 4200, loss[loss=0.3813, ctc_loss=0.2971, cr_loss=0.4206, over 20901.00 frames. ], tot_loss[loss=0.4361, ctc_loss=0.3483, cr_loss=0.4391, over 4097698.12 frames. 
], batch size: 54, lr: 3.74e-02, grad_scale: 64.0 2024-09-13 19:02:01,170 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.169e+02 2.906e+02 3.671e+02 5.051e+02 9.182e+02, threshold=7.342e+02, percent-clipped=6.0 2024-09-13 19:02:20,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=11956.666666666666, ans=0.008270289855072464 2024-09-13 19:02:22,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=11956.666666666666, ans=0.01684722222222223 2024-09-13 19:02:36,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=11985.0, ans=0.008264130434782609 2024-09-13 19:02:55,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=12013.333333333334, ans=0.016611111111111104 2024-09-13 19:03:05,790 INFO [train.py:1198] (0/2) Epoch 1, batch 4250, loss[loss=0.4353, ctc_loss=0.3481, cr_loss=0.4363, over 20939.00 frames. ], tot_loss[loss=0.4343, ctc_loss=0.3466, cr_loss=0.4385, over 4101410.30 frames. ], batch size: 60, lr: 3.73e-02, grad_scale: 64.0 2024-09-13 19:03:51,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=12126.666666666666, ans=0.125 2024-09-13 19:04:18,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=12155.0, ans=0.125 2024-09-13 19:04:21,248 INFO [train.py:1198] (0/2) Epoch 1, batch 4300, loss[loss=0.3516, ctc_loss=0.281, cr_loss=0.3527, over 20936.00 frames. ], tot_loss[loss=0.4332, ctc_loss=0.3454, cr_loss=0.4387, over 4096600.89 frames. 
], batch size: 48, lr: 3.73e-02, grad_scale: 32.0 2024-09-13 19:04:31,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=12183.333333333334, ans=0.125 2024-09-13 19:04:33,491 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.151e+02 2.684e+02 3.342e+02 4.037e+02 7.738e+02, threshold=6.683e+02, percent-clipped=1.0 2024-09-13 19:04:35,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=12211.666666666666, ans=0.125 2024-09-13 19:04:45,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=12211.666666666666, ans=0.125 2024-09-13 19:05:09,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=12268.333333333334, ans=0.07 2024-09-13 19:05:26,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=12296.666666666666, ans=0.125 2024-09-13 19:05:31,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=12296.666666666666, ans=0.125 2024-09-13 19:05:39,982 INFO [train.py:1198] (0/2) Epoch 1, batch 4350, loss[loss=0.4316, ctc_loss=0.3451, cr_loss=0.4323, over 20837.00 frames. ], tot_loss[loss=0.4333, ctc_loss=0.3455, cr_loss=0.439, over 4088729.66 frames. 
], batch size: 65, lr: 3.72e-02, grad_scale: 32.0 2024-09-13 19:05:49,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=12325.0, ans=0.0153125 2024-09-13 19:05:49,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=12325.0, ans=0.025 2024-09-13 19:05:55,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=12.1325 2024-09-13 19:06:49,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=12438.333333333334, ans=10.0 2024-09-13 19:06:52,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=12438.333333333334, ans=0.035 2024-09-13 19:06:57,631 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=12.175 2024-09-13 19:06:58,207 INFO [train.py:1198] (0/2) Epoch 1, batch 4400, loss[loss=0.4363, ctc_loss=0.3446, cr_loss=0.4585, over 20968.00 frames. ], tot_loss[loss=0.4309, ctc_loss=0.3432, cr_loss=0.4387, over 4103207.59 frames. ], batch size: 64, lr: 3.71e-02, grad_scale: 32.0 2024-09-13 19:06:58,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12466.666666666666, ans=0.17533333333333334 2024-09-13 19:07:08,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. 
limit=4.87 2024-09-13 19:07:10,281 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.621e+02 3.031e+02 4.010e+02 7.445e+02, threshold=6.062e+02, percent-clipped=2.0 2024-09-13 19:07:29,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=16.8925 2024-09-13 19:07:48,968 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:08:10,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=12580.0, ans=0.008134782608695653 2024-09-13 19:08:14,421 INFO [train.py:1198] (0/2) Epoch 1, batch 4450, loss[loss=0.4148, ctc_loss=0.3268, cr_loss=0.4401, over 20803.00 frames. ], tot_loss[loss=0.4305, ctc_loss=0.3428, cr_loss=0.4389, over 4100505.05 frames. ], batch size: 53, lr: 3.71e-02, grad_scale: 32.0 2024-09-13 19:08:16,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12608.333333333334, ans=0.17391666666666666 2024-09-13 19:08:23,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=12608.333333333334, ans=0.125 2024-09-13 19:08:36,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12636.666666666666, ans=0.17363333333333333 2024-09-13 19:08:52,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=12665.0, ans=0.125 2024-09-13 19:09:04,658 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.53 vs. 
limit=17.02 2024-09-13 19:09:13,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12721.666666666666, ans=0.17278333333333334 2024-09-13 19:09:29,971 INFO [train.py:1198] (0/2) Epoch 1, batch 4500, loss[loss=0.4234, ctc_loss=0.3342, cr_loss=0.4459, over 21076.00 frames. ], tot_loss[loss=0.4309, ctc_loss=0.3428, cr_loss=0.4401, over 4093786.07 frames. ], batch size: 59, lr: 3.70e-02, grad_scale: 32.0 2024-09-13 19:09:41,740 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.160e+02 2.801e+02 3.289e+02 4.523e+02 8.110e+02, threshold=6.578e+02, percent-clipped=6.0 2024-09-13 19:10:41,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=12863.333333333334, ans=0.4497833333333333 2024-09-13 19:10:41,836 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.09 vs. limit=11.431666666666668 2024-09-13 19:10:45,615 INFO [train.py:1198] (0/2) Epoch 1, batch 4550, loss[loss=0.4376, ctc_loss=0.3404, cr_loss=0.4857, over 21033.00 frames. ], tot_loss[loss=0.4264, ctc_loss=0.3389, cr_loss=0.4373, over 4096800.57 frames. ], batch size: 63, lr: 3.70e-02, grad_scale: 32.0 2024-09-13 19:11:16,871 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:11:50,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13005.0, ans=0.16995 2024-09-13 19:12:06,677 INFO [train.py:1198] (0/2) Epoch 1, batch 4600, loss[loss=0.431, ctc_loss=0.3388, cr_loss=0.461, over 20836.00 frames. ], tot_loss[loss=0.4242, ctc_loss=0.337, cr_loss=0.4361, over 4100825.18 frames. 
], batch size: 59, lr: 3.69e-02, grad_scale: 32.0 2024-09-13 19:12:13,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=13033.333333333334, ans=0.125 2024-09-13 19:12:18,931 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.138e+02 2.656e+02 3.075e+02 3.759e+02 8.333e+02, threshold=6.151e+02, percent-clipped=3.0 2024-09-13 19:12:19,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.32 vs. limit=17.275 2024-09-13 19:12:34,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=13061.666666666666, ans=0.125 2024-09-13 19:12:50,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=13090.0, ans=0.125 2024-09-13 19:13:23,010 INFO [train.py:1198] (0/2) Epoch 1, batch 4650, loss[loss=0.3804, ctc_loss=0.2989, cr_loss=0.4075, over 20981.00 frames. ], tot_loss[loss=0.4244, ctc_loss=0.3372, cr_loss=0.4361, over 4100583.26 frames. ], batch size: 55, lr: 3.69e-02, grad_scale: 32.0 2024-09-13 19:13:27,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=13175.0, ans=0.008005434782608695 2024-09-13 19:14:19,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=13260.0, ans=0.125 2024-09-13 19:14:27,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=13288.333333333334, ans=0.125 2024-09-13 19:14:39,330 INFO [train.py:1198] (0/2) Epoch 1, batch 4700, loss[loss=0.4042, ctc_loss=0.324, cr_loss=0.401, over 20940.00 frames. ], tot_loss[loss=0.4224, ctc_loss=0.3353, cr_loss=0.4356, over 4109478.28 frames. 
], batch size: 60, lr: 3.68e-02, grad_scale: 32.0 2024-09-13 19:14:42,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=4.9975000000000005 2024-09-13 19:14:42,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.94 vs. limit=17.4875 2024-09-13 19:14:51,188 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.210e+02 2.668e+02 3.528e+02 4.722e+02 7.719e+02, threshold=7.056e+02, percent-clipped=7.0 2024-09-13 19:15:14,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=13373.333333333334, ans=0.125 2024-09-13 19:15:23,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=13401.666666666666, ans=0.007956159420289856 2024-09-13 19:15:27,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=13401.666666666666, ans=0.007956159420289856 2024-09-13 19:15:41,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=13430.0, ans=0.125 2024-09-13 19:15:43,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=13430.0, ans=0.00795 2024-09-13 19:15:43,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=12.536249999999999 2024-09-13 19:15:54,742 INFO [train.py:1198] (0/2) Epoch 1, batch 4750, loss[loss=0.471, ctc_loss=0.3792, cr_loss=0.4593, over 20981.00 frames. ], tot_loss[loss=0.4231, ctc_loss=0.3356, cr_loss=0.4376, over 4116385.95 frames. 
], batch size: 64, lr: 3.68e-02, grad_scale: 32.0 2024-09-13 19:17:07,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=13571.666666666666, ans=0.4249916666666667 2024-09-13 19:17:13,776 INFO [train.py:1198] (0/2) Epoch 1, batch 4800, loss[loss=0.4313, ctc_loss=0.3417, cr_loss=0.4478, over 20841.00 frames. ], tot_loss[loss=0.424, ctc_loss=0.3363, cr_loss=0.4383, over 4106780.74 frames. ], batch size: 59, lr: 3.67e-02, grad_scale: 32.0 2024-09-13 19:17:25,729 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.230e+02 2.828e+02 3.314e+02 4.204e+02 6.768e+02, threshold=6.629e+02, percent-clipped=0.0 2024-09-13 19:18:13,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=12.631875 2024-09-13 19:18:19,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=13713.333333333334, ans=0.125 2024-09-13 19:18:32,418 INFO [train.py:1198] (0/2) Epoch 1, batch 4850, loss[loss=0.4569, ctc_loss=0.3572, cr_loss=0.4984, over 21008.00 frames. ], tot_loss[loss=0.4241, ctc_loss=0.3363, cr_loss=0.4388, over 4091256.46 frames. 
], batch size: 61, lr: 3.67e-02, grad_scale: 32.0 2024-09-13 19:18:32,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=13741.666666666666, ans=0.4190416666666667 2024-09-13 19:18:40,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=13741.666666666666, ans=0.125 2024-09-13 19:18:50,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=13770.0, ans=0.025 2024-09-13 19:19:26,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=13826.666666666666, ans=0.125 2024-09-13 19:19:47,751 INFO [train.py:1198] (0/2) Epoch 1, batch 4900, loss[loss=0.4021, ctc_loss=0.3157, cr_loss=0.4319, over 20936.00 frames. ], tot_loss[loss=0.4227, ctc_loss=0.335, cr_loss=0.4383, over 4097069.44 frames. ], batch size: 60, lr: 3.66e-02, grad_scale: 16.0 2024-09-13 19:20:01,185 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.268e+02 2.911e+02 3.326e+02 4.192e+02 7.786e+02, threshold=6.652e+02, percent-clipped=4.0 2024-09-13 19:20:12,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13911.666666666666, ans=0.16088333333333335 2024-09-13 19:20:34,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=13968.333333333334, ans=0.025 2024-09-13 19:20:41,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=13968.333333333334, ans=0.007832971014492754 2024-09-13 19:20:56,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=13996.666666666666, ans=0.008347222222222228 2024-09-13 19:21:02,040 INFO [train.py:1198] (0/2) Epoch 1, batch 
4950, loss[loss=0.3977, ctc_loss=0.31, cr_loss=0.4386, over 20976.00 frames. ], tot_loss[loss=0.4207, ctc_loss=0.3333, cr_loss=0.437, over 4096252.48 frames. ], batch size: 51, lr: 3.65e-02, grad_scale: 16.0 2024-09-13 19:21:19,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=14053.333333333334, ans=0.0 2024-09-13 19:21:21,037 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.50 vs. limit=12.77 2024-09-13 19:22:01,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=14138.333333333334, ans=0.007796014492753623 2024-09-13 19:22:16,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=14166.666666666666, ans=0.125 2024-09-13 19:22:17,897 INFO [train.py:1198] (0/2) Epoch 1, batch 5000, loss[loss=0.339, ctc_loss=0.2643, cr_loss=0.3739, over 19972.00 frames. ], tot_loss[loss=0.4191, ctc_loss=0.3319, cr_loss=0.4362, over 4098365.39 frames. ], batch size: 44, lr: 3.65e-02, grad_scale: 16.0 2024-09-13 19:22:31,796 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.674e+02 3.166e+02 4.204e+02 7.328e+02, threshold=6.333e+02, percent-clipped=1.0 2024-09-13 19:22:41,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=14195.0, ans=0.403175 2024-09-13 19:23:33,080 INFO [train.py:1198] (0/2) Epoch 1, batch 5050, loss[loss=0.4307, ctc_loss=0.3422, cr_loss=0.4426, over 20960.00 frames. ], tot_loss[loss=0.4188, ctc_loss=0.3314, cr_loss=0.4369, over 4099923.36 frames. 
], batch size: 60, lr: 3.64e-02, grad_scale: 16.0 2024-09-13 19:23:36,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=14308.333333333334, ans=0.3992083333333333 2024-09-13 19:23:44,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=14308.333333333334, ans=0.05 2024-09-13 19:24:33,022 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=5.159 2024-09-13 19:24:36,140 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.24 vs. limit=12.908125 2024-09-13 19:24:45,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.10 vs. limit=8.605416666666667 2024-09-13 19:24:50,236 INFO [train.py:1198] (0/2) Epoch 1, batch 5100, loss[loss=0.3778, ctc_loss=0.2979, cr_loss=0.3994, over 20769.00 frames. ], tot_loss[loss=0.4184, ctc_loss=0.331, cr_loss=0.4367, over 4091930.13 frames. 
], batch size: 53, lr: 3.64e-02, grad_scale: 16.0 2024-09-13 19:24:59,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=14450.0, ans=0.125 2024-09-13 19:25:03,777 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.319e+02 2.731e+02 3.124e+02 4.100e+02 6.990e+02, threshold=6.248e+02, percent-clipped=3.0 2024-09-13 19:25:15,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14478.333333333334, ans=0.15521666666666667 2024-09-13 19:25:19,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=14506.666666666666, ans=0.125 2024-09-13 19:25:38,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=14535.0, ans=0.125 2024-09-13 19:25:46,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=14535.0, ans=10.0 2024-09-13 19:25:46,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=14535.0, ans=0.125 2024-09-13 19:25:58,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=14563.333333333334, ans=0.005986111111111109 2024-09-13 19:26:05,731 INFO [train.py:1198] (0/2) Epoch 1, batch 5150, loss[loss=0.3904, ctc_loss=0.2997, cr_loss=0.4532, over 21059.00 frames. ], tot_loss[loss=0.419, ctc_loss=0.3316, cr_loss=0.4368, over 4090945.96 frames. ], batch size: 53, lr: 3.63e-02, grad_scale: 16.0 2024-09-13 19:26:19,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.85 vs. 
limit=12.971875 2024-09-13 19:26:26,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=14620.0, ans=0.125 2024-09-13 19:26:26,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=14620.0, ans=0.007691304347826087 2024-09-13 19:26:53,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=14676.666666666666, ans=0.125 2024-09-13 19:27:22,877 INFO [train.py:1198] (0/2) Epoch 1, batch 5200, loss[loss=0.3945, ctc_loss=0.3097, cr_loss=0.4241, over 21065.00 frames. ], tot_loss[loss=0.4162, ctc_loss=0.3291, cr_loss=0.4357, over 4105855.83 frames. ], batch size: 59, lr: 3.63e-02, grad_scale: 32.0 2024-09-13 19:27:27,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=14733.333333333334, ans=0.125 2024-09-13 19:27:36,069 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.143e+02 2.534e+02 3.157e+02 3.626e+02 7.530e+02, threshold=6.315e+02, percent-clipped=3.0 2024-09-13 19:27:44,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=14761.666666666666, ans=0.005159722222222225 2024-09-13 19:27:45,729 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:28:03,487 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:28:03,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=14790.0, ans=0.025 2024-09-13 19:28:09,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.45 vs. 
limit=13.056875 2024-09-13 19:28:22,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=14846.666666666666, ans=0.125 2024-09-13 19:28:34,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=14846.666666666666, ans=0.4227 2024-09-13 19:28:37,334 INFO [train.py:1198] (0/2) Epoch 1, batch 5250, loss[loss=0.3801, ctc_loss=0.2962, cr_loss=0.4197, over 20982.00 frames. ], tot_loss[loss=0.415, ctc_loss=0.328, cr_loss=0.4353, over 4111587.81 frames. ], batch size: 55, lr: 3.62e-02, grad_scale: 32.0 2024-09-13 19:29:41,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=14988.333333333334, ans=0.37540833333333334 2024-09-13 19:29:51,362 INFO [train.py:1198] (0/2) Epoch 1, batch 5300, loss[loss=0.4644, ctc_loss=0.3704, cr_loss=0.4698, over 20651.00 frames. ], tot_loss[loss=0.4135, ctc_loss=0.3266, cr_loss=0.4345, over 4113428.05 frames. ], batch size: 66, lr: 3.61e-02, grad_scale: 16.0 2024-09-13 19:30:05,888 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.236e+02 2.803e+02 3.346e+02 4.130e+02 6.068e+02, threshold=6.692e+02, percent-clipped=0.0 2024-09-13 19:30:06,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. 
limit=13.141874999999999 2024-09-13 19:30:16,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=15045.0, ans=0.125 2024-09-13 19:30:18,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=15045.0, ans=0.425675 2024-09-13 19:30:22,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=15073.333333333334, ans=0.125 2024-09-13 19:30:34,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=15101.666666666666, ans=12.550833333333333 2024-09-13 19:30:40,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=15101.666666666666, ans=0.125 2024-09-13 19:31:05,736 INFO [train.py:1198] (0/2) Epoch 1, batch 5350, loss[loss=0.4016, ctc_loss=0.3143, cr_loss=0.4365, over 20968.00 frames. ], tot_loss[loss=0.4126, ctc_loss=0.3257, cr_loss=0.4347, over 4107265.64 frames. ], batch size: 58, lr: 3.61e-02, grad_scale: 16.0 2024-09-13 19:31:05,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=15158.333333333334, ans=0.125 2024-09-13 19:31:43,997 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.11 vs. limit=8.80375 2024-09-13 19:32:19,513 INFO [train.py:1198] (0/2) Epoch 1, batch 5400, loss[loss=0.4132, ctc_loss=0.3263, cr_loss=0.4346, over 20873.00 frames. ], tot_loss[loss=0.412, ctc_loss=0.3252, cr_loss=0.4338, over 4104406.58 frames. 
], batch size: 57, lr: 3.60e-02, grad_scale: 16.0 2024-09-13 19:32:34,312 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.140e+02 2.674e+02 3.317e+02 3.992e+02 6.991e+02, threshold=6.635e+02, percent-clipped=1.0 2024-09-13 19:32:46,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=15328.333333333334, ans=0.002798611111111106 2024-09-13 19:33:03,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.04 vs. limit=12.692499999999999 2024-09-13 19:33:15,206 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.64 vs. limit=8.84625 2024-09-13 19:33:33,894 INFO [train.py:1198] (0/2) Epoch 1, batch 5450, loss[loss=0.4236, ctc_loss=0.3407, cr_loss=0.4143, over 20695.00 frames. ], tot_loss[loss=0.413, ctc_loss=0.3261, cr_loss=0.4348, over 4097159.23 frames. ], batch size: 68, lr: 3.60e-02, grad_scale: 16.0 2024-09-13 19:33:55,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=15470.0, ans=0.125 2024-09-13 19:34:02,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.19 vs. 
limit=8.8675 2024-09-13 19:34:06,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=15498.333333333334, ans=0.3575583333333333 2024-09-13 19:34:06,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=15498.333333333334, ans=0.125 2024-09-13 19:34:19,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=15526.666666666666, ans=0.125 2024-09-13 19:34:20,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=15526.666666666666, ans=0.125 2024-09-13 19:34:34,949 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=10.222000000000001 2024-09-13 19:34:37,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15555.0, ans=0.14445000000000002 2024-09-13 19:34:39,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=15555.0, ans=0.125 2024-09-13 19:34:40,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=15555.0, ans=0.125 2024-09-13 19:34:50,820 INFO [train.py:1198] (0/2) Epoch 1, batch 5500, loss[loss=0.3909, ctc_loss=0.3085, cr_loss=0.4123, over 20958.00 frames. ], tot_loss[loss=0.4116, ctc_loss=0.3247, cr_loss=0.4346, over 4092203.63 frames. 
], batch size: 48, lr: 3.59e-02, grad_scale: 16.0 2024-09-13 19:34:52,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=15583.333333333334, ans=0.35458333333333336 2024-09-13 19:35:05,968 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.556e+02 3.075e+02 3.989e+02 8.223e+02, threshold=6.149e+02, percent-clipped=3.0 2024-09-13 19:35:19,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=15640.0, ans=0.0 2024-09-13 19:35:54,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=15696.666666666666, ans=0.14303333333333335 2024-09-13 19:36:07,838 INFO [train.py:1198] (0/2) Epoch 1, batch 5550, loss[loss=0.4117, ctc_loss=0.3212, cr_loss=0.4527, over 20984.00 frames. ], tot_loss[loss=0.4124, ctc_loss=0.3253, cr_loss=0.4355, over 4093957.52 frames. ], batch size: 67, lr: 3.59e-02, grad_scale: 16.0 2024-09-13 19:36:38,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=15781.666666666666, ans=0.34764166666666674 2024-09-13 19:36:44,387 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=19.33625 2024-09-13 19:36:57,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=15810.0, ans=13.42875 2024-09-13 19:37:11,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=15838.333333333334, ans=0.125 2024-09-13 19:37:22,225 INFO [train.py:1198] (0/2) Epoch 1, batch 5600, loss[loss=0.4094, ctc_loss=0.3181, cr_loss=0.4565, over 21057.00 frames. ], tot_loss[loss=0.4109, ctc_loss=0.3239, cr_loss=0.4353, over 4097398.54 frames. 
], batch size: 62, lr: 3.58e-02, grad_scale: 32.0 2024-09-13 19:37:26,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=15866.666666666666, ans=0.0005555555555555522 2024-09-13 19:37:36,270 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=10.358 2024-09-13 19:37:36,939 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.264e+02 2.772e+02 3.418e+02 4.529e+02 7.633e+02, threshold=6.837e+02, percent-clipped=8.0 2024-09-13 19:37:38,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15895.0, ans=0.14105 2024-09-13 19:37:52,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=15923.333333333334, ans=0.125 2024-09-13 19:38:37,300 INFO [train.py:1198] (0/2) Epoch 1, batch 5650, loss[loss=0.4394, ctc_loss=0.3445, cr_loss=0.4742, over 20941.00 frames. ], tot_loss[loss=0.411, ctc_loss=0.3239, cr_loss=0.4357, over 4098979.80 frames. 
], batch size: 64, lr: 3.57e-02, grad_scale: 32.0 2024-09-13 19:38:43,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=16008.333333333334, ans=0.3397083333333333 2024-09-13 19:38:47,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=16008.333333333334, ans=0.0 2024-09-13 19:39:03,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=16036.666666666666, ans=0.125 2024-09-13 19:39:15,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=16065.0, ans=0.125 2024-09-13 19:39:40,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=16121.666666666666, ans=0.0 2024-09-13 19:39:48,727 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.84 vs. limit=9.030416666666667 2024-09-13 19:39:50,604 INFO [train.py:1198] (0/2) Epoch 1, batch 5700, loss[loss=0.4228, ctc_loss=0.3301, cr_loss=0.4634, over 20828.00 frames. ], tot_loss[loss=0.4089, ctc_loss=0.322, cr_loss=0.4346, over 4103952.55 frames. ], batch size: 59, lr: 3.57e-02, grad_scale: 32.0 2024-09-13 19:40:05,384 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.027e+02 2.575e+02 2.972e+02 3.924e+02 7.518e+02, threshold=5.944e+02, percent-clipped=1.0 2024-09-13 19:40:14,231 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. 
limit=5.42675 2024-09-13 19:40:31,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=16206.666666666666, ans=0.07 2024-09-13 19:41:04,671 INFO [train.py:1198] (0/2) Epoch 1, batch 5750, loss[loss=0.3876, ctc_loss=0.3019, cr_loss=0.4283, over 20923.00 frames. ], tot_loss[loss=0.4072, ctc_loss=0.3204, cr_loss=0.4342, over 4106383.00 frames. ], batch size: 60, lr: 3.56e-02, grad_scale: 32.0 2024-09-13 19:41:22,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=16320.0, ans=0.0 2024-09-13 19:41:22,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16320.0, ans=0.1368 2024-09-13 19:41:37,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=16348.333333333334, ans=0.125 2024-09-13 19:41:41,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.23 vs. 
limit=13.174166666666666 2024-09-13 19:41:43,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=16348.333333333334, ans=10.0 2024-09-13 19:41:44,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=16348.333333333334, ans=0.025 2024-09-13 19:42:04,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=16405.0, ans=0.125 2024-09-13 19:42:18,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=16433.333333333332, ans=0.125 2024-09-13 19:42:19,113 INFO [train.py:1198] (0/2) Epoch 1, batch 5800, loss[loss=0.4291, ctc_loss=0.3445, cr_loss=0.423, over 20621.00 frames. ], tot_loss[loss=0.4079, ctc_loss=0.321, cr_loss=0.4345, over 4102565.82 frames. ], batch size: 68, lr: 3.56e-02, grad_scale: 32.0 2024-09-13 19:42:28,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=16433.333333333332, ans=0.125 2024-09-13 19:42:35,905 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.153e+02 2.659e+02 3.253e+02 3.888e+02 7.350e+02, threshold=6.505e+02, percent-clipped=4.0 2024-09-13 19:42:37,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=16461.666666666668, ans=0.125 2024-09-13 19:42:43,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=16461.666666666668, ans=0.3238416666666667 2024-09-13 19:43:06,089 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. 
limit=10.607333333333333 2024-09-13 19:43:11,560 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:43:11,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=10.607333333333333 2024-09-13 19:43:17,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=16518.333333333332, ans=0.0 2024-09-13 19:43:34,870 INFO [train.py:1198] (0/2) Epoch 1, batch 5850, loss[loss=0.3777, ctc_loss=0.2935, cr_loss=0.4207, over 20960.00 frames. ], tot_loss[loss=0.4071, ctc_loss=0.3204, cr_loss=0.4336, over 4093332.65 frames. ], batch size: 51, lr: 3.55e-02, grad_scale: 16.0 2024-09-13 19:43:36,122 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.89 vs. limit=5.0 2024-09-13 19:43:48,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=16603.333333333332, ans=0.0 2024-09-13 19:44:26,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=16660.0, ans=0.0 2024-09-13 19:44:45,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=16688.333333333332, ans=0.025 2024-09-13 19:44:51,395 INFO [train.py:1198] (0/2) Epoch 1, batch 5900, loss[loss=0.3882, ctc_loss=0.299, cr_loss=0.4461, over 20961.00 frames. ], tot_loss[loss=0.4071, ctc_loss=0.3205, cr_loss=0.4331, over 4078413.52 frames. 
], batch size: 55, lr: 3.55e-02, grad_scale: 16.0 2024-09-13 19:45:06,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=16745.0, ans=0.025 2024-09-13 19:45:07,769 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 2.487e+02 2.847e+02 3.551e+02 7.104e+02, threshold=5.695e+02, percent-clipped=2.0 2024-09-13 19:45:35,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=16801.666666666668, ans=0.125 2024-09-13 19:45:45,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=16801.666666666668, ans=0.2 2024-09-13 19:46:02,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=13.811250000000001 2024-09-13 19:46:06,672 INFO [train.py:1198] (0/2) Epoch 1, batch 5950, loss[loss=0.3842, ctc_loss=0.2949, cr_loss=0.4467, over 20776.00 frames. ], tot_loss[loss=0.406, ctc_loss=0.3193, cr_loss=0.4333, over 4078405.69 frames. ], batch size: 53, lr: 3.54e-02, grad_scale: 16.0 2024-09-13 19:47:20,174 INFO [train.py:1198] (0/2) Epoch 1, batch 6000, loss[loss=0.3673, ctc_loss=0.2851, cr_loss=0.4112, over 20800.00 frames. ], tot_loss[loss=0.4049, ctc_loss=0.3183, cr_loss=0.4332, over 4080470.94 frames. ], batch size: 53, lr: 3.53e-02, grad_scale: 32.0 2024-09-13 19:47:20,175 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-13 19:47:38,753 INFO [train.py:1230] (0/2) Epoch 1, validation: loss=0.1271, ctc_loss=0.1271, cr_loss=9.075e-15, over 944034.00 frames. 
2024-09-13 19:47:38,754 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-13 19:47:42,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=17000.0, ans=0.0071739130434782614 2024-09-13 19:47:50,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=17000.0, ans=0.13 2024-09-13 19:47:55,523 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.209e+02 2.871e+02 3.340e+02 4.119e+02 8.577e+02, threshold=6.680e+02, percent-clipped=5.0 2024-09-13 19:47:55,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=17028.333333333332, ans=0.125 2024-09-13 19:48:21,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.92 vs. limit=13.896250000000002 2024-09-13 19:48:43,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=17113.333333333332, ans=0.125 2024-09-13 19:48:52,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=17141.666666666668, ans=0.12858333333333333 2024-09-13 19:48:53,446 INFO [train.py:1198] (0/2) Epoch 1, batch 6050, loss[loss=0.4599, ctc_loss=0.364, cr_loss=0.4798, over 18148.00 frames. ], tot_loss[loss=0.4042, ctc_loss=0.3175, cr_loss=0.4334, over 4079408.36 frames. 
], batch size: 108, lr: 3.53e-02, grad_scale: 32.0 2024-09-13 19:48:55,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=17141.666666666668, ans=0.12858333333333333 2024-09-13 19:49:06,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=17141.666666666668, ans=0.07 2024-09-13 19:49:12,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=17170.0, ans=0.125 2024-09-13 19:49:49,725 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.69 vs. limit=13.613333333333335 2024-09-13 19:50:08,049 INFO [train.py:1198] (0/2) Epoch 1, batch 6100, loss[loss=0.3912, ctc_loss=0.3007, cr_loss=0.4523, over 21004.00 frames. ], tot_loss[loss=0.4059, ctc_loss=0.3187, cr_loss=0.4357, over 4089120.37 frames. ], batch size: 55, lr: 3.52e-02, grad_scale: 32.0 2024-09-13 19:50:23,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.95 vs. limit=20.48375 2024-09-13 19:50:25,917 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.976e+02 2.620e+02 3.206e+02 3.814e+02 6.311e+02, threshold=6.411e+02, percent-clipped=1.0 2024-09-13 19:50:45,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=17340.0, ans=0.09899494936611666 2024-09-13 19:50:56,982 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.88 vs. limit=5.0 2024-09-13 19:51:24,074 INFO [train.py:1198] (0/2) Epoch 1, batch 6150, loss[loss=0.3658, ctc_loss=0.2791, cr_loss=0.4332, over 20880.00 frames. ], tot_loss[loss=0.404, ctc_loss=0.317, cr_loss=0.4348, over 4098310.01 frames. 
], batch size: 54, lr: 3.52e-02, grad_scale: 32.0 2024-09-13 19:51:38,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=14.044999999999998 2024-09-13 19:52:09,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=17510.0, ans=0.0 2024-09-13 19:52:14,663 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=5.6265 2024-09-13 19:52:39,307 INFO [train.py:1198] (0/2) Epoch 1, batch 6200, loss[loss=0.3632, ctc_loss=0.2808, cr_loss=0.412, over 20999.00 frames. ], tot_loss[loss=0.4021, ctc_loss=0.3154, cr_loss=0.4334, over 4080711.25 frames. ], batch size: 52, lr: 3.51e-02, grad_scale: 32.0 2024-09-13 19:52:42,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=17566.666666666668, ans=0.12433333333333332 2024-09-13 19:52:50,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=17566.666666666668, ans=0.125 2024-09-13 19:52:51,952 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.69 vs. limit=14.0875 2024-09-13 19:52:55,548 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.483e+02 2.892e+02 3.680e+02 7.766e+02, threshold=5.783e+02, percent-clipped=2.0 2024-09-13 19:53:08,084 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:53:53,213 INFO [train.py:1198] (0/2) Epoch 1, batch 6250, loss[loss=0.4197, ctc_loss=0.324, cr_loss=0.4786, over 21041.00 frames. 
], tot_loss[loss=0.4041, ctc_loss=0.3172, cr_loss=0.4346, over 4072857.05 frames. ], batch size: 62, lr: 3.51e-02, grad_scale: 32.0 2024-09-13 19:53:55,216 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=20.78125 2024-09-13 19:54:09,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=17736.666666666668, ans=0.125 2024-09-13 19:54:24,119 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=5.66475 2024-09-13 19:54:43,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=17793.333333333332, ans=0.125 2024-09-13 19:55:06,021 INFO [train.py:1198] (0/2) Epoch 1, batch 6300, loss[loss=0.3932, ctc_loss=0.3133, cr_loss=0.3996, over 20967.00 frames. ], tot_loss[loss=0.4081, ctc_loss=0.3212, cr_loss=0.4344, over 3978262.80 frames. ], batch size: 52, lr: 3.50e-02, grad_scale: 16.0 2024-09-13 19:55:14,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=17850.0, ans=0.12150000000000002 2024-09-13 19:55:22,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. 
limit=14.204374999999999 2024-09-13 19:55:22,871 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.156e+02 2.696e+02 3.200e+02 4.525e+02 7.555e+02, threshold=6.401e+02, percent-clipped=7.0 2024-09-13 19:55:54,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=17935.0, ans=0.125 2024-09-13 19:55:56,774 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:56:06,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=17963.333333333332, ans=0.125 2024-09-13 19:56:07,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.81 vs. limit=9.490833333333333 2024-09-13 19:56:09,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=17963.333333333332, ans=0.025 2024-09-13 19:56:12,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=17963.333333333332, ans=0.0 2024-09-13 19:56:16,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.53 vs. limit=14.246875 2024-09-13 19:56:17,290 INFO [train.py:1198] (0/2) Epoch 1, batch 6350, loss[loss=0.5458, ctc_loss=0.4539, cr_loss=0.4592, over 13926.00 frames. ], tot_loss[loss=0.4173, ctc_loss=0.3302, cr_loss=0.4358, over 3830874.71 frames. ], batch size: 149, lr: 3.49e-02, grad_scale: 16.0 2024-09-13 19:56:20,738 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.46 vs. 
limit=14.246875 2024-09-13 19:57:01,806 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=14.278749999999999 2024-09-13 19:57:05,662 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 19:57:15,235 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-1.pt 2024-09-13 19:58:00,851 INFO [train.py:1198] (0/2) Epoch 2, batch 0, loss[loss=0.4556, ctc_loss=0.3641, cr_loss=0.4571, over 20241.00 frames. ], tot_loss[loss=0.4556, ctc_loss=0.3641, cr_loss=0.4571, over 20241.00 frames. ], batch size: 74, lr: 3.42e-02, grad_scale: 32.0 2024-09-13 19:58:00,851 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-13 19:58:19,498 INFO [train.py:1230] (0/2) Epoch 2, validation: loss=0.1284, ctc_loss=0.1284, cr_loss=9.944e-15, over 944034.00 frames. 2024-09-13 19:58:19,499 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-13 19:58:51,096 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.429e+02 2.766e+02 3.397e+02 7.112e+02, threshold=5.531e+02, percent-clipped=1.0 2024-09-13 19:58:51,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18164.5, ans=0.11835500000000002 2024-09-13 19:58:57,675 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=14.3116875 2024-09-13 19:59:33,857 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.36 vs. 
limit=14.12475 2024-09-13 19:59:34,686 INFO [train.py:1198] (0/2) Epoch 2, batch 50, loss[loss=0.4008, ctc_loss=0.3125, cr_loss=0.4412, over 20775.00 frames. ], tot_loss[loss=0.4009, ctc_loss=0.3147, cr_loss=0.4309, over 914469.03 frames. ], batch size: 56, lr: 3.42e-02, grad_scale: 32.0 2024-09-13 19:59:42,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=18249.5, ans=0.2612675 2024-09-13 19:59:56,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.71 vs. limit=14.138916666666665 2024-09-13 20:00:02,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=18277.833333333332, ans=0.2602758333333335 2024-09-13 20:00:17,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18306.166666666668, ans=0.11693833333333334 2024-09-13 20:00:37,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=21.272125000000003 2024-09-13 20:00:41,188 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.87 vs. limit=14.386062500000001 2024-09-13 20:00:50,780 INFO [train.py:1198] (0/2) Epoch 2, batch 100, loss[loss=0.4042, ctc_loss=0.3147, cr_loss=0.4472, over 21070.00 frames. ], tot_loss[loss=0.4023, ctc_loss=0.3158, cr_loss=0.4326, over 1623679.80 frames. 
], batch size: 56, lr: 3.41e-02, grad_scale: 32.0 2024-09-13 20:00:58,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=18391.166666666668, ans=0.125 2024-09-13 20:01:22,597 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.606e+02 3.253e+02 4.380e+02 9.911e+02, threshold=6.506e+02, percent-clipped=8.0 2024-09-13 20:01:40,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=18476.166666666668, ans=0.025 2024-09-13 20:01:46,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=18476.166666666668, ans=0.25333416666666664 2024-09-13 20:01:59,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.78 vs. limit=21.378375 2024-09-13 20:02:06,278 INFO [train.py:1198] (0/2) Epoch 2, batch 150, loss[loss=0.3958, ctc_loss=0.3086, cr_loss=0.4361, over 20946.00 frames. ], tot_loss[loss=0.4019, ctc_loss=0.3151, cr_loss=0.4342, over 2173182.85 frames. ], batch size: 55, lr: 3.40e-02, grad_scale: 32.0 2024-09-13 20:02:10,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. 
limit=5.779925 2024-09-13 20:02:17,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=18532.833333333332, ans=0.006840688405797102 2024-09-13 20:02:18,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=18532.833333333332, ans=0.2513508333333334 2024-09-13 20:02:57,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=18617.833333333332, ans=0.125 2024-09-13 20:03:01,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=18617.833333333332, ans=0.006822210144927537 2024-09-13 20:03:06,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18617.833333333332, ans=0.11382166666666668 2024-09-13 20:03:24,003 INFO [train.py:1198] (0/2) Epoch 2, batch 200, loss[loss=0.3489, ctc_loss=0.265, cr_loss=0.4193, over 21065.00 frames. ], tot_loss[loss=0.3987, ctc_loss=0.3124, cr_loss=0.4318, over 2596182.84 frames. 
], batch size: 59, lr: 3.40e-02, grad_scale: 32.0 2024-09-13 20:03:39,496 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 20:03:54,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=18731.166666666668, ans=0.125 2024-09-13 20:03:55,551 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.047e+02 2.554e+02 3.174e+02 4.190e+02 6.448e+02, threshold=6.348e+02, percent-clipped=0.0 2024-09-13 20:04:40,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=18787.833333333332, ans=0.24242583333333345 2024-09-13 20:04:42,710 INFO [train.py:1198] (0/2) Epoch 2, batch 250, loss[loss=0.3714, ctc_loss=0.2844, cr_loss=0.4349, over 21059.00 frames. ], tot_loss[loss=0.3983, ctc_loss=0.3116, cr_loss=0.4335, over 2930800.81 frames. ], batch size: 53, lr: 3.39e-02, grad_scale: 32.0 2024-09-13 20:04:49,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=18816.166666666668, ans=0.125 2024-09-13 20:05:32,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=18901.166666666668, ans=0.125 2024-09-13 20:05:38,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=18901.166666666668, ans=0.0 2024-09-13 20:05:58,163 INFO [train.py:1198] (0/2) Epoch 2, batch 300, loss[loss=0.4083, ctc_loss=0.3187, cr_loss=0.4481, over 20852.00 frames. ], tot_loss[loss=0.3985, ctc_loss=0.3118, cr_loss=0.4336, over 3187123.65 frames. 
], batch size: 65, lr: 3.39e-02, grad_scale: 32.0 2024-09-13 20:06:29,686 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.370e+02 2.734e+02 3.151e+02 5.678e+02, threshold=5.469e+02, percent-clipped=0.0 2024-09-13 20:06:59,656 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.79 vs. limit=14.651687500000001 2024-09-13 20:07:10,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=19071.166666666668, ans=0.0 2024-09-13 20:07:13,401 INFO [train.py:1198] (0/2) Epoch 2, batch 350, loss[loss=0.4948, ctc_loss=0.4097, cr_loss=0.4259, over 14163.00 frames. ], tot_loss[loss=0.3946, ctc_loss=0.3084, cr_loss=0.4309, over 3380381.12 frames. ], batch size: 149, lr: 3.38e-02, grad_scale: 32.0 2024-09-13 20:07:33,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=19127.833333333332, ans=0.125 2024-09-13 20:08:02,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=19184.5, ans=0.125 2024-09-13 20:08:11,737 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=14.6941875 2024-09-13 20:08:32,012 INFO [train.py:1198] (0/2) Epoch 2, batch 400, loss[loss=0.3856, ctc_loss=0.3047, cr_loss=0.4046, over 20792.00 frames. ], tot_loss[loss=0.3913, ctc_loss=0.3057, cr_loss=0.4283, over 3546115.97 frames. 
], batch size: 53, lr: 3.38e-02, grad_scale: 32.0 2024-09-13 20:08:44,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19241.166666666668, ans=0.10758833333333334 2024-09-13 20:08:57,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=19269.5, ans=0.125 2024-09-13 20:09:04,115 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.079e+02 2.473e+02 2.858e+02 3.592e+02 5.873e+02, threshold=5.717e+02, percent-clipped=3.0 2024-09-13 20:09:06,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=19297.833333333332, ans=0.22457583333333342 2024-09-13 20:09:11,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=14.7366875 2024-09-13 20:09:22,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=19326.166666666668, ans=0.0 2024-09-13 20:09:26,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=19326.166666666668, ans=0.006668224637681159 2024-09-13 20:09:37,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=19354.5, ans=0.125 2024-09-13 20:09:50,753 INFO [train.py:1198] (0/2) Epoch 2, batch 450, loss[loss=0.3879, ctc_loss=0.3024, cr_loss=0.4274, over 20787.00 frames. ], tot_loss[loss=0.3907, ctc_loss=0.3049, cr_loss=0.4291, over 3675170.74 frames. 
], batch size: 56, lr: 3.37e-02, grad_scale: 32.0 2024-09-13 20:10:04,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=19411.166666666668, ans=0.09899494936611666 2024-09-13 20:10:26,560 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.89 vs. limit=5.915925 2024-09-13 20:10:34,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=19467.833333333332, ans=0.0 2024-09-13 20:10:47,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=19467.833333333332, ans=0.0 2024-09-13 20:10:52,218 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=11.798466666666666 2024-09-13 20:10:59,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=19496.166666666668, ans=0.0066312681159420295 2024-09-13 20:11:06,541 INFO [train.py:1198] (0/2) Epoch 2, batch 500, loss[loss=0.449, ctc_loss=0.353, cr_loss=0.4801, over 19468.00 frames. ], tot_loss[loss=0.3917, ctc_loss=0.3057, cr_loss=0.4297, over 3769165.77 frames. ], batch size: 90, lr: 3.37e-02, grad_scale: 32.0 2024-09-13 20:11:15,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=19524.5, ans=0.125 2024-09-13 20:11:31,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.03 vs. 
limit=22.164625 2024-09-13 20:11:38,540 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.462e+02 2.855e+02 3.643e+02 5.655e+02, threshold=5.711e+02, percent-clipped=0.0 2024-09-13 20:11:47,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=19581.166666666668, ans=0.125 2024-09-13 20:12:22,410 INFO [train.py:1198] (0/2) Epoch 2, batch 550, loss[loss=0.3554, ctc_loss=0.2675, cr_loss=0.4392, over 21058.00 frames. ], tot_loss[loss=0.3915, ctc_loss=0.3055, cr_loss=0.43, over 3841743.99 frames. ], batch size: 53, lr: 3.36e-02, grad_scale: 32.0 2024-09-13 20:12:30,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=19666.166666666668, ans=0.21168416666666667 2024-09-13 20:12:31,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=5.949925 2024-09-13 20:12:50,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=19694.5, ans=0.0 2024-09-13 20:13:07,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=19751.166666666668, ans=0.05 2024-09-13 20:13:20,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=19751.166666666668, ans=0.125 2024-09-13 20:13:38,075 INFO [train.py:1198] (0/2) Epoch 2, batch 600, loss[loss=0.4099, ctc_loss=0.3223, cr_loss=0.4381, over 20042.00 frames. ], tot_loss[loss=0.3906, ctc_loss=0.3048, cr_loss=0.4291, over 3898824.11 frames. 
], batch size: 80, lr: 3.35e-02, grad_scale: 32.0 2024-09-13 20:13:41,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19807.833333333332, ans=0.10192166666666669 2024-09-13 20:13:50,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=19807.833333333332, ans=0.125 2024-09-13 20:14:06,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=19836.166666666668, ans=0.0065573550724637684 2024-09-13 20:14:12,706 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.130e+02 2.660e+02 3.440e+02 4.172e+02 6.583e+02, threshold=6.880e+02, percent-clipped=8.0 2024-09-13 20:14:17,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=19864.5, ans=0.0 2024-09-13 20:14:20,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=19864.5, ans=0.20474250000000005 2024-09-13 20:14:41,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=19921.166666666668, ans=0.125 2024-09-13 20:14:56,379 INFO [train.py:1198] (0/2) Epoch 2, batch 650, loss[loss=0.3999, ctc_loss=0.3142, cr_loss=0.4287, over 21007.00 frames. ], tot_loss[loss=0.3911, ctc_loss=0.3053, cr_loss=0.4291, over 3930801.09 frames. ], batch size: 61, lr: 3.35e-02, grad_scale: 32.0 2024-09-13 20:15:17,571 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 20:16:15,338 INFO [train.py:1198] (0/2) Epoch 2, batch 700, loss[loss=0.3999, ctc_loss=0.3106, cr_loss=0.4466, over 21004.00 frames. ], tot_loss[loss=0.3908, ctc_loss=0.305, cr_loss=0.4293, over 3972978.00 frames. 
], batch size: 61, lr: 3.34e-02, grad_scale: 32.0 2024-09-13 20:16:15,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=20091.166666666668, ans=0.125 2024-09-13 20:16:20,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=20091.166666666668, ans=0.006501920289855072 2024-09-13 20:16:32,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=20119.5, ans=0.0 2024-09-13 20:16:47,002 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.108e+02 2.676e+02 3.222e+02 3.947e+02 6.924e+02, threshold=6.443e+02, percent-clipped=1.0 2024-09-13 20:17:02,391 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 20:17:03,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=20176.166666666668, ans=0.006483442028985507 2024-09-13 20:17:08,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20176.166666666668, ans=0.1 2024-09-13 20:17:31,016 INFO [train.py:1198] (0/2) Epoch 2, batch 750, loss[loss=0.4002, ctc_loss=0.3192, cr_loss=0.405, over 18248.00 frames. ], tot_loss[loss=0.3904, ctc_loss=0.3046, cr_loss=0.4289, over 3994134.13 frames. 
], batch size: 108, lr: 3.34e-02, grad_scale: 32.0 2024-09-13 20:17:39,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=20232.833333333332, ans=0.0064711231884057975 2024-09-13 20:17:45,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=20261.166666666668, ans=0.125 2024-09-13 20:18:35,123 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=22.5 2024-09-13 20:18:46,836 INFO [train.py:1198] (0/2) Epoch 2, batch 800, loss[loss=0.3745, ctc_loss=0.2794, cr_loss=0.4754, over 21056.00 frames. ], tot_loss[loss=0.3889, ctc_loss=0.3032, cr_loss=0.4286, over 4022918.02 frames. ], batch size: 53, lr: 3.33e-02, grad_scale: 32.0 2024-09-13 20:18:53,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=20374.5, ans=0.0 2024-09-13 20:19:18,908 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.626e+02 3.213e+02 4.247e+02 9.413e+02, threshold=6.425e+02, percent-clipped=6.0 2024-09-13 20:19:37,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20459.5, ans=0.1 2024-09-13 20:19:43,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=20459.5, ans=0.125 2024-09-13 20:20:05,913 INFO [train.py:1198] (0/2) Epoch 2, batch 850, loss[loss=0.3964, ctc_loss=0.3058, cr_loss=0.4528, over 20833.00 frames. ], tot_loss[loss=0.3876, ctc_loss=0.302, cr_loss=0.428, over 4038181.30 frames. 
], batch size: 59, lr: 3.33e-02, grad_scale: 32.0 2024-09-13 20:20:33,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=20544.5, ans=0.006403369565217391 2024-09-13 20:20:42,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=20572.833333333332, ans=0.2 2024-09-13 20:20:56,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20601.166666666668, ans=0.1 2024-09-13 20:21:03,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=20601.166666666668, ans=0.0 2024-09-13 20:21:21,543 INFO [train.py:1198] (0/2) Epoch 2, batch 900, loss[loss=0.3826, ctc_loss=0.2906, cr_loss=0.46, over 20899.00 frames. ], tot_loss[loss=0.3885, ctc_loss=0.3026, cr_loss=0.4297, over 4054831.32 frames. ], batch size: 54, lr: 3.32e-02, grad_scale: 32.0 2024-09-13 20:21:23,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=20657.833333333332, ans=0.125 2024-09-13 20:21:31,280 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 20:21:38,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=20686.166666666668, ans=0.0 2024-09-13 20:21:56,539 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.569e+02 2.898e+02 3.447e+02 5.755e+02, threshold=5.795e+02, percent-clipped=0.0 2024-09-13 20:22:01,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=20714.5, ans=0.125 2024-09-13 20:22:10,889 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 
2024-09-13 20:22:40,378 INFO [train.py:1198] (0/2) Epoch 2, batch 950, loss[loss=0.3437, ctc_loss=0.2613, cr_loss=0.4118, over 20976.00 frames. ], tot_loss[loss=0.3894, ctc_loss=0.3032, cr_loss=0.431, over 4073498.05 frames. ], batch size: 48, lr: 3.32e-02, grad_scale: 32.0 2024-09-13 20:22:42,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=20799.5, ans=0.125 2024-09-13 20:22:50,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=22.5 2024-09-13 20:22:53,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=20799.5, ans=0.2 2024-09-13 20:23:00,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=20827.833333333332, ans=0.125 2024-09-13 20:23:54,578 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.71 vs. limit=22.5 2024-09-13 20:23:55,291 INFO [train.py:1198] (0/2) Epoch 2, batch 1000, loss[loss=0.418, ctc_loss=0.3277, cr_loss=0.4516, over 21031.00 frames. ], tot_loss[loss=0.3891, ctc_loss=0.3029, cr_loss=0.4311, over 4079006.47 frames. ], batch size: 62, lr: 3.31e-02, grad_scale: 32.0 2024-09-13 20:24:26,894 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.521e+02 3.115e+02 3.641e+02 5.149e+02, threshold=6.231e+02, percent-clipped=0.0 2024-09-13 20:24:33,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=20997.833333333332, ans=0.00630481884057971 2024-09-13 20:24:42,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.52 vs. 
limit=15.0 2024-09-13 20:25:04,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=21054.5, ans=0.125 2024-09-13 20:25:10,148 INFO [train.py:1198] (0/2) Epoch 2, batch 1050, loss[loss=0.3665, ctc_loss=0.2793, cr_loss=0.4357, over 20957.00 frames. ], tot_loss[loss=0.3891, ctc_loss=0.303, cr_loss=0.4308, over 4082011.19 frames. ], batch size: 51, lr: 3.30e-02, grad_scale: 32.0 2024-09-13 20:25:12,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=21082.833333333332, ans=0.125 2024-09-13 20:25:19,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=21082.833333333332, ans=0.125 2024-09-13 20:25:27,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=21111.166666666668, ans=0.025 2024-09-13 20:25:42,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=21139.5, ans=0.1 2024-09-13 20:26:15,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=21196.166666666668, ans=0.125 2024-09-13 20:26:23,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=21196.166666666668, ans=0.2 2024-09-13 20:26:26,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=21196.166666666668, ans=0.04949747468305833 2024-09-13 20:26:28,983 INFO [train.py:1198] (0/2) Epoch 2, batch 1100, loss[loss=0.3538, ctc_loss=0.2713, cr_loss=0.4125, over 21045.00 frames. ], tot_loss[loss=0.3851, ctc_loss=0.2995, cr_loss=0.4279, over 4095316.33 frames. 
], batch size: 62, lr: 3.30e-02, grad_scale: 32.0 2024-09-13 20:26:33,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=21224.5, ans=0.00625554347826087 2024-09-13 20:26:38,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2024-09-13 20:26:42,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=21252.833333333332, ans=0.2 2024-09-13 20:26:44,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21252.833333333332, ans=0.1 2024-09-13 20:26:46,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=21252.833333333332, ans=0.125 2024-09-13 20:27:00,761 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.126e+02 2.640e+02 3.406e+02 4.590e+02 7.864e+02, threshold=6.813e+02, percent-clipped=6.0 2024-09-13 20:27:20,953 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 20:27:47,715 INFO [train.py:1198] (0/2) Epoch 2, batch 1150, loss[loss=0.4159, ctc_loss=0.3223, cr_loss=0.4681, over 20971.00 frames. ], tot_loss[loss=0.3846, ctc_loss=0.299, cr_loss=0.428, over 4098470.23 frames. ], batch size: 64, lr: 3.29e-02, grad_scale: 32.0 2024-09-13 20:27:57,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=21366.166666666668, ans=0.125 2024-09-13 20:28:44,769 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.95 vs. 
limit=15.0 2024-09-13 20:29:00,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=21479.5, ans=0.125 2024-09-13 20:29:03,493 INFO [train.py:1198] (0/2) Epoch 2, batch 1200, loss[loss=0.4028, ctc_loss=0.3128, cr_loss=0.4498, over 20938.00 frames. ], tot_loss[loss=0.3861, ctc_loss=0.3002, cr_loss=0.4296, over 4094325.19 frames. ], batch size: 60, lr: 3.29e-02, grad_scale: 32.0 2024-09-13 20:29:35,539 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.574e+02 2.852e+02 3.470e+02 7.772e+02, threshold=5.704e+02, percent-clipped=1.0 2024-09-13 20:30:09,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=21621.166666666668, ans=0.125 2024-09-13 20:30:13,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=21621.166666666668, ans=0.125 2024-09-13 20:30:19,069 INFO [train.py:1198] (0/2) Epoch 2, batch 1250, loss[loss=0.4012, ctc_loss=0.3155, cr_loss=0.4287, over 20380.00 frames. ], tot_loss[loss=0.3871, ctc_loss=0.3008, cr_loss=0.4315, over 4105345.70 frames. 
], batch size: 74, lr: 3.28e-02, grad_scale: 32.0 2024-09-13 20:31:13,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=21734.5, ans=0.1 2024-09-13 20:31:20,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=21762.833333333332, ans=0.125 2024-09-13 20:31:28,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=21762.833333333332, ans=0.125 2024-09-13 20:31:31,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=21762.833333333332, ans=0.125 2024-09-13 20:31:37,302 INFO [train.py:1198] (0/2) Epoch 2, batch 1300, loss[loss=0.4391, ctc_loss=0.3448, cr_loss=0.4717, over 20300.00 frames. ], tot_loss[loss=0.3895, ctc_loss=0.3029, cr_loss=0.4331, over 4098586.79 frames. ], batch size: 74, lr: 3.28e-02, grad_scale: 32.0 2024-09-13 20:31:51,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0 2024-09-13 20:32:01,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=21819.5, ans=0.025 2024-09-13 20:32:03,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=21819.5, ans=0.95 2024-09-13 20:32:09,143 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.667e+02 3.109e+02 3.694e+02 5.570e+02, threshold=6.217e+02, percent-clipped=0.0 2024-09-13 20:32:52,983 INFO [train.py:1198] (0/2) Epoch 2, batch 1350, loss[loss=0.3031, ctc_loss=0.2276, cr_loss=0.3773, over 20994.00 frames. ], tot_loss[loss=0.3863, ctc_loss=0.3001, cr_loss=0.4307, over 4096044.91 frames. 
], batch size: 50, lr: 3.27e-02, grad_scale: 32.0 2024-09-13 20:33:01,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=21932.833333333332, ans=0.04949747468305833 2024-09-13 20:33:58,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=22046.166666666668, ans=0.125 2024-09-13 20:34:11,381 INFO [train.py:1198] (0/2) Epoch 2, batch 1400, loss[loss=0.3229, ctc_loss=0.2482, cr_loss=0.3735, over 20950.00 frames. ], tot_loss[loss=0.3868, ctc_loss=0.3006, cr_loss=0.4308, over 4090814.86 frames. ], batch size: 51, lr: 3.27e-02, grad_scale: 32.0 2024-09-13 20:34:22,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=22074.5, ans=0.006070760869565217 2024-09-13 20:34:25,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=22102.833333333332, ans=0.0 2024-09-13 20:34:43,477 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.664e+02 3.154e+02 3.884e+02 7.129e+02, threshold=6.308e+02, percent-clipped=2.0 2024-09-13 20:35:21,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=22187.833333333332, ans=0.0 2024-09-13 20:35:22,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=22187.833333333332, ans=0.125 2024-09-13 20:35:26,908 INFO [train.py:1198] (0/2) Epoch 2, batch 1450, loss[loss=0.445, ctc_loss=0.3508, cr_loss=0.4707, over 19620.00 frames. ], tot_loss[loss=0.3869, ctc_loss=0.3008, cr_loss=0.4305, over 4085396.85 frames. 
], batch size: 90, lr: 3.26e-02, grad_scale: 32.0 2024-09-13 20:35:40,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0 2024-09-13 20:35:48,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=22244.5, ans=0.006033804347826088 2024-09-13 20:35:58,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=22272.833333333332, ans=0.125 2024-09-13 20:36:04,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22272.833333333332, ans=0.1 2024-09-13 20:36:06,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.07 vs. limit=15.0 2024-09-13 20:36:19,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=22301.166666666668, ans=0.025 2024-09-13 20:36:25,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=22329.5, ans=0.125 2024-09-13 20:36:28,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=22329.5, ans=0.125 2024-09-13 20:36:42,127 INFO [train.py:1198] (0/2) Epoch 2, batch 1500, loss[loss=0.3699, ctc_loss=0.2879, cr_loss=0.4099, over 21036.00 frames. ], tot_loss[loss=0.3855, ctc_loss=0.2997, cr_loss=0.4292, over 4090104.47 frames. 
], batch size: 62, lr: 3.26e-02, grad_scale: 32.0 2024-09-13 20:37:01,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22386.166666666668, ans=0.1 2024-09-13 20:37:07,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=22386.166666666668, ans=0.125 2024-09-13 20:37:16,346 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.427e+02 2.958e+02 3.513e+02 7.679e+02, threshold=5.916e+02, percent-clipped=2.0 2024-09-13 20:37:41,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=22442.833333333332, ans=0.025 2024-09-13 20:37:41,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=22442.833333333332, ans=0.125 2024-09-13 20:37:41,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=22442.833333333332, ans=0.125 2024-09-13 20:37:44,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22471.166666666668, ans=0.1 2024-09-13 20:37:59,592 INFO [train.py:1198] (0/2) Epoch 2, batch 1550, loss[loss=0.2982, ctc_loss=0.2276, cr_loss=0.3529, over 21009.00 frames. ], tot_loss[loss=0.3845, ctc_loss=0.2987, cr_loss=0.4292, over 4093905.34 frames. 
], batch size: 49, lr: 3.25e-02, grad_scale: 16.0 2024-09-13 20:38:04,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=22499.5, ans=0.125 2024-09-13 20:38:06,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=22499.5, ans=0.125 2024-09-13 20:38:07,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=22499.5, ans=0.0 2024-09-13 20:38:09,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=22499.5, ans=0.0 2024-09-13 20:38:16,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22527.833333333332, ans=0.1 2024-09-13 20:38:26,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=22527.833333333332, ans=0.125 2024-09-13 20:38:52,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=22584.5, ans=0.125 2024-09-13 20:38:55,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=22584.5, ans=0.125 2024-09-13 20:39:03,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=22612.833333333332, ans=0.005953731884057971 2024-09-13 20:39:17,597 INFO [train.py:1198] (0/2) Epoch 2, batch 1600, loss[loss=0.3814, ctc_loss=0.2974, cr_loss=0.4196, over 19659.00 frames. ], tot_loss[loss=0.3866, ctc_loss=0.3006, cr_loss=0.4297, over 4073168.67 frames. 
], batch size: 90, lr: 3.24e-02, grad_scale: 32.0 2024-09-13 20:39:22,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=22641.166666666668, ans=0.2 2024-09-13 20:39:29,800 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-8000.pt 2024-09-13 20:39:33,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22669.5, ans=0.1 2024-09-13 20:39:36,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=22669.5, ans=0.2 2024-09-13 20:39:51,374 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.024e+02 2.460e+02 2.947e+02 3.352e+02 5.392e+02, threshold=5.894e+02, percent-clipped=0.0 2024-09-13 20:39:54,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=22697.833333333332, ans=0.125 2024-09-13 20:40:33,864 INFO [train.py:1198] (0/2) Epoch 2, batch 1650, loss[loss=0.3202, ctc_loss=0.2404, cr_loss=0.3992, over 20952.00 frames. ], tot_loss[loss=0.3845, ctc_loss=0.2986, cr_loss=0.4293, over 4086369.81 frames. ], batch size: 50, lr: 3.24e-02, grad_scale: 32.0 2024-09-13 20:40:34,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=22782.833333333332, ans=0.125 2024-09-13 20:40:49,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22811.166666666668, ans=0.1 2024-09-13 20:41:49,977 INFO [train.py:1198] (0/2) Epoch 2, batch 1700, loss[loss=0.4388, ctc_loss=0.3405, cr_loss=0.4915, over 21022.00 frames. ], tot_loss[loss=0.3839, ctc_loss=0.2981, cr_loss=0.4291, over 4091901.41 frames. 
], batch size: 61, lr: 3.23e-02, grad_scale: 32.0 2024-09-13 20:42:02,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=22924.5, ans=0.125 2024-09-13 20:42:20,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=22981.166666666668, ans=0.125 2024-09-13 20:42:22,875 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.632e+02 3.164e+02 4.211e+02 6.486e+02, threshold=6.328e+02, percent-clipped=4.0 2024-09-13 20:42:51,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=23037.833333333332, ans=0.125 2024-09-13 20:42:54,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23037.833333333332, ans=0.1 2024-09-13 20:43:02,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=23037.833333333332, ans=0.125 2024-09-13 20:43:08,306 INFO [train.py:1198] (0/2) Epoch 2, batch 1750, loss[loss=0.4194, ctc_loss=0.3263, cr_loss=0.4654, over 21051.00 frames. ], tot_loss[loss=0.383, ctc_loss=0.2973, cr_loss=0.4287, over 4102654.33 frames. ], batch size: 62, lr: 3.23e-02, grad_scale: 32.0 2024-09-13 20:43:23,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=23094.5, ans=0.125 2024-09-13 20:44:10,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=23179.5, ans=0.125 2024-09-13 20:44:25,828 INFO [train.py:1198] (0/2) Epoch 2, batch 1800, loss[loss=0.357, ctc_loss=0.275, cr_loss=0.41, over 20791.00 frames. ], tot_loss[loss=0.3832, ctc_loss=0.2972, cr_loss=0.4295, over 4112595.56 frames. 
], batch size: 53, lr: 3.22e-02, grad_scale: 32.0 2024-09-13 20:44:47,120 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.69 vs. limit=5.0 2024-09-13 20:44:49,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=23236.166666666668, ans=0.0 2024-09-13 20:44:54,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0 2024-09-13 20:44:56,925 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 20:44:57,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.14 vs. limit=10.0 2024-09-13 20:44:59,715 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.065e+02 2.462e+02 2.981e+02 3.679e+02 7.398e+02, threshold=5.962e+02, percent-clipped=3.0 2024-09-13 20:45:03,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=23264.5, ans=0.125 2024-09-13 20:45:06,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=23264.5, ans=0.025 2024-09-13 20:45:22,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=23292.833333333332, ans=0.025 2024-09-13 20:45:23,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=23292.833333333332, ans=0.125 2024-09-13 20:45:25,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=23321.166666666668, ans=0.125 2024-09-13 20:45:41,630 INFO [train.py:1198] (0/2) Epoch 2, batch 1850, loss[loss=0.3435, 
ctc_loss=0.2651, cr_loss=0.3919, over 20969.00 frames. ], tot_loss[loss=0.3836, ctc_loss=0.2976, cr_loss=0.4301, over 4108184.61 frames. ], batch size: 49, lr: 3.22e-02, grad_scale: 32.0 2024-09-13 20:46:13,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=23406.166666666668, ans=0.2 2024-09-13 20:46:24,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=23406.166666666668, ans=0.1 2024-09-13 20:46:57,715 INFO [train.py:1198] (0/2) Epoch 2, batch 1900, loss[loss=0.3836, ctc_loss=0.2898, cr_loss=0.4691, over 21069.00 frames. ], tot_loss[loss=0.3832, ctc_loss=0.2972, cr_loss=0.4303, over 4104488.35 frames. ], batch size: 59, lr: 3.21e-02, grad_scale: 32.0 2024-09-13 20:47:30,522 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.489e+02 3.030e+02 3.557e+02 7.221e+02, threshold=6.060e+02, percent-clipped=3.0 2024-09-13 20:47:36,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=23547.833333333332, ans=0.0 2024-09-13 20:48:07,421 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 20:48:12,876 INFO [train.py:1198] (0/2) Epoch 2, batch 1950, loss[loss=0.385, ctc_loss=0.2893, cr_loss=0.4781, over 21054.00 frames. ], tot_loss[loss=0.3842, ctc_loss=0.2979, cr_loss=0.4316, over 4105642.97 frames. ], batch size: 56, lr: 3.21e-02, grad_scale: 32.0 2024-09-13 20:49:03,609 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2024-09-13 20:49:31,260 INFO [train.py:1198] (0/2) Epoch 2, batch 2000, loss[loss=0.3838, ctc_loss=0.2928, cr_loss=0.4547, over 20774.00 frames. ], tot_loss[loss=0.382, ctc_loss=0.296, cr_loss=0.4297, over 4110922.41 frames. 
], batch size: 56, lr: 3.20e-02, grad_scale: 32.0 2024-09-13 20:50:02,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=23802.833333333332, ans=0.125 2024-09-13 20:50:07,683 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.510e+02 2.846e+02 3.575e+02 7.268e+02, threshold=5.692e+02, percent-clipped=1.0 2024-09-13 20:50:33,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23887.833333333332, ans=0.1 2024-09-13 20:50:48,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=23916.166666666668, ans=0.04949747468305833 2024-09-13 20:50:49,558 INFO [train.py:1198] (0/2) Epoch 2, batch 2050, loss[loss=0.4001, ctc_loss=0.3179, cr_loss=0.411, over 18440.00 frames. ], tot_loss[loss=0.3823, ctc_loss=0.2963, cr_loss=0.43, over 4109034.49 frames. ], batch size: 108, lr: 3.20e-02, grad_scale: 32.0 2024-09-13 20:50:57,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-09-13 20:51:15,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=23944.5, ans=0.0 2024-09-13 20:51:21,854 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=15.0 2024-09-13 20:51:44,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=24001.166666666668, ans=0.0 2024-09-13 20:52:04,324 INFO [train.py:1198] (0/2) Epoch 2, batch 2100, loss[loss=0.4819, ctc_loss=0.3948, cr_loss=0.4354, over 14706.00 frames. ], tot_loss[loss=0.3825, ctc_loss=0.2966, cr_loss=0.4299, over 4096688.03 frames. 
], batch size: 149, lr: 3.19e-02, grad_scale: 32.0 2024-09-13 20:52:12,226 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=22.5 2024-09-13 20:52:19,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0 2024-09-13 20:52:37,207 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.482e+02 2.856e+02 3.518e+02 6.098e+02, threshold=5.713e+02, percent-clipped=2.0 2024-09-13 20:53:15,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=24171.166666666668, ans=0.0 2024-09-13 20:53:19,413 INFO [train.py:1198] (0/2) Epoch 2, batch 2150, loss[loss=0.4249, ctc_loss=0.3302, cr_loss=0.4735, over 20816.00 frames. ], tot_loss[loss=0.3814, ctc_loss=0.2955, cr_loss=0.4294, over 4100986.78 frames. ], batch size: 65, lr: 3.19e-02, grad_scale: 32.0 2024-09-13 20:53:34,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=24227.833333333332, ans=0.125 2024-09-13 20:53:47,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=24227.833333333332, ans=0.125 2024-09-13 20:54:34,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=24312.833333333332, ans=0.125 2024-09-13 20:54:37,397 INFO [train.py:1198] (0/2) Epoch 2, batch 2200, loss[loss=0.3742, ctc_loss=0.2853, cr_loss=0.4442, over 20970.00 frames. ], tot_loss[loss=0.3798, ctc_loss=0.294, cr_loss=0.4292, over 4104515.52 frames. 
], batch size: 58, lr: 3.18e-02, grad_scale: 32.0 2024-09-13 20:54:45,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=24341.166666666668, ans=0.0 2024-09-13 20:55:00,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=24369.5, ans=0.0 2024-09-13 20:55:10,696 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.109e+02 2.531e+02 2.986e+02 3.747e+02 8.423e+02, threshold=5.972e+02, percent-clipped=7.0 2024-09-13 20:55:11,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=24397.833333333332, ans=0.005565688405797102 2024-09-13 20:55:42,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=24454.5, ans=0.125 2024-09-13 20:55:54,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=24482.833333333332, ans=0.125 2024-09-13 20:55:55,517 INFO [train.py:1198] (0/2) Epoch 2, batch 2250, loss[loss=0.3311, ctc_loss=0.2517, cr_loss=0.3973, over 21052.00 frames. ], tot_loss[loss=0.3808, ctc_loss=0.2947, cr_loss=0.4302, over 4107252.45 frames. 
], batch size: 56, lr: 3.18e-02, grad_scale: 32.0 2024-09-13 20:56:26,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=24539.5, ans=0.125 2024-09-13 20:56:35,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=24539.5, ans=0.125 2024-09-13 20:56:38,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=24539.5, ans=0.125 2024-09-13 20:56:41,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=24567.833333333332, ans=0.125 2024-09-13 20:56:52,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.41 vs. limit=15.0 2024-09-13 20:57:03,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=24596.166666666668, ans=0.125 2024-09-13 20:57:10,396 INFO [train.py:1198] (0/2) Epoch 2, batch 2300, loss[loss=0.3449, ctc_loss=0.2672, cr_loss=0.3883, over 20938.00 frames. ], tot_loss[loss=0.3806, ctc_loss=0.2946, cr_loss=0.43, over 4102457.12 frames. 
], batch size: 48, lr: 3.17e-02, grad_scale: 32.0 2024-09-13 20:57:43,431 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.578e+02 3.081e+02 3.650e+02 6.598e+02, threshold=6.161e+02, percent-clipped=1.0 2024-09-13 20:57:43,866 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 20:57:43,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=24681.166666666668, ans=0.2 2024-09-13 20:58:14,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=24737.833333333332, ans=0.0 2024-09-13 20:58:25,709 INFO [train.py:1198] (0/2) Epoch 2, batch 2350, loss[loss=0.3471, ctc_loss=0.2636, cr_loss=0.4173, over 20351.00 frames. ], tot_loss[loss=0.3821, ctc_loss=0.2957, cr_loss=0.432, over 4100032.11 frames. ], batch size: 45, lr: 3.17e-02, grad_scale: 32.0 2024-09-13 20:58:35,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=24766.166666666668, ans=0.0 2024-09-13 20:59:16,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=24851.166666666668, ans=0.1 2024-09-13 20:59:18,326 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 20:59:26,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.90 vs. limit=10.0 2024-09-13 20:59:36,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=24879.5, ans=0.125 2024-09-13 20:59:43,377 INFO [train.py:1198] (0/2) Epoch 2, batch 2400, loss[loss=0.3667, ctc_loss=0.2848, cr_loss=0.4095, over 21045.00 frames. 
], tot_loss[loss=0.3827, ctc_loss=0.2961, cr_loss=0.433, over 4095197.79 frames. ], batch size: 53, lr: 3.16e-02, grad_scale: 32.0 2024-09-13 20:59:46,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=24907.833333333332, ans=0.95 2024-09-13 20:59:46,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=24907.833333333332, ans=0.125 2024-09-13 20:59:48,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=24907.833333333332, ans=0.125 2024-09-13 21:00:03,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=24936.166666666668, ans=0.0 2024-09-13 21:00:16,358 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.468e+02 2.899e+02 3.440e+02 7.814e+02, threshold=5.798e+02, percent-clipped=2.0 2024-09-13 21:00:30,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=24992.833333333332, ans=0.125 2024-09-13 21:00:57,846 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 21:00:58,971 INFO [train.py:1198] (0/2) Epoch 2, batch 2450, loss[loss=0.4237, ctc_loss=0.3339, cr_loss=0.4489, over 20654.00 frames. ], tot_loss[loss=0.3809, ctc_loss=0.2945, cr_loss=0.4321, over 4103391.93 frames. 
], batch size: 71, lr: 3.15e-02, grad_scale: 32.0 2024-09-13 21:01:08,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25049.5, ans=0.1 2024-09-13 21:01:36,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=25106.166666666668, ans=0.125 2024-09-13 21:02:16,416 INFO [train.py:1198] (0/2) Epoch 2, batch 2500, loss[loss=0.3799, ctc_loss=0.2911, cr_loss=0.4442, over 21071.00 frames. ], tot_loss[loss=0.379, ctc_loss=0.2929, cr_loss=0.4304, over 4107414.09 frames. ], batch size: 59, lr: 3.15e-02, grad_scale: 32.0 2024-09-13 21:02:49,724 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.439e+02 2.976e+02 3.826e+02 6.695e+02, threshold=5.952e+02, percent-clipped=3.0 2024-09-13 21:03:05,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=25276.166666666668, ans=0.005374746376811594 2024-09-13 21:03:17,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=25304.5, ans=0.025 2024-09-13 21:03:32,045 INFO [train.py:1198] (0/2) Epoch 2, batch 2550, loss[loss=0.3757, ctc_loss=0.2913, cr_loss=0.4218, over 21078.00 frames. ], tot_loss[loss=0.3783, ctc_loss=0.2922, cr_loss=0.4307, over 4105296.09 frames. ], batch size: 56, lr: 3.14e-02, grad_scale: 32.0 2024-09-13 21:03:40,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=25332.833333333332, ans=0.005362427536231885 2024-09-13 21:04:21,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.05 vs. 
limit=15.0 2024-09-13 21:04:38,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=25446.166666666668, ans=0.0 2024-09-13 21:04:46,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=25474.5, ans=0.125 2024-09-13 21:04:47,630 INFO [train.py:1198] (0/2) Epoch 2, batch 2600, loss[loss=0.3953, ctc_loss=0.3025, cr_loss=0.4642, over 20067.00 frames. ], tot_loss[loss=0.3771, ctc_loss=0.2912, cr_loss=0.4295, over 4105687.91 frames. ], batch size: 80, lr: 3.14e-02, grad_scale: 32.0 2024-09-13 21:05:03,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=25502.833333333332, ans=0.125 2024-09-13 21:05:11,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=25502.833333333332, ans=0.125 2024-09-13 21:05:11,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=25502.833333333332, ans=0.125 2024-09-13 21:05:20,962 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.492e+02 2.894e+02 3.636e+02 6.965e+02, threshold=5.787e+02, percent-clipped=1.0 2024-09-13 21:05:49,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=25587.833333333332, ans=0.125 2024-09-13 21:06:05,908 INFO [train.py:1198] (0/2) Epoch 2, batch 2650, loss[loss=0.3401, ctc_loss=0.2584, cr_loss=0.4087, over 20977.00 frames. ], tot_loss[loss=0.3785, ctc_loss=0.2924, cr_loss=0.4306, over 4095001.62 frames. 
], batch size: 51, lr: 3.13e-02, grad_scale: 32.0 2024-09-13 21:06:45,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=25672.833333333332, ans=0.125 2024-09-13 21:07:24,669 INFO [train.py:1198] (0/2) Epoch 2, batch 2700, loss[loss=0.3398, ctc_loss=0.2653, cr_loss=0.3722, over 20990.00 frames. ], tot_loss[loss=0.3773, ctc_loss=0.2914, cr_loss=0.4293, over 4089513.97 frames. ], batch size: 52, lr: 3.13e-02, grad_scale: 32.0 2024-09-13 21:07:28,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=25757.833333333332, ans=0.125 2024-09-13 21:07:55,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=12.0 2024-09-13 21:07:58,091 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.583e+02 3.121e+02 3.841e+02 6.542e+02, threshold=6.242e+02, percent-clipped=1.0 2024-09-13 21:08:07,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=25814.5, ans=0.125 2024-09-13 21:08:34,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=25871.166666666668, ans=0.125 2024-09-13 21:08:38,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=12.0 2024-09-13 21:08:40,549 INFO [train.py:1198] (0/2) Epoch 2, batch 2750, loss[loss=0.4224, ctc_loss=0.3335, cr_loss=0.4445, over 19953.00 frames. ], tot_loss[loss=0.3779, ctc_loss=0.292, cr_loss=0.4294, over 4088783.89 frames. 
], batch size: 80, lr: 3.12e-02, grad_scale: 32.0 2024-09-13 21:08:48,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=25899.5, ans=0.2 2024-09-13 21:09:22,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=25956.166666666668, ans=0.125 2024-09-13 21:09:44,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0 2024-09-13 21:09:55,779 INFO [train.py:1198] (0/2) Epoch 2, batch 2800, loss[loss=0.3994, ctc_loss=0.3127, cr_loss=0.4334, over 21061.00 frames. ], tot_loss[loss=0.3787, ctc_loss=0.2926, cr_loss=0.4302, over 4094871.53 frames. ], batch size: 62, lr: 3.12e-02, grad_scale: 32.0 2024-09-13 21:10:17,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=26069.5, ans=0.125 2024-09-13 21:10:28,887 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.420e+02 2.869e+02 3.483e+02 5.905e+02, threshold=5.737e+02, percent-clipped=0.0 2024-09-13 21:10:39,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=26126.166666666668, ans=0.2 2024-09-13 21:10:41,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=26126.166666666668, ans=0.07 2024-09-13 21:10:43,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.44 vs. 
limit=12.0 2024-09-13 21:10:44,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26126.166666666668, ans=0.1 2024-09-13 21:10:59,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=26154.5, ans=0.125 2024-09-13 21:11:02,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=26154.5, ans=0.125 2024-09-13 21:11:10,501 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.89 vs. limit=10.0 2024-09-13 21:11:11,197 INFO [train.py:1198] (0/2) Epoch 2, batch 2850, loss[loss=0.3389, ctc_loss=0.2617, cr_loss=0.3864, over 21068.00 frames. ], tot_loss[loss=0.379, ctc_loss=0.293, cr_loss=0.4297, over 4089047.94 frames. ], batch size: 53, lr: 3.11e-02, grad_scale: 32.0 2024-09-13 21:12:10,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=26267.833333333332, ans=0.005159166666666667 2024-09-13 21:12:29,214 INFO [train.py:1198] (0/2) Epoch 2, batch 2900, loss[loss=0.4274, ctc_loss=0.3313, cr_loss=0.4806, over 19651.00 frames. ], tot_loss[loss=0.3785, ctc_loss=0.2925, cr_loss=0.4301, over 4089879.02 frames. ], batch size: 90, lr: 3.11e-02, grad_scale: 32.0 2024-09-13 21:12:38,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=26324.5, ans=0.125 2024-09-13 21:12:46,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.74 vs. 
limit=10.0
2024-09-13 21:12:54,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=26352.833333333332, ans=0.04949747468305833
2024-09-13 21:13:04,775 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.375e+02 2.665e+02 3.126e+02 5.503e+02, threshold=5.330e+02, percent-clipped=0.0
2024-09-13 21:13:38,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=26437.833333333332, ans=0.2
2024-09-13 21:13:46,943 INFO [train.py:1198] (0/2) Epoch 2, batch 2950, loss[loss=0.3697, ctc_loss=0.2849, cr_loss=0.4241, over 20921.00 frames. ], tot_loss[loss=0.3791, ctc_loss=0.2928, cr_loss=0.4311, over 4091799.66 frames. ], batch size: 60, lr: 3.10e-02, grad_scale: 32.0
2024-09-13 21:13:53,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=26466.166666666668, ans=0.125
2024-09-13 21:13:54,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=26466.166666666668, ans=0.04949747468305833
2024-09-13 21:14:00,943 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-13 21:14:08,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.70 vs. limit=12.0
2024-09-13 21:15:02,748 INFO [train.py:1198] (0/2) Epoch 2, batch 3000, loss[loss=0.4257, ctc_loss=0.333, cr_loss=0.4636, over 18523.00 frames. ], tot_loss[loss=0.3778, ctc_loss=0.2917, cr_loss=0.4305, over 4092800.60 frames. ], batch size: 108, lr: 3.10e-02, grad_scale: 32.0
2024-09-13 21:15:02,749 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-13 21:15:27,695 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.0114, 3.5740, 2.4373, 3.3842], device='cuda:0')
2024-09-13 21:15:29,471 INFO [train.py:1230] (0/2) Epoch 2, validation: loss=0.1081, ctc_loss=0.1081, cr_loss=9.791e-15, over 944034.00 frames.
2024-09-13 21:15:29,471 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-13 21:15:51,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=26636.166666666668, ans=0.125
2024-09-13 21:16:00,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=26664.5, ans=0.2
2024-09-13 21:16:03,218 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.207e+02 2.524e+02 3.002e+02 3.755e+02 5.598e+02, threshold=6.003e+02, percent-clipped=2.0
2024-09-13 21:16:12,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=26664.5, ans=0.005072934782608696
2024-09-13 21:16:27,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=26692.833333333332, ans=0.125
2024-09-13 21:16:42,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=26721.166666666668, ans=0.2
2024-09-13 21:16:45,085 INFO [train.py:1198] (0/2) Epoch 2, batch 3050, loss[loss=0.3494, ctc_loss=0.2697, cr_loss=0.3984, over 21056.00 frames. ], tot_loss[loss=0.3787, ctc_loss=0.2924, cr_loss=0.4317, over 4092335.71 frames. ], batch size: 53, lr: 3.09e-02, grad_scale: 32.0
2024-09-13 21:16:50,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=26749.5, ans=0.125
2024-09-13 21:16:52,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.70 vs. limit=22.5
2024-09-13 21:17:06,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=26777.833333333332, ans=0.035
2024-09-13 21:17:26,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26806.166666666668, ans=0.1
2024-09-13 21:17:58,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.10 vs. limit=22.5
2024-09-13 21:18:03,498 INFO [train.py:1198] (0/2) Epoch 2, batch 3100, loss[loss=0.3868, ctc_loss=0.3, cr_loss=0.4339, over 20928.00 frames. ], tot_loss[loss=0.377, ctc_loss=0.2909, cr_loss=0.4308, over 4093954.03 frames. ], batch size: 60, lr: 3.09e-02, grad_scale: 32.0
2024-09-13 21:18:31,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=26919.5, ans=0.125
2024-09-13 21:18:39,888 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.192e+02 2.723e+02 3.199e+02 3.771e+02 6.385e+02, threshold=6.398e+02, percent-clipped=1.0
2024-09-13 21:19:22,443 INFO [train.py:1198] (0/2) Epoch 2, batch 3150, loss[loss=0.341, ctc_loss=0.2544, cr_loss=0.4328, over 20884.00 frames. ], tot_loss[loss=0.3769, ctc_loss=0.2907, cr_loss=0.4312, over 4103365.30 frames. ], batch size: 57, lr: 3.08e-02, grad_scale: 32.0
2024-09-13 21:20:08,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=27117.833333333332, ans=0.2
2024-09-13 21:20:09,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=27117.833333333332, ans=0.125
2024-09-13 21:20:38,002 INFO [train.py:1198] (0/2) Epoch 2, batch 3200, loss[loss=0.3766, ctc_loss=0.2849, cr_loss=0.4588, over 20769.00 frames. ], tot_loss[loss=0.3771, ctc_loss=0.2907, cr_loss=0.4318, over 4096851.04 frames. ], batch size: 53, lr: 3.08e-02, grad_scale: 32.0
2024-09-13 21:20:55,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=27202.833333333332, ans=0.05
2024-09-13 21:21:11,579 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 2.544e+02 3.082e+02 3.821e+02 6.642e+02, threshold=6.165e+02, percent-clipped=1.0
2024-09-13 21:21:14,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=27231.166666666668, ans=0.125
2024-09-13 21:21:17,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27231.166666666668, ans=0.1
2024-09-13 21:21:31,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=27259.5, ans=0.2
2024-09-13 21:21:40,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.88 vs. limit=22.5
2024-09-13 21:21:43,808 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0
2024-09-13 21:21:52,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=27316.166666666668, ans=0.2
2024-09-13 21:21:53,315 INFO [train.py:1198] (0/2) Epoch 2, batch 3250, loss[loss=0.3607, ctc_loss=0.277, cr_loss=0.4189, over 21065.00 frames. ], tot_loss[loss=0.3754, ctc_loss=0.2893, cr_loss=0.4301, over 4090910.62 frames. ], batch size: 53, lr: 3.07e-02, grad_scale: 32.0
2024-09-13 21:22:02,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=27316.166666666668, ans=0.025
2024-09-13 21:22:06,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=27344.5, ans=0.125
2024-09-13 21:22:43,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=27401.166666666668, ans=0.125
2024-09-13 21:23:04,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=27429.5, ans=0.0
2024-09-13 21:23:11,249 INFO [train.py:1198] (0/2) Epoch 2, batch 3300, loss[loss=0.361, ctc_loss=0.2752, cr_loss=0.4285, over 21051.00 frames. ], tot_loss[loss=0.3751, ctc_loss=0.2891, cr_loss=0.4301, over 4088802.68 frames. ], batch size: 62, lr: 3.07e-02, grad_scale: 16.0
2024-09-13 21:23:22,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=27457.833333333332, ans=0.2
2024-09-13 21:23:45,995 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.393e+02 2.950e+02 3.326e+02 5.687e+02, threshold=5.901e+02, percent-clipped=0.0
2024-09-13 21:23:54,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27514.5, ans=0.1
2024-09-13 21:24:30,031 INFO [train.py:1198] (0/2) Epoch 2, batch 3350, loss[loss=0.3878, ctc_loss=0.3009, cr_loss=0.4346, over 20842.00 frames. ], tot_loss[loss=0.3761, ctc_loss=0.2899, cr_loss=0.4312, over 4099301.65 frames. ], batch size: 59, lr: 3.06e-02, grad_scale: 16.0
2024-09-13 21:24:57,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.57 vs. limit=15.0
2024-09-13 21:25:18,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=27684.5, ans=0.125
2024-09-13 21:25:28,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=27712.833333333332, ans=0.025
2024-09-13 21:25:30,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=27712.833333333332, ans=0.025
2024-09-13 21:25:34,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=27712.833333333332, ans=0.07
2024-09-13 21:25:44,732 INFO [train.py:1198] (0/2) Epoch 2, batch 3400, loss[loss=0.3275, ctc_loss=0.2466, cr_loss=0.4044, over 20775.00 frames. ], tot_loss[loss=0.3774, ctc_loss=0.2908, cr_loss=0.4331, over 4096047.82 frames. ], batch size: 53, lr: 3.06e-02, grad_scale: 16.0
2024-09-13 21:26:19,132 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.117e+02 2.622e+02 3.125e+02 3.859e+02 8.234e+02, threshold=6.251e+02, percent-clipped=1.0
2024-09-13 21:26:46,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=27854.5, ans=0.0
2024-09-13 21:26:49,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=27854.5, ans=0.125
2024-09-13 21:26:58,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=27882.833333333332, ans=0.025
2024-09-13 21:26:58,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27882.833333333332, ans=0.1
2024-09-13 21:26:58,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=27882.833333333332, ans=0.025
2024-09-13 21:27:00,195 INFO [train.py:1198] (0/2) Epoch 2, batch 3450, loss[loss=0.317, ctc_loss=0.2366, cr_loss=0.4019, over 20986.00 frames. ], tot_loss[loss=0.3764, ctc_loss=0.2901, cr_loss=0.4314, over 4088948.62 frames. ], batch size: 51, lr: 3.05e-02, grad_scale: 16.0
2024-09-13 21:27:19,367 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0
2024-09-13 21:27:37,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=27939.5, ans=0.0
2024-09-13 21:28:13,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=27996.166666666668, ans=0.025
2024-09-13 21:28:16,035 INFO [train.py:1198] (0/2) Epoch 2, batch 3500, loss[loss=0.3423, ctc_loss=0.2612, cr_loss=0.4056, over 20808.00 frames. ], tot_loss[loss=0.3738, ctc_loss=0.2878, cr_loss=0.43, over 4097866.77 frames. ], batch size: 53, lr: 3.05e-02, grad_scale: 16.0
2024-09-13 21:28:53,350 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.082e+02 2.524e+02 3.059e+02 3.692e+02 6.896e+02, threshold=6.118e+02, percent-clipped=2.0
2024-09-13 21:29:13,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=28109.5, ans=0.0
2024-09-13 21:29:30,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=28137.833333333332, ans=0.125
2024-09-13 21:29:33,982 INFO [train.py:1198] (0/2) Epoch 2, batch 3550, loss[loss=0.4193, ctc_loss=0.3259, cr_loss=0.4667, over 20701.00 frames. ], tot_loss[loss=0.3745, ctc_loss=0.2885, cr_loss=0.4304, over 4087875.93 frames. ], batch size: 68, lr: 3.04e-02, grad_scale: 16.0
2024-09-13 21:29:45,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=28166.166666666668, ans=0.004746485507246377
2024-09-13 21:29:51,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=28194.5, ans=0.125
2024-09-13 21:30:09,322 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-13 21:30:09,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=28222.833333333332, ans=0.125
2024-09-13 21:30:52,733 INFO [train.py:1198] (0/2) Epoch 2, batch 3600, loss[loss=0.3542, ctc_loss=0.2718, cr_loss=0.4119, over 21065.00 frames. ], tot_loss[loss=0.3747, ctc_loss=0.2886, cr_loss=0.4309, over 4097310.47 frames. ], batch size: 53, lr: 3.04e-02, grad_scale: 32.0
2024-09-13 21:31:27,881 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.592e+02 2.935e+02 3.605e+02 6.016e+02, threshold=5.870e+02, percent-clipped=0.0
2024-09-13 21:31:32,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=28364.5, ans=0.125
2024-09-13 21:31:51,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=28421.166666666668, ans=0.125
2024-09-13 21:32:04,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=28421.166666666668, ans=0.125
2024-09-13 21:32:08,402 INFO [train.py:1198] (0/2) Epoch 2, batch 3650, loss[loss=0.3103, ctc_loss=0.2365, cr_loss=0.3687, over 20798.00 frames. ], tot_loss[loss=0.375, ctc_loss=0.2888, cr_loss=0.4307, over 4085825.05 frames. ], batch size: 53, lr: 3.03e-02, grad_scale: 32.0
2024-09-13 21:32:37,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28506.166666666668, ans=0.1
2024-09-13 21:33:04,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=28534.5, ans=0.125
2024-09-13 21:33:23,181 INFO [train.py:1198] (0/2) Epoch 2, batch 3700, loss[loss=0.3605, ctc_loss=0.2714, cr_loss=0.4455, over 20969.00 frames. ], tot_loss[loss=0.3758, ctc_loss=0.2897, cr_loss=0.431, over 4079380.18 frames. ], batch size: 58, lr: 3.03e-02, grad_scale: 32.0
2024-09-13 21:34:01,185 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.509e+02 2.913e+02 3.469e+02 6.670e+02, threshold=5.827e+02, percent-clipped=1.0
2024-09-13 21:34:07,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28647.833333333332, ans=0.1
2024-09-13 21:34:41,826 INFO [train.py:1198] (0/2) Epoch 2, batch 3750, loss[loss=0.3306, ctc_loss=0.255, cr_loss=0.3781, over 20950.00 frames. ], tot_loss[loss=0.3741, ctc_loss=0.2881, cr_loss=0.4296, over 4090052.71 frames. ], batch size: 51, lr: 3.02e-02, grad_scale: 32.0
2024-09-13 21:34:42,422 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=12.0
2024-09-13 21:35:30,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=28817.833333333332, ans=0.125
2024-09-13 21:35:49,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=28846.166666666668, ans=10.0
2024-09-13 21:35:51,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=28846.166666666668, ans=0.125
2024-09-13 21:36:00,611 INFO [train.py:1198] (0/2) Epoch 2, batch 3800, loss[loss=0.3233, ctc_loss=0.2477, cr_loss=0.3782, over 20962.00 frames. ], tot_loss[loss=0.375, ctc_loss=0.289, cr_loss=0.4301, over 4086189.39 frames. ], batch size: 50, lr: 3.02e-02, grad_scale: 32.0
2024-09-13 21:36:09,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=28874.5, ans=0.125
2024-09-13 21:36:35,156 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.098e+02 2.554e+02 2.968e+02 3.753e+02 5.956e+02, threshold=5.936e+02, percent-clipped=1.0
2024-09-13 21:36:47,586 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-13 21:36:52,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=28959.5, ans=0.05
2024-09-13 21:37:03,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=28987.833333333332, ans=0.0
2024-09-13 21:37:15,818 INFO [train.py:1198] (0/2) Epoch 2, batch 3850, loss[loss=0.3058, ctc_loss=0.2296, cr_loss=0.381, over 20984.00 frames. ], tot_loss[loss=0.3751, ctc_loss=0.2892, cr_loss=0.4295, over 4092444.13 frames. ], batch size: 51, lr: 3.01e-02, grad_scale: 32.0
2024-09-13 21:38:06,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=29101.166666666668, ans=0.125
2024-09-13 21:38:31,307 INFO [train.py:1198] (0/2) Epoch 2, batch 3900, loss[loss=0.3793, ctc_loss=0.2899, cr_loss=0.4469, over 21100.00 frames. ], tot_loss[loss=0.3752, ctc_loss=0.2891, cr_loss=0.4309, over 4097998.67 frames. ], batch size: 59, lr: 3.01e-02, grad_scale: 32.0
2024-09-13 21:38:55,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=29186.166666666668, ans=0.1
2024-09-13 21:39:06,251 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.522e+02 2.954e+02 3.504e+02 5.465e+02, threshold=5.909e+02, percent-clipped=0.0
2024-09-13 21:39:24,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=29242.833333333332, ans=0.1
2024-09-13 21:39:50,088 INFO [train.py:1198] (0/2) Epoch 2, batch 3950, loss[loss=0.3335, ctc_loss=0.2543, cr_loss=0.3959, over 20971.00 frames. ], tot_loss[loss=0.3751, ctc_loss=0.2889, cr_loss=0.431, over 4096774.88 frames. ], batch size: 48, lr: 3.01e-02, grad_scale: 32.0
2024-09-13 21:40:20,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=29356.166666666668, ans=0.0
2024-09-13 21:40:45,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.97 vs. limit=10.0
2024-09-13 21:40:55,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=29412.833333333332, ans=0.125
2024-09-13 21:41:05,512 INFO [train.py:1198] (0/2) Epoch 2, batch 4000, loss[loss=0.3487, ctc_loss=0.2641, cr_loss=0.4228, over 20983.00 frames. ], tot_loss[loss=0.3734, ctc_loss=0.2874, cr_loss=0.4303, over 4098447.95 frames. ], batch size: 49, lr: 3.00e-02, grad_scale: 32.0
2024-09-13 21:41:28,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=29469.5, ans=0.125
2024-09-13 21:41:43,176 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.438e+02 2.873e+02 3.463e+02 6.218e+02, threshold=5.747e+02, percent-clipped=1.0
2024-09-13 21:41:51,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.55 vs. limit=10.0
2024-09-13 21:42:13,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=29554.5, ans=0.07
2024-09-13 21:42:18,567 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. limit=6.0
2024-09-13 21:42:23,822 INFO [train.py:1198] (0/2) Epoch 2, batch 4050, loss[loss=0.3242, ctc_loss=0.2484, cr_loss=0.379, over 20970.00 frames. ], tot_loss[loss=0.373, ctc_loss=0.287, cr_loss=0.4297, over 4094845.93 frames. ], batch size: 48, lr: 3.00e-02, grad_scale: 32.0
2024-09-13 21:42:27,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=29582.833333333332, ans=15.0
2024-09-13 21:42:49,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=29611.166666666668, ans=0.125
2024-09-13 21:43:11,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=29667.833333333332, ans=6.0
2024-09-13 21:43:39,654 INFO [train.py:1198] (0/2) Epoch 2, batch 4100, loss[loss=0.389, ctc_loss=0.2974, cr_loss=0.4583, over 20618.00 frames. ], tot_loss[loss=0.3704, ctc_loss=0.2848, cr_loss=0.4281, over 4097905.12 frames. ], batch size: 71, lr: 2.99e-02, grad_scale: 32.0
2024-09-13 21:43:42,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=29724.5, ans=0.125
2024-09-13 21:43:52,245 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 21:43:55,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=29752.833333333332, ans=0.125
2024-09-13 21:44:14,286 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 2.475e+02 2.989e+02 3.810e+02 6.440e+02, threshold=5.979e+02, percent-clipped=4.0
2024-09-13 21:44:17,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=29781.166666666668, ans=0.125
2024-09-13 21:44:55,207 INFO [train.py:1198] (0/2) Epoch 2, batch 4150, loss[loss=0.4583, ctc_loss=0.3741, cr_loss=0.4209, over 13799.00 frames. ], tot_loss[loss=0.3702, ctc_loss=0.2845, cr_loss=0.4285, over 4099149.58 frames. ], batch size: 149, lr: 2.99e-02, grad_scale: 32.0
2024-09-13 21:45:34,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=29922.833333333332, ans=0.125
2024-09-13 21:45:36,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=29922.833333333332, ans=0.004364601449275362
2024-09-13 21:45:41,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=29922.833333333332, ans=0.5
2024-09-13 21:46:00,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=29979.5, ans=0.0
2024-09-13 21:46:05,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=29979.5, ans=0.0
2024-09-13 21:46:13,565 INFO [train.py:1198] (0/2) Epoch 2, batch 4200, loss[loss=0.3154, ctc_loss=0.2426, cr_loss=0.3639, over 19851.00 frames. ], tot_loss[loss=0.3719, ctc_loss=0.2859, cr_loss=0.4297, over 4086969.64 frames. ], batch size: 44, lr: 2.98e-02, grad_scale: 32.0
2024-09-13 21:46:18,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=30007.833333333332, ans=0.125
2024-09-13 21:46:48,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=30064.5, ans=0.95
2024-09-13 21:46:50,698 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5
2024-09-13 21:46:51,467 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.095e+02 2.607e+02 3.097e+02 3.773e+02 6.203e+02, threshold=6.195e+02, percent-clipped=1.0
2024-09-13 21:47:02,658 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=15.0
2024-09-13 21:47:32,376 INFO [train.py:1198] (0/2) Epoch 2, batch 4250, loss[loss=0.3791, ctc_loss=0.2863, cr_loss=0.4639, over 20136.00 frames. ], tot_loss[loss=0.372, ctc_loss=0.2858, cr_loss=0.4311, over 4096055.65 frames. ], batch size: 80, lr: 2.98e-02, grad_scale: 32.0
2024-09-13 21:47:50,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=30177.833333333332, ans=0.125
2024-09-13 21:47:58,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=30177.833333333332, ans=0.07
2024-09-13 21:48:11,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=30206.166666666668, ans=0.125
2024-09-13 21:48:17,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=30234.5, ans=0.004296847826086956
2024-09-13 21:48:24,189 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.41 vs. limit=10.0
2024-09-13 21:48:30,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.44 vs. limit=22.5
2024-09-13 21:48:47,571 INFO [train.py:1198] (0/2) Epoch 2, batch 4300, loss[loss=0.3767, ctc_loss=0.2922, cr_loss=0.4223, over 20663.00 frames. ], tot_loss[loss=0.3715, ctc_loss=0.2853, cr_loss=0.431, over 4095260.29 frames. ], batch size: 68, lr: 2.97e-02, grad_scale: 32.0
2024-09-13 21:48:54,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=30291.166666666668, ans=0.125
2024-09-13 21:48:54,421 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0
2024-09-13 21:49:00,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=30291.166666666668, ans=0.0
2024-09-13 21:49:03,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=30319.5, ans=0.1
2024-09-13 21:49:15,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=30319.5, ans=0.125
2024-09-13 21:49:22,293 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.987e+02 2.448e+02 2.886e+02 3.660e+02 4.843e+02, threshold=5.772e+02, percent-clipped=0.0
2024-09-13 21:49:50,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30404.5, ans=0.1
2024-09-13 21:49:53,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=30404.5, ans=0.5
2024-09-13 21:50:03,215 INFO [train.py:1198] (0/2) Epoch 2, batch 4350, loss[loss=0.3631, ctc_loss=0.2753, cr_loss=0.4393, over 21000.00 frames. ], tot_loss[loss=0.3703, ctc_loss=0.2841, cr_loss=0.431, over 4111263.47 frames. ], batch size: 55, lr: 2.97e-02, grad_scale: 32.0
2024-09-13 21:50:08,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=30432.833333333332, ans=0.2
2024-09-13 21:50:36,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.52 vs. limit=15.0
2024-09-13 21:50:56,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=30517.833333333332, ans=0.125
2024-09-13 21:51:00,503 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 21:51:02,346 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.73 vs. limit=22.5
2024-09-13 21:51:17,962 INFO [train.py:1198] (0/2) Epoch 2, batch 4400, loss[loss=0.3395, ctc_loss=0.2612, cr_loss=0.3912, over 20934.00 frames. ], tot_loss[loss=0.3695, ctc_loss=0.2835, cr_loss=0.4303, over 4107634.05 frames. ], batch size: 50, lr: 2.96e-02, grad_scale: 32.0
2024-09-13 21:51:55,784 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.131e+02 2.494e+02 2.792e+02 3.361e+02 5.398e+02, threshold=5.584e+02, percent-clipped=0.0
2024-09-13 21:52:32,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=30687.833333333332, ans=0.0
2024-09-13 21:52:34,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=30687.833333333332, ans=0.125
2024-09-13 21:52:37,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=30687.833333333332, ans=0.2
2024-09-13 21:52:39,911 INFO [train.py:1198] (0/2) Epoch 2, batch 4450, loss[loss=0.3608, ctc_loss=0.2805, cr_loss=0.4011, over 20801.00 frames. ], tot_loss[loss=0.3707, ctc_loss=0.2845, cr_loss=0.431, over 4103333.90 frames. ], batch size: 53, lr: 2.96e-02, grad_scale: 32.0
2024-09-13 21:52:44,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=30716.166666666668, ans=0.125
2024-09-13 21:53:25,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=30801.166666666668, ans=0.125
2024-09-13 21:53:29,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30801.166666666668, ans=0.1
2024-09-13 21:53:41,158 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.83 vs. limit=6.0
2024-09-13 21:53:55,513 INFO [train.py:1198] (0/2) Epoch 2, batch 4500, loss[loss=0.3336, ctc_loss=0.2529, cr_loss=0.4039, over 19944.00 frames. ], tot_loss[loss=0.371, ctc_loss=0.2846, cr_loss=0.4316, over 4101754.30 frames. ], batch size: 44, lr: 2.95e-02, grad_scale: 16.0
2024-09-13 21:54:09,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30886.166666666668, ans=0.1
2024-09-13 21:54:31,705 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.641e+02 3.317e+02 3.996e+02 5.933e+02, threshold=6.633e+02, percent-clipped=3.0
2024-09-13 21:54:32,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=30914.5, ans=0.125
2024-09-13 21:54:45,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=30942.833333333332, ans=0.125
2024-09-13 21:55:04,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30971.166666666668, ans=0.1
2024-09-13 21:55:11,450 INFO [train.py:1198] (0/2) Epoch 2, batch 4550, loss[loss=0.4336, ctc_loss=0.3394, cr_loss=0.4708, over 18275.00 frames. ], tot_loss[loss=0.3704, ctc_loss=0.2841, cr_loss=0.4315, over 4107777.41 frames. ], batch size: 108, lr: 2.95e-02, grad_scale: 16.0
2024-09-13 21:55:30,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=31027.833333333332, ans=0.125
2024-09-13 21:55:39,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=31027.833333333332, ans=0.0
2024-09-13 21:56:11,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=31112.833333333332, ans=0.125
2024-09-13 21:56:21,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=31112.833333333332, ans=0.5
2024-09-13 21:56:23,317 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.99 vs. limit=15.0
2024-09-13 21:56:27,111 INFO [train.py:1198] (0/2) Epoch 2, batch 4600, loss[loss=0.3902, ctc_loss=0.3042, cr_loss=0.43, over 20960.00 frames. ], tot_loss[loss=0.3702, ctc_loss=0.2839, cr_loss=0.4313, over 4101204.56 frames. ], batch size: 58, lr: 2.94e-02, grad_scale: 16.0
2024-09-13 21:56:41,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=31169.5, ans=0.0
2024-09-13 21:56:54,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=31169.5, ans=0.0
2024-09-13 21:56:56,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=31197.833333333332, ans=0.125
2024-09-13 21:57:03,540 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 2.477e+02 2.896e+02 3.591e+02 8.040e+02, threshold=5.792e+02, percent-clipped=3.0
2024-09-13 21:57:09,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.93 vs. limit=22.5
2024-09-13 21:57:22,173 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.63 vs. limit=10.0
2024-09-13 21:57:29,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=31254.5, ans=0.004075108695652174
2024-09-13 21:57:45,443 INFO [train.py:1198] (0/2) Epoch 2, batch 4650, loss[loss=0.3126, ctc_loss=0.2334, cr_loss=0.3962, over 20990.00 frames. ], tot_loss[loss=0.3702, ctc_loss=0.2837, cr_loss=0.4323, over 4109010.94 frames. ], batch size: 52, lr: 2.94e-02, grad_scale: 16.0
2024-09-13 21:58:04,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31311.166666666668, ans=0.1
2024-09-13 21:58:10,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=31311.166666666668, ans=0.125
2024-09-13 21:58:10,362 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=12.0
2024-09-13 21:58:17,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31339.5, ans=0.1
2024-09-13 21:59:03,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=31424.5, ans=0.125
2024-09-13 21:59:04,607 INFO [train.py:1198] (0/2) Epoch 2, batch 4700, loss[loss=0.3835, ctc_loss=0.294, cr_loss=0.4474, over 20960.00 frames. ], tot_loss[loss=0.3717, ctc_loss=0.2852, cr_loss=0.4325, over 4090982.79 frames. ], batch size: 58, lr: 2.94e-02, grad_scale: 16.0
2024-09-13 21:59:31,127 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=22.5
2024-09-13 21:59:38,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.80 vs. limit=6.0
2024-09-13 21:59:40,799 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.365e+02 2.733e+02 3.647e+02 6.319e+02, threshold=5.467e+02, percent-clipped=3.0
2024-09-13 21:59:41,522 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.79 vs. limit=15.0
2024-09-13 21:59:57,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=31509.5, ans=0.004019673913043478
2024-09-13 22:00:14,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=31537.833333333332, ans=0.125
2024-09-13 22:00:20,082 INFO [train.py:1198] (0/2) Epoch 2, batch 4750, loss[loss=0.2964, ctc_loss=0.2229, cr_loss=0.3679, over 20966.00 frames. ], tot_loss[loss=0.3718, ctc_loss=0.2853, cr_loss=0.4327, over 4094680.72 frames. ], batch size: 48, lr: 2.93e-02, grad_scale: 16.0
2024-09-13 22:00:32,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=31566.166666666668, ans=0.004007355072463767
2024-09-13 22:00:54,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=31622.833333333332, ans=0.125
2024-09-13 22:00:59,949 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0
2024-09-13 22:01:11,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=31651.166666666668, ans=0.025
2024-09-13 22:01:26,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=31679.5, ans=0.2
2024-09-13 22:01:29,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=31679.5, ans=0.2
2024-09-13 22:01:35,356 INFO [train.py:1198] (0/2) Epoch 2, batch 4800, loss[loss=0.4207, ctc_loss=0.3271, cr_loss=0.4682, over 21076.00 frames. ], tot_loss[loss=0.3716, ctc_loss=0.285, cr_loss=0.4326, over 4094537.19 frames.
], batch size: 59, lr: 2.93e-02, grad_scale: 32.0 2024-09-13 22:01:52,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=31736.166666666668, ans=0.025 2024-09-13 22:01:58,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=31736.166666666668, ans=0.125 2024-09-13 22:02:12,248 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.403e+02 2.827e+02 3.591e+02 5.312e+02, threshold=5.653e+02, percent-clipped=0.0 2024-09-13 22:02:26,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=31792.833333333332, ans=0.07 2024-09-13 22:02:33,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=31792.833333333332, ans=0.0 2024-09-13 22:02:45,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=31821.166666666668, ans=0.125 2024-09-13 22:02:51,331 INFO [train.py:1198] (0/2) Epoch 2, batch 4850, loss[loss=0.3695, ctc_loss=0.2869, cr_loss=0.413, over 20899.00 frames. ], tot_loss[loss=0.3709, ctc_loss=0.2846, cr_loss=0.4319, over 4095914.26 frames. 
], batch size: 54, lr: 2.92e-02, grad_scale: 32.0 2024-09-13 22:02:56,325 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 22:03:02,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=31849.5, ans=0.003945760869565217 2024-09-13 22:03:06,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=31849.5, ans=0.0 2024-09-13 22:03:07,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.78 vs. limit=15.0 2024-09-13 22:03:17,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31877.833333333332, ans=0.1 2024-09-13 22:04:13,413 INFO [train.py:1198] (0/2) Epoch 2, batch 4900, loss[loss=0.3658, ctc_loss=0.2804, cr_loss=0.427, over 20682.00 frames. ], tot_loss[loss=0.3697, ctc_loss=0.2835, cr_loss=0.4311, over 4101482.27 frames. ], batch size: 71, lr: 2.92e-02, grad_scale: 32.0 2024-09-13 22:04:16,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31991.166666666668, ans=0.1 2024-09-13 22:04:30,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=32019.5, ans=0.0 2024-09-13 22:04:32,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. 
limit=15.0 2024-09-13 22:04:37,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=32019.5, ans=0.0 2024-09-13 22:04:49,376 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.121e+02 2.447e+02 2.815e+02 3.133e+02 5.324e+02, threshold=5.630e+02, percent-clipped=0.0 2024-09-13 22:04:51,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=32047.833333333332, ans=0.125 2024-09-13 22:05:15,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=32104.5, ans=0.2 2024-09-13 22:05:27,983 INFO [train.py:1198] (0/2) Epoch 2, batch 4950, loss[loss=0.4028, ctc_loss=0.3148, cr_loss=0.4404, over 20827.00 frames. ], tot_loss[loss=0.3695, ctc_loss=0.2832, cr_loss=0.4312, over 4105908.77 frames. ], batch size: 65, lr: 2.91e-02, grad_scale: 32.0 2024-09-13 22:05:34,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=32132.833333333332, ans=0.07 2024-09-13 22:05:47,895 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=12.0 2024-09-13 22:05:48,082 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.00 vs. 
limit=15.0 2024-09-13 22:06:00,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=32189.5, ans=0.125 2024-09-13 22:06:22,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=32217.833333333332, ans=0.0 2024-09-13 22:06:35,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=32246.166666666668, ans=0.025 2024-09-13 22:06:41,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=32274.5, ans=0.125 2024-09-13 22:06:42,391 INFO [train.py:1198] (0/2) Epoch 2, batch 5000, loss[loss=0.3806, ctc_loss=0.2924, cr_loss=0.4405, over 20029.00 frames. ], tot_loss[loss=0.37, ctc_loss=0.2835, cr_loss=0.4326, over 4107497.93 frames. ], batch size: 80, lr: 2.91e-02, grad_scale: 32.0 2024-09-13 22:06:54,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=32274.5, ans=0.125 2024-09-13 22:07:16,070 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.94 vs. 
limit=22.5 2024-09-13 22:07:18,323 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.111e+02 2.453e+02 2.845e+02 3.442e+02 5.343e+02, threshold=5.689e+02, percent-clipped=0.0 2024-09-13 22:07:44,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=32387.833333333332, ans=0.0 2024-09-13 22:07:45,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=32387.833333333332, ans=0.05 2024-09-13 22:07:57,360 INFO [train.py:1198] (0/2) Epoch 2, batch 5050, loss[loss=0.3572, ctc_loss=0.2739, cr_loss=0.4167, over 20778.00 frames. ], tot_loss[loss=0.3692, ctc_loss=0.283, cr_loss=0.4313, over 4097214.43 frames. ], batch size: 56, lr: 2.90e-02, grad_scale: 32.0 2024-09-13 22:08:07,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=32416.166666666668, ans=0.0 2024-09-13 22:08:15,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=32444.5, ans=0.1 2024-09-13 22:08:37,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=32472.833333333332, ans=0.125 2024-09-13 22:08:50,697 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 22:09:10,932 INFO [train.py:1198] (0/2) Epoch 2, batch 5100, loss[loss=0.3883, ctc_loss=0.2942, cr_loss=0.4704, over 20016.00 frames. ], tot_loss[loss=0.3702, ctc_loss=0.2838, cr_loss=0.432, over 4089538.52 frames. ], batch size: 80, lr: 2.90e-02, grad_scale: 32.0 2024-09-13 22:09:37,269 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.75 vs. 
limit=15.0 2024-09-13 22:09:46,715 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.065e+02 2.554e+02 2.891e+02 3.791e+02 6.853e+02, threshold=5.783e+02, percent-clipped=7.0 2024-09-13 22:09:58,093 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=15.0 2024-09-13 22:10:09,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=32671.166666666668, ans=0.0 2024-09-13 22:10:25,636 INFO [train.py:1198] (0/2) Epoch 2, batch 5150, loss[loss=0.3567, ctc_loss=0.271, cr_loss=0.4285, over 20992.00 frames. ], tot_loss[loss=0.3683, ctc_loss=0.282, cr_loss=0.4312, over 4102545.90 frames. ], batch size: 61, lr: 2.90e-02, grad_scale: 32.0 2024-09-13 22:10:37,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=32699.5, ans=0.125 2024-09-13 22:11:22,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=32784.5, ans=0.125 2024-09-13 22:11:28,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2024-09-13 22:11:39,994 INFO [train.py:1198] (0/2) Epoch 2, batch 5200, loss[loss=0.3088, ctc_loss=0.2303, cr_loss=0.3926, over 21019.00 frames. ], tot_loss[loss=0.3691, ctc_loss=0.2828, cr_loss=0.4316, over 4093406.92 frames. 
], batch size: 48, lr: 2.89e-02, grad_scale: 32.0 2024-09-13 22:12:09,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=32869.5, ans=0.025 2024-09-13 22:12:18,431 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.137e+02 2.583e+02 3.016e+02 3.697e+02 5.995e+02, threshold=6.031e+02, percent-clipped=1.0 2024-09-13 22:12:32,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=32926.166666666664, ans=0.003711702898550725 2024-09-13 22:12:57,137 INFO [train.py:1198] (0/2) Epoch 2, batch 5250, loss[loss=0.3269, ctc_loss=0.2471, cr_loss=0.3991, over 20985.00 frames. ], tot_loss[loss=0.368, ctc_loss=0.2816, cr_loss=0.4319, over 4097101.36 frames. ], batch size: 52, lr: 2.89e-02, grad_scale: 32.0 2024-09-13 22:13:31,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=33039.5, ans=0.125 2024-09-13 22:13:45,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=33067.833333333336, ans=0.125 2024-09-13 22:14:03,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=33096.166666666664, ans=0.0 2024-09-13 22:14:06,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=33096.166666666664, ans=0.035 2024-09-13 22:14:11,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=33096.166666666664, ans=0.02 2024-09-13 22:14:13,852 INFO [train.py:1198] (0/2) Epoch 2, batch 5300, loss[loss=0.3648, ctc_loss=0.2792, cr_loss=0.428, over 20783.00 frames. ], tot_loss[loss=0.3679, ctc_loss=0.2815, cr_loss=0.4321, over 4100846.60 frames. 
], batch size: 56, lr: 2.88e-02, grad_scale: 32.0 2024-09-13 22:14:14,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=33124.5, ans=0.025 2024-09-13 22:14:22,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=33124.5, ans=0.0 2024-09-13 22:14:49,499 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.444e+02 2.819e+02 3.355e+02 5.739e+02, threshold=5.638e+02, percent-clipped=0.0 2024-09-13 22:14:55,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=33181.166666666664, ans=0.015 2024-09-13 22:15:10,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=33209.5, ans=0.0 2024-09-13 22:15:15,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=33237.833333333336, ans=0.0 2024-09-13 22:15:15,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=33237.833333333336, ans=0.125 2024-09-13 22:15:28,413 INFO [train.py:1198] (0/2) Epoch 2, batch 5350, loss[loss=0.3628, ctc_loss=0.2766, cr_loss=0.4308, over 21029.00 frames. ], tot_loss[loss=0.3674, ctc_loss=0.2812, cr_loss=0.431, over 4100574.94 frames. ], batch size: 56, lr: 2.88e-02, grad_scale: 32.0 2024-09-13 22:15:43,487 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 22:16:42,197 INFO [train.py:1198] (0/2) Epoch 2, batch 5400, loss[loss=0.3848, ctc_loss=0.293, cr_loss=0.4591, over 20990.00 frames. ], tot_loss[loss=0.3671, ctc_loss=0.281, cr_loss=0.4308, over 4097668.98 frames. 
], batch size: 61, lr: 2.87e-02, grad_scale: 32.0 2024-09-13 22:17:07,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=33436.166666666664, ans=0.025 2024-09-13 22:17:17,989 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.020e+02 2.537e+02 2.920e+02 3.595e+02 6.063e+02, threshold=5.839e+02, percent-clipped=1.0 2024-09-13 22:17:24,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.52 vs. limit=6.0 2024-09-13 22:17:49,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=33521.166666666664, ans=0.125 2024-09-13 22:17:56,130 INFO [train.py:1198] (0/2) Epoch 2, batch 5450, loss[loss=0.3466, ctc_loss=0.2671, cr_loss=0.3973, over 21081.00 frames. ], tot_loss[loss=0.3665, ctc_loss=0.2804, cr_loss=0.4306, over 4100990.14 frames. ], batch size: 56, lr: 2.87e-02, grad_scale: 32.0 2024-09-13 22:17:56,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=33549.5, ans=0.0 2024-09-13 22:17:59,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33549.5, ans=0.1 2024-09-13 22:18:23,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=33577.833333333336, ans=0.05 2024-09-13 22:19:10,438 INFO [train.py:1198] (0/2) Epoch 2, batch 5500, loss[loss=0.3627, ctc_loss=0.2821, cr_loss=0.4028, over 20958.00 frames. ], tot_loss[loss=0.3652, ctc_loss=0.2793, cr_loss=0.4295, over 4101228.45 frames. 
], batch size: 55, lr: 2.86e-02, grad_scale: 32.0 2024-09-13 22:19:18,670 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2024-09-13 22:19:27,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=33719.5, ans=0.1 2024-09-13 22:19:37,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=33719.5, ans=0.035 2024-09-13 22:19:46,229 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.566e+02 3.176e+02 3.818e+02 6.045e+02, threshold=6.352e+02, percent-clipped=1.0 2024-09-13 22:20:24,738 INFO [train.py:1198] (0/2) Epoch 2, batch 5550, loss[loss=0.3274, ctc_loss=0.2434, cr_loss=0.4198, over 20972.00 frames. ], tot_loss[loss=0.3651, ctc_loss=0.2794, cr_loss=0.4284, over 4082497.49 frames. ], batch size: 50, lr: 2.86e-02, grad_scale: 32.0 2024-09-13 22:20:53,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=33889.5, ans=0.125 2024-09-13 22:21:41,738 INFO [train.py:1198] (0/2) Epoch 2, batch 5600, loss[loss=0.3591, ctc_loss=0.2794, cr_loss=0.3983, over 21020.00 frames. ], tot_loss[loss=0.3636, ctc_loss=0.2781, cr_loss=0.4273, over 4091871.99 frames. ], batch size: 61, lr: 2.86e-02, grad_scale: 32.0 2024-09-13 22:21:53,762 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-12000.pt 2024-09-13 22:22:05,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=34002.833333333336, ans=0.2 2024-09-13 22:22:05,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. 
limit=15.0 2024-09-13 22:22:17,982 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.493e+02 2.811e+02 3.455e+02 6.224e+02, threshold=5.621e+02, percent-clipped=0.0 2024-09-13 22:22:54,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=34087.833333333336, ans=0.025 2024-09-13 22:22:58,562 INFO [train.py:1198] (0/2) Epoch 2, batch 5650, loss[loss=0.3087, ctc_loss=0.2328, cr_loss=0.3795, over 21064.00 frames. ], tot_loss[loss=0.3649, ctc_loss=0.2793, cr_loss=0.4281, over 4087375.29 frames. ], batch size: 53, lr: 2.85e-02, grad_scale: 32.0 2024-09-13 22:23:44,003 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=15.0 2024-09-13 22:23:48,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=34201.166666666664, ans=0.125 2024-09-13 22:23:58,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=34229.5, ans=0.125 2024-09-13 22:24:12,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.30 vs. limit=15.0 2024-09-13 22:24:13,019 INFO [train.py:1198] (0/2) Epoch 2, batch 5700, loss[loss=0.3262, ctc_loss=0.2448, cr_loss=0.407, over 20983.00 frames. ], tot_loss[loss=0.3643, ctc_loss=0.2787, cr_loss=0.4278, over 4088627.20 frames. 
], batch size: 49, lr: 2.85e-02, grad_scale: 32.0 2024-09-13 22:24:16,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=34257.833333333336, ans=0.125 2024-09-13 22:24:29,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=34286.166666666664, ans=0.125 2024-09-13 22:24:48,578 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.427e+02 2.786e+02 3.610e+02 6.157e+02, threshold=5.572e+02, percent-clipped=1.0 2024-09-13 22:25:07,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.17 vs. limit=22.5 2024-09-13 22:25:27,254 INFO [train.py:1198] (0/2) Epoch 2, batch 5750, loss[loss=0.3343, ctc_loss=0.2454, cr_loss=0.4445, over 20985.00 frames. ], tot_loss[loss=0.364, ctc_loss=0.2783, cr_loss=0.4282, over 4089642.41 frames. ], batch size: 55, lr: 2.84e-02, grad_scale: 32.0 2024-09-13 22:25:32,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0 2024-09-13 22:25:35,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=34399.5, ans=0.0 2024-09-13 22:25:59,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=34456.166666666664, ans=6.0 2024-09-13 22:26:42,074 INFO [train.py:1198] (0/2) Epoch 2, batch 5800, loss[loss=0.3676, ctc_loss=0.28, cr_loss=0.4379, over 21051.00 frames. ], tot_loss[loss=0.3644, ctc_loss=0.2786, cr_loss=0.4287, over 4093742.72 frames. 
], batch size: 56, lr: 2.84e-02, grad_scale: 32.0 2024-09-13 22:26:45,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=34541.166666666664, ans=0.125 2024-09-13 22:27:01,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.14 vs. limit=22.5 2024-09-13 22:27:08,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=34569.5, ans=0.1 2024-09-13 22:27:17,376 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.563e+02 3.034e+02 3.601e+02 5.396e+02, threshold=6.068e+02, percent-clipped=0.0 2024-09-13 22:27:23,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=34597.833333333336, ans=0.05 2024-09-13 22:27:32,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=34626.166666666664, ans=0.125 2024-09-13 22:27:38,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=34626.166666666664, ans=0.0 2024-09-13 22:27:53,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=34654.5, ans=0.125 2024-09-13 22:27:55,910 INFO [train.py:1198] (0/2) Epoch 2, batch 5850, loss[loss=0.3632, ctc_loss=0.2715, cr_loss=0.4587, over 20794.00 frames. ], tot_loss[loss=0.3644, ctc_loss=0.2784, cr_loss=0.4299, over 4102752.33 frames. ], batch size: 53, lr: 2.83e-02, grad_scale: 32.0 2024-09-13 22:28:23,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.97 vs. 
limit=15.0 2024-09-13 22:28:33,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=34739.5, ans=0.025 2024-09-13 22:29:10,260 INFO [train.py:1198] (0/2) Epoch 2, batch 5900, loss[loss=0.367, ctc_loss=0.2816, cr_loss=0.4272, over 21024.00 frames. ], tot_loss[loss=0.3655, ctc_loss=0.2794, cr_loss=0.4303, over 4101653.93 frames. ], batch size: 61, lr: 2.83e-02, grad_scale: 32.0 2024-09-13 22:29:27,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=34852.833333333336, ans=0.2 2024-09-13 22:29:33,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=34852.833333333336, ans=0.125 2024-09-13 22:29:46,194 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.102e+02 2.494e+02 2.855e+02 3.280e+02 5.796e+02, threshold=5.711e+02, percent-clipped=0.0 2024-09-13 22:30:14,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=34937.833333333336, ans=0.04949747468305833 2024-09-13 22:30:15,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=34937.833333333336, ans=0.125 2024-09-13 22:30:19,118 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.58 vs. limit=10.0 2024-09-13 22:30:21,521 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 22:30:26,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=34966.166666666664, ans=0.2 2024-09-13 22:30:27,153 INFO [train.py:1198] (0/2) Epoch 2, batch 5950, loss[loss=0.447, ctc_loss=0.3612, cr_loss=0.429, over 13966.00 frames. 
], tot_loss[loss=0.3662, ctc_loss=0.2802, cr_loss=0.4301, over 4080241.91 frames. ], batch size: 151, lr: 2.83e-02, grad_scale: 32.0 2024-09-13 22:30:55,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=35022.833333333336, ans=0.003255905797101449 2024-09-13 22:30:57,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=22.5 2024-09-13 22:31:43,766 INFO [train.py:1198] (0/2) Epoch 2, batch 6000, loss[loss=0.3417, ctc_loss=0.2558, cr_loss=0.4292, over 20997.00 frames. ], tot_loss[loss=0.3651, ctc_loss=0.2791, cr_loss=0.4299, over 4095676.86 frames. ], batch size: 55, lr: 2.82e-02, grad_scale: 32.0 2024-09-13 22:31:43,768 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-13 22:32:10,257 INFO [train.py:1230] (0/2) Epoch 2, validation: loss=0.1047, ctc_loss=0.1047, cr_loss=9.365e-15, over 944034.00 frames. 2024-09-13 22:32:10,257 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-13 22:32:14,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35107.833333333336, ans=0.1 2024-09-13 22:32:31,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=35136.166666666664, ans=0.2 2024-09-13 22:32:45,561 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 2.538e+02 2.936e+02 3.462e+02 7.631e+02, threshold=5.872e+02, percent-clipped=1.0 2024-09-13 22:33:24,209 INFO [train.py:1198] (0/2) Epoch 2, batch 6050, loss[loss=0.3441, ctc_loss=0.2606, cr_loss=0.4174, over 20970.00 frames. ], tot_loss[loss=0.3651, ctc_loss=0.2792, cr_loss=0.4299, over 4096022.45 frames. 
], batch size: 52, lr: 2.82e-02, grad_scale: 32.0
2024-09-13 22:33:30,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.45 vs. limit=15.0
2024-09-13 22:33:40,084 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.25 vs. limit=6.0
2024-09-13 22:33:57,798 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0
2024-09-13 22:34:00,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=22.5
2024-09-13 22:34:26,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=12.0
2024-09-13 22:34:39,329 INFO [train.py:1198] (0/2) Epoch 2, batch 6100, loss[loss=0.4458, ctc_loss=0.3596, cr_loss=0.4309, over 14337.00 frames. ], tot_loss[loss=0.3643, ctc_loss=0.2785, cr_loss=0.4293, over 4094299.05 frames. ], batch size: 150, lr: 2.81e-02, grad_scale: 32.0
2024-09-13 22:34:48,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=35391.166666666664, ans=0.125
2024-09-13 22:35:14,677 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.160e+02 2.502e+02 2.994e+02 3.822e+02 5.911e+02, threshold=5.988e+02, percent-clipped=1.0
2024-09-13 22:35:52,642 INFO [train.py:1198] (0/2) Epoch 2, batch 6150, loss[loss=0.3775, ctc_loss=0.2883, cr_loss=0.446, over 20972.00 frames. ], tot_loss[loss=0.3679, ctc_loss=0.2814, cr_loss=0.4327, over 4082178.92 frames. ], batch size: 58, lr: 2.81e-02, grad_scale: 32.0
2024-09-13 22:35:53,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=35532.833333333336, ans=0.0
2024-09-13 22:36:01,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=35532.833333333336, ans=0.125
2024-09-13 22:36:29,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=35589.5, ans=0.125
2024-09-13 22:36:45,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=35617.833333333336, ans=0.125
2024-09-13 22:36:57,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=35646.166666666664, ans=0.025
2024-09-13 22:37:05,871 INFO [train.py:1198] (0/2) Epoch 2, batch 6200, loss[loss=0.449, ctc_loss=0.364, cr_loss=0.4251, over 14372.00 frames. ], tot_loss[loss=0.37, ctc_loss=0.2834, cr_loss=0.4331, over 4065499.66 frames. ], batch size: 150, lr: 2.81e-02, grad_scale: 32.0
2024-09-13 22:37:11,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.05 vs. limit=15.0
2024-09-13 22:37:13,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=35674.5, ans=0.1
2024-09-13 22:37:14,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=35674.5, ans=0.2
2024-09-13 22:37:41,136 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.462e+02 2.815e+02 3.577e+02 5.826e+02, threshold=5.630e+02, percent-clipped=0.0
2024-09-13 22:37:42,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=35731.166666666664, ans=0.125
2024-09-13 22:37:57,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=35759.5, ans=0.2
2024-09-13 22:38:19,809 INFO [train.py:1198] (0/2) Epoch 2, batch 6250, loss[loss=0.3807, ctc_loss=0.2892, cr_loss=0.4573, over 20014.00 frames. ], tot_loss[loss=0.3713, ctc_loss=0.2847, cr_loss=0.4334, over 4049602.87 frames. ], batch size: 80, lr: 2.80e-02, grad_scale: 32.0
2024-09-13 22:38:20,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=35816.166666666664, ans=0.0
2024-09-13 22:38:40,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=35844.5, ans=0.0030772826086956527
2024-09-13 22:38:47,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=35872.833333333336, ans=0.05
2024-09-13 22:39:14,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=35901.166666666664, ans=0.1
2024-09-13 22:39:23,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=35929.5, ans=0.125
2024-09-13 22:39:33,109 INFO [train.py:1198] (0/2) Epoch 2, batch 6300, loss[loss=0.3961, ctc_loss=0.3098, cr_loss=0.4312, over 20656.00 frames. ], tot_loss[loss=0.3766, ctc_loss=0.2894, cr_loss=0.4359, over 3994308.36 frames. ], batch size: 71, lr: 2.80e-02, grad_scale: 32.0
2024-09-13 22:39:49,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=35986.166666666664, ans=0.125
2024-09-13 22:40:08,011 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.077e+02 2.583e+02 2.883e+02 3.459e+02 8.852e+02, threshold=5.766e+02, percent-clipped=1.0
2024-09-13 22:40:17,176 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=15.0
2024-09-13 22:40:25,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=36042.833333333336, ans=0.0
2024-09-13 22:40:32,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=36071.166666666664, ans=0.125
2024-09-13 22:40:45,356 INFO [train.py:1198] (0/2) Epoch 2, batch 6350, loss[loss=0.4585, ctc_loss=0.3726, cr_loss=0.4296, over 14178.00 frames. ], tot_loss[loss=0.3858, ctc_loss=0.2982, cr_loss=0.4377, over 3811574.65 frames. ], batch size: 149, lr: 2.79e-02, grad_scale: 32.0
2024-09-13 22:41:23,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=36156.166666666664, ans=0.125
2024-09-13 22:41:43,803 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-2.pt
2024-09-13 22:42:31,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=36215.666666666664, ans=0.125
2024-09-13 22:42:32,915 INFO [train.py:1198] (0/2) Epoch 3, batch 0, loss[loss=0.3499, ctc_loss=0.2712, cr_loss=0.3939, over 20960.00 frames. ], tot_loss[loss=0.3499, ctc_loss=0.2712, cr_loss=0.3939, over 20960.00 frames. ], batch size: 49, lr: 2.65e-02, grad_scale: 32.0
2024-09-13 22:42:32,921 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-13 22:42:54,040 INFO [train.py:1230] (0/2) Epoch 3, validation: loss=0.107, ctc_loss=0.107, cr_loss=1.077e-14, over 944034.00 frames.
2024-09-13 22:42:54,041 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-13 22:43:20,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=36244.0, ans=0.025
2024-09-13 22:43:20,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=36244.0, ans=0.125
2024-09-13 22:43:38,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=36300.666666666664, ans=0.0
2024-09-13 22:43:44,190 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.116e+02 2.481e+02 2.860e+02 3.412e+02 5.235e+02, threshold=5.719e+02, percent-clipped=0.0
2024-09-13 22:44:10,437 INFO [train.py:1198] (0/2) Epoch 3, batch 50, loss[loss=0.3065, ctc_loss=0.2308, cr_loss=0.3782, over 21008.00 frames. ], tot_loss[loss=0.3597, ctc_loss=0.2745, cr_loss=0.4259, over 936242.92 frames. ], batch size: 48, lr: 2.65e-02, grad_scale: 32.0
2024-09-13 22:44:22,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36357.333333333336, ans=0.1
2024-09-13 22:44:36,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=36385.666666666664, ans=0.125
2024-09-13 22:44:38,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=36385.666666666664, ans=0.125
2024-09-13 22:44:44,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=36414.0, ans=0.07
2024-09-13 22:44:49,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=22.5
2024-09-13 22:45:28,666 INFO [train.py:1198] (0/2) Epoch 3, batch 100, loss[loss=0.3245, ctc_loss=0.2485, cr_loss=0.3801, over 19948.00 frames. ], tot_loss[loss=0.3569, ctc_loss=0.272, cr_loss=0.4247, over 1639015.08 frames. ], batch size: 44, lr: 2.64e-02, grad_scale: 32.0
2024-09-13 22:45:56,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=36527.333333333336, ans=12.0
2024-09-13 22:46:13,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=36555.666666666664, ans=0.125
2024-09-13 22:46:23,404 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.307e+02 2.521e+02 3.027e+02 4.456e+02, threshold=5.041e+02, percent-clipped=0.0
2024-09-13 22:46:47,597 INFO [train.py:1198] (0/2) Epoch 3, batch 150, loss[loss=0.3961, ctc_loss=0.3017, cr_loss=0.4722, over 19404.00 frames. ], tot_loss[loss=0.3603, ctc_loss=0.2749, cr_loss=0.427, over 2163683.67 frames. ], batch size: 90, lr: 2.64e-02, grad_scale: 32.0
2024-09-13 22:47:11,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5
2024-09-13 22:47:24,972 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-13 22:47:42,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=36725.666666666664, ans=0.0
2024-09-13 22:47:45,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=36754.0, ans=0.125
2024-09-13 22:47:50,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=36754.0, ans=0.0028795652173913043
2024-09-13 22:48:02,142 INFO [train.py:1198] (0/2) Epoch 3, batch 200, loss[loss=0.3533, ctc_loss=0.2632, cr_loss=0.4504, over 20661.00 frames. ], tot_loss[loss=0.3615, ctc_loss=0.2758, cr_loss=0.4286, over 2579568.88 frames. ], batch size: 68, lr: 2.64e-02, grad_scale: 32.0
2024-09-13 22:48:12,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36782.333333333336, ans=0.1
2024-09-13 22:48:26,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.35 vs. limit=6.0
2024-09-13 22:48:53,335 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.432e+02 2.851e+02 3.646e+02 7.141e+02, threshold=5.701e+02, percent-clipped=3.0
2024-09-13 22:48:56,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=36867.333333333336, ans=0.2
2024-09-13 22:49:07,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=36895.666666666664, ans=0.125
2024-09-13 22:49:17,349 INFO [train.py:1198] (0/2) Epoch 3, batch 250, loss[loss=0.3398, ctc_loss=0.2589, cr_loss=0.4045, over 21017.00 frames. ], tot_loss[loss=0.3613, ctc_loss=0.2751, cr_loss=0.431, over 2919940.33 frames. ], batch size: 48, lr: 2.63e-02, grad_scale: 32.0
2024-09-13 22:49:31,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=36952.333333333336, ans=0.0
2024-09-13 22:49:58,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36980.666666666664, ans=0.1
2024-09-13 22:50:00,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=36980.666666666664, ans=0.002830289855072464
2024-09-13 22:50:34,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=37037.333333333336, ans=0.125
2024-09-13 22:50:37,206 INFO [train.py:1198] (0/2) Epoch 3, batch 300, loss[loss=0.398, ctc_loss=0.3065, cr_loss=0.4577, over 20632.00 frames. ], tot_loss[loss=0.3593, ctc_loss=0.2734, cr_loss=0.4294, over 3193059.82 frames. ], batch size: 71, lr: 2.63e-02, grad_scale: 32.0
2024-09-13 22:50:38,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=37065.666666666664, ans=0.0
2024-09-13 22:50:40,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=37065.666666666664, ans=0.0
2024-09-13 22:50:43,496 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 22:51:02,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=37094.0, ans=0.2
2024-09-13 22:51:27,991 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.474e+02 2.870e+02 3.354e+02 5.882e+02, threshold=5.740e+02, percent-clipped=1.0
2024-09-13 22:51:28,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=37150.666666666664, ans=0.125
2024-09-13 22:51:56,170 INFO [train.py:1198] (0/2) Epoch 3, batch 350, loss[loss=0.355, ctc_loss=0.2705, cr_loss=0.4224, over 21021.00 frames. ], tot_loss[loss=0.357, ctc_loss=0.2714, cr_loss=0.428, over 3400705.55 frames. ], batch size: 61, lr: 2.62e-02, grad_scale: 32.0
2024-09-13 22:52:25,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=37264.0, ans=0.1
2024-09-13 22:53:12,717 INFO [train.py:1198] (0/2) Epoch 3, batch 400, loss[loss=0.411, ctc_loss=0.3173, cr_loss=0.468, over 20699.00 frames. ], tot_loss[loss=0.3574, ctc_loss=0.2716, cr_loss=0.4288, over 3564031.22 frames. ], batch size: 66, lr: 2.62e-02, grad_scale: 32.0
2024-09-13 22:53:19,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=37349.0, ans=0.125
2024-09-13 22:54:04,766 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.394e+02 2.667e+02 3.254e+02 6.566e+02, threshold=5.335e+02, percent-clipped=1.0
2024-09-13 22:54:08,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=37434.0, ans=0.0
2024-09-13 22:54:20,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=37462.333333333336, ans=0.125
2024-09-13 22:54:28,868 INFO [train.py:1198] (0/2) Epoch 3, batch 450, loss[loss=0.3363, ctc_loss=0.2523, cr_loss=0.4201, over 20792.00 frames. ], tot_loss[loss=0.3563, ctc_loss=0.2707, cr_loss=0.4283, over 3689364.39 frames. ], batch size: 53, lr: 2.62e-02, grad_scale: 32.0
2024-09-13 22:54:30,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=37490.666666666664, ans=0.2
2024-09-13 22:54:47,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=37519.0, ans=0.125
2024-09-13 22:55:10,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37547.333333333336, ans=0.1
2024-09-13 22:55:25,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=37575.666666666664, ans=0.002700942028985508
2024-09-13 22:55:25,420 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0
2024-09-13 22:55:44,645 INFO [train.py:1198] (0/2) Epoch 3, batch 500, loss[loss=0.3625, ctc_loss=0.2778, cr_loss=0.4236, over 19425.00 frames. ], tot_loss[loss=0.3574, ctc_loss=0.2715, cr_loss=0.4294, over 3780794.61 frames. ], batch size: 90, lr: 2.61e-02, grad_scale: 32.0
2024-09-13 22:55:46,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=37632.333333333336, ans=0.125
2024-09-13 22:56:19,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=37689.0, ans=0.1
2024-09-13 22:56:38,686 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.396e+02 2.755e+02 3.330e+02 7.145e+02, threshold=5.509e+02, percent-clipped=4.0
2024-09-13 22:56:42,298 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 22:57:02,959 INFO [train.py:1198] (0/2) Epoch 3, batch 550, loss[loss=0.3958, ctc_loss=0.3009, cr_loss=0.4744, over 21018.00 frames. ], tot_loss[loss=0.3585, ctc_loss=0.2726, cr_loss=0.4299, over 3848939.98 frames. ], batch size: 61, lr: 2.61e-02, grad_scale: 32.0
2024-09-13 22:57:21,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=22.5
2024-09-13 22:57:34,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0
2024-09-13 22:57:37,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.22 vs. limit=5.0
2024-09-13 22:58:02,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=37859.0, ans=0.125
2024-09-13 22:58:15,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=22.5
2024-09-13 22:58:21,882 INFO [train.py:1198] (0/2) Epoch 3, batch 600, loss[loss=0.3447, ctc_loss=0.261, cr_loss=0.4183, over 20770.00 frames. ], tot_loss[loss=0.3575, ctc_loss=0.2717, cr_loss=0.4289, over 3907369.04 frames. ], batch size: 56, lr: 2.61e-02, grad_scale: 32.0
2024-09-13 22:58:45,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=37944.0, ans=0.125
2024-09-13 22:58:58,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=37972.333333333336, ans=0.125
2024-09-13 22:59:01,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=37972.333333333336, ans=0.125
2024-09-13 22:59:12,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=38000.666666666664, ans=0.0
2024-09-13 22:59:12,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0
2024-09-13 22:59:13,333 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.427e+02 2.740e+02 3.265e+02 5.783e+02, threshold=5.479e+02, percent-clipped=1.0
2024-09-13 22:59:37,822 INFO [train.py:1198] (0/2) Epoch 3, batch 650, loss[loss=0.3549, ctc_loss=0.2686, cr_loss=0.4311, over 20933.00 frames. ], tot_loss[loss=0.3572, ctc_loss=0.2715, cr_loss=0.4285, over 3947725.86 frames. ], batch size: 60, lr: 2.60e-02, grad_scale: 32.0
2024-09-13 23:00:00,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=38085.666666666664, ans=0.0
2024-09-13 23:00:06,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38114.0, ans=0.1
2024-09-13 23:00:18,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=38114.0, ans=0.125
2024-09-13 23:00:20,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=38114.0, ans=0.002583913043478261
2024-09-13 23:00:34,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=38142.333333333336, ans=0.0
2024-09-13 23:00:36,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=38170.666666666664, ans=0.125
2024-09-13 23:00:47,215 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=12.0
2024-09-13 23:00:52,653 INFO [train.py:1198] (0/2) Epoch 3, batch 700, loss[loss=0.3404, ctc_loss=0.2513, cr_loss=0.4454, over 21050.00 frames. ], tot_loss[loss=0.358, ctc_loss=0.2724, cr_loss=0.428, over 3980258.62 frames. ], batch size: 56, lr: 2.60e-02, grad_scale: 32.0
2024-09-13 23:01:25,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=38255.666666666664, ans=0.125
2024-09-13 23:01:44,258 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.506e+02 2.757e+02 3.512e+02 5.332e+02, threshold=5.515e+02, percent-clipped=0.0
2024-09-13 23:02:05,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=38312.333333333336, ans=0.125
2024-09-13 23:02:11,184 INFO [train.py:1198] (0/2) Epoch 3, batch 750, loss[loss=0.3544, ctc_loss=0.2704, cr_loss=0.4202, over 20927.00 frames. ], tot_loss[loss=0.3576, ctc_loss=0.272, cr_loss=0.4281, over 4003058.19 frames. ], batch size: 60, lr: 2.59e-02, grad_scale: 32.0
2024-09-13 23:02:20,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=38340.666666666664, ans=0.125
2024-09-13 23:02:44,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=38397.333333333336, ans=0.125
2024-09-13 23:02:49,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=38397.333333333336, ans=0.0
2024-09-13 23:03:00,468 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0
2024-09-13 23:03:02,003 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=15.0
2024-09-13 23:03:11,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=38425.666666666664, ans=0.125
2024-09-13 23:03:30,713 INFO [train.py:1198] (0/2) Epoch 3, batch 800, loss[loss=0.3348, ctc_loss=0.2528, cr_loss=0.4095, over 21036.00 frames. ], tot_loss[loss=0.3573, ctc_loss=0.2716, cr_loss=0.4287, over 4034613.72 frames. ], batch size: 56, lr: 2.59e-02, grad_scale: 32.0
2024-09-13 23:03:45,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=38510.666666666664, ans=0.002497681159420291
2024-09-13 23:04:09,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=38539.0, ans=0.0
2024-09-13 23:04:22,755 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.609e+02 2.977e+02 3.655e+02 5.386e+02, threshold=5.955e+02, percent-clipped=0.0
2024-09-13 23:04:25,018 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.13 vs. limit=22.5
2024-09-13 23:04:26,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=38567.333333333336, ans=0.025
2024-09-13 23:04:27,630 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-13 23:04:39,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=38595.666666666664, ans=0.1
2024-09-13 23:04:46,695 INFO [train.py:1198] (0/2) Epoch 3, batch 850, loss[loss=0.3579, ctc_loss=0.2679, cr_loss=0.4501, over 21080.00 frames. ], tot_loss[loss=0.358, ctc_loss=0.2721, cr_loss=0.4292, over 4028982.49 frames. ], batch size: 59, lr: 2.59e-02, grad_scale: 32.0
2024-09-13 23:05:15,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=38680.666666666664, ans=0.125
2024-09-13 23:06:02,292 INFO [train.py:1198] (0/2) Epoch 3, batch 900, loss[loss=0.3274, ctc_loss=0.2467, cr_loss=0.4032, over 21035.00 frames. ], tot_loss[loss=0.3584, ctc_loss=0.2723, cr_loss=0.4307, over 4050159.27 frames. ], batch size: 62, lr: 2.58e-02, grad_scale: 32.0
2024-09-13 23:06:04,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=38765.666666666664, ans=0.0
2024-09-13 23:06:54,426 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.418e+02 2.830e+02 3.421e+02 5.289e+02, threshold=5.660e+02, percent-clipped=0.0
2024-09-13 23:06:59,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=38850.666666666664, ans=0.125
2024-09-13 23:07:14,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38879.0, ans=0.1
2024-09-13 23:07:18,713 INFO [train.py:1198] (0/2) Epoch 3, batch 950, loss[loss=0.3484, ctc_loss=0.2669, cr_loss=0.4075, over 21065.00 frames. ], tot_loss[loss=0.3573, ctc_loss=0.2715, cr_loss=0.429, over 4046789.83 frames. ], batch size: 53, lr: 2.58e-02, grad_scale: 32.0
2024-09-13 23:07:25,143 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-13 23:07:36,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=38935.666666666664, ans=0.125
2024-09-13 23:08:36,991 INFO [train.py:1198] (0/2) Epoch 3, batch 1000, loss[loss=0.3269, ctc_loss=0.2462, cr_loss=0.4035, over 21060.00 frames. ], tot_loss[loss=0.3569, ctc_loss=0.2711, cr_loss=0.4287, over 4062485.81 frames. ], batch size: 56, lr: 2.58e-02, grad_scale: 32.0
2024-09-13 23:09:05,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=39077.333333333336, ans=0.125
2024-09-13 23:09:20,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=39105.666666666664, ans=0.125
2024-09-13 23:09:29,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=39134.0, ans=0.1
2024-09-13 23:09:32,050 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.397e+02 2.774e+02 3.547e+02 5.852e+02, threshold=5.548e+02, percent-clipped=2.0
2024-09-13 23:09:49,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=39162.333333333336, ans=22.5
2024-09-13 23:09:56,172 INFO [train.py:1198] (0/2) Epoch 3, batch 1050, loss[loss=0.3584, ctc_loss=0.2704, cr_loss=0.4402, over 20947.00 frames. ], tot_loss[loss=0.3576, ctc_loss=0.2717, cr_loss=0.4292, over 4062190.36 frames. ], batch size: 60, lr: 2.57e-02, grad_scale: 32.0
2024-09-13 23:10:23,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=39219.0, ans=0.1
2024-09-13 23:10:32,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=39247.333333333336, ans=0.0
2024-09-13 23:10:37,362 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=12.0
2024-09-13 23:10:44,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=39275.666666666664, ans=0.0
2024-09-13 23:10:55,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0
2024-09-13 23:11:11,302 INFO [train.py:1198] (0/2) Epoch 3, batch 1100, loss[loss=0.3843, ctc_loss=0.2945, cr_loss=0.4489, over 21086.00 frames. ], tot_loss[loss=0.3576, ctc_loss=0.2717, cr_loss=0.4299, over 4073383.07 frames. ], batch size: 59, lr: 2.57e-02, grad_scale: 32.0
2024-09-13 23:11:59,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=39417.333333333336, ans=0.125
2024-09-13 23:12:02,534 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.365e+02 2.740e+02 3.415e+02 8.630e+02, threshold=5.481e+02, percent-clipped=5.0
2024-09-13 23:12:14,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=39445.666666666664, ans=0.125
2024-09-13 23:12:15,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=39445.666666666664, ans=0.125
2024-09-13 23:12:26,781 INFO [train.py:1198] (0/2) Epoch 3, batch 1150, loss[loss=0.3353, ctc_loss=0.2566, cr_loss=0.3939, over 20890.00 frames. ], tot_loss[loss=0.3578, ctc_loss=0.2718, cr_loss=0.4298, over 4056163.63 frames. ], batch size: 54, lr: 2.57e-02, grad_scale: 32.0
2024-09-13 23:12:37,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=39474.0, ans=0.125
2024-09-13 23:12:48,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.15 vs. limit=15.0
2024-09-13 23:12:50,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=39502.333333333336, ans=0.0
2024-09-13 23:13:01,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=39530.666666666664, ans=0.025
2024-09-13 23:13:18,311 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=12.0
2024-09-13 23:13:28,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=39587.333333333336, ans=0.2
2024-09-13 23:13:44,202 INFO [train.py:1198] (0/2) Epoch 3, batch 1200, loss[loss=0.3362, ctc_loss=0.2522, cr_loss=0.4198, over 20992.00 frames. ], tot_loss[loss=0.3572, ctc_loss=0.2713, cr_loss=0.4295, over 4065190.84 frames. ], batch size: 55, lr: 2.56e-02, grad_scale: 32.0
2024-09-13 23:13:57,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=39644.0, ans=0.125
2024-09-13 23:14:14,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=39672.333333333336, ans=0.0
2024-09-13 23:14:23,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=39672.333333333336, ans=0.1
2024-09-13 23:14:38,508 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 2.392e+02 2.617e+02 3.564e+02 8.180e+02, threshold=5.235e+02, percent-clipped=5.0
2024-09-13 23:14:41,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=39700.666666666664, ans=0.0
2024-09-13 23:15:00,611 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0
2024-09-13 23:15:02,920 INFO [train.py:1198] (0/2) Epoch 3, batch 1250, loss[loss=0.32, ctc_loss=0.2409, cr_loss=0.3954, over 21060.00 frames. ], tot_loss[loss=0.3575, ctc_loss=0.2717, cr_loss=0.4288, over 4069314.99 frames. ], batch size: 56, lr: 2.56e-02, grad_scale: 32.0
2024-09-13 23:15:12,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=39757.333333333336, ans=0.125
2024-09-13 23:15:24,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=39785.666666666664, ans=0.0
2024-09-13 23:15:31,685 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.88 vs. limit=10.0
2024-09-13 23:15:55,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=39842.333333333336, ans=0.0022081884057971015
2024-09-13 23:16:16,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=39870.666666666664, ans=0.2
2024-09-13 23:16:19,430 INFO [train.py:1198] (0/2) Epoch 3, batch 1300, loss[loss=0.3272, ctc_loss=0.2486, cr_loss=0.3931, over 20997.00 frames. ], tot_loss[loss=0.3576, ctc_loss=0.2718, cr_loss=0.4293, over 4065146.56 frames. ], batch size: 52, lr: 2.56e-02, grad_scale: 32.0
2024-09-13 23:17:08,062 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-13 23:17:10,920 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.057e+02 2.710e+02 3.108e+02 3.849e+02 6.117e+02, threshold=6.215e+02, percent-clipped=3.0
2024-09-13 23:17:35,230 INFO [train.py:1198] (0/2) Epoch 3, batch 1350, loss[loss=0.3035, ctc_loss=0.2245, cr_loss=0.395, over 20792.00 frames. ], tot_loss[loss=0.3551, ctc_loss=0.2695, cr_loss=0.4281, over 4073713.09 frames. ], batch size: 53, lr: 2.55e-02, grad_scale: 32.0
2024-09-13 23:18:02,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=40069.0, ans=10.0
2024-09-13 23:18:29,144 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.85 vs. limit=10.0
2024-09-13 23:18:51,041 INFO [train.py:1198] (0/2) Epoch 3, batch 1400, loss[loss=0.4163, ctc_loss=0.3264, cr_loss=0.4494, over 14577.00 frames. ], tot_loss[loss=0.3557, ctc_loss=0.2701, cr_loss=0.4281, over 4063363.72 frames. ], batch size: 149, lr: 2.55e-02, grad_scale: 32.0
2024-09-13 23:18:58,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=40182.333333333336, ans=0.1
2024-09-13 23:19:04,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=40182.333333333336, ans=0.0
2024-09-13 23:19:06,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=40182.333333333336, ans=0.0
2024-09-13 23:19:20,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=40210.666666666664, ans=0.0
2024-09-13 23:19:20,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=12.0
2024-09-13 23:19:46,060 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.428e+02 2.743e+02 3.383e+02 6.412e+02, threshold=5.486e+02, percent-clipped=1.0
2024-09-13 23:19:58,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0
2024-09-13 23:20:10,796 INFO [train.py:1198] (0/2) Epoch 3, batch 1450, loss[loss=0.377, ctc_loss=0.2845, cr_loss=0.4622, over 18179.00 frames. ], tot_loss[loss=0.3557, ctc_loss=0.27, cr_loss=0.4284, over 4075358.13 frames. ], batch size: 108, lr: 2.54e-02, grad_scale: 32.0
2024-09-13 23:21:20,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=40437.333333333336, ans=0.05
2024-09-13 23:21:29,573 INFO [train.py:1198] (0/2) Epoch 3, batch 1500, loss[loss=0.3414, ctc_loss=0.2586, cr_loss=0.4144, over 21024.00 frames. ], tot_loss[loss=0.3559, ctc_loss=0.2701, cr_loss=0.4292, over 4077606.96 frames. ], batch size: 63, lr: 2.54e-02, grad_scale: 32.0
2024-09-13 23:22:04,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=40522.333333333336, ans=0.2
2024-09-13 23:22:20,598 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.297e+02 2.781e+02 3.384e+02 6.120e+02, threshold=5.561e+02, percent-clipped=2.0
2024-09-13 23:22:44,510 INFO [train.py:1198] (0/2) Epoch 3, batch 1550, loss[loss=0.3087, ctc_loss=0.2353, cr_loss=0.3672, over 20991.00 frames. ], tot_loss[loss=0.3532, ctc_loss=0.2677, cr_loss=0.4274, over 4095971.65 frames. ], batch size: 52, lr: 2.54e-02, grad_scale: 32.0
2024-09-13 23:23:17,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=40664.0, ans=0.1
2024-09-13 23:23:55,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=40720.666666666664, ans=0.125
2024-09-13 23:23:59,568 INFO [train.py:1198] (0/2) Epoch 3, batch 1600, loss[loss=0.3673, ctc_loss=0.2781, cr_loss=0.4463, over 20830.00 frames. ], tot_loss[loss=0.3536, ctc_loss=0.268, cr_loss=0.4278, over 4095019.44 frames.
], batch size: 65, lr: 2.53e-02, grad_scale: 32.0 2024-09-13 23:24:34,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=40805.666666666664, ans=0.0 2024-09-13 23:24:53,871 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.310e+02 2.640e+02 3.137e+02 6.719e+02, threshold=5.281e+02, percent-clipped=4.0 2024-09-13 23:25:14,553 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-09-13 23:25:18,513 INFO [train.py:1198] (0/2) Epoch 3, batch 1650, loss[loss=0.3794, ctc_loss=0.2907, cr_loss=0.4434, over 20336.00 frames. ], tot_loss[loss=0.3537, ctc_loss=0.2681, cr_loss=0.428, over 4094541.50 frames. ], batch size: 74, lr: 2.53e-02, grad_scale: 32.0 2024-09-13 23:25:22,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=40890.666666666664, ans=0.2 2024-09-13 23:25:51,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=40947.333333333336, ans=0.0 2024-09-13 23:26:15,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2024-09-13 23:26:25,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=41004.0, ans=0.125 2024-09-13 23:26:37,250 INFO [train.py:1198] (0/2) Epoch 3, batch 1700, loss[loss=0.3238, ctc_loss=0.2453, cr_loss=0.3926, over 20976.00 frames. ], tot_loss[loss=0.3546, ctc_loss=0.2687, cr_loss=0.4293, over 4093527.52 frames. 
], batch size: 51, lr: 2.53e-02, grad_scale: 32.0 2024-09-13 23:26:45,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=41032.333333333336, ans=0.125 2024-09-13 23:26:46,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=41032.333333333336, ans=0.125 2024-09-13 23:27:04,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=41060.666666666664, ans=0.025 2024-09-13 23:27:10,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=41089.0, ans=0.0 2024-09-13 23:27:13,707 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0 2024-09-13 23:27:26,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=41117.333333333336, ans=0.125 2024-09-13 23:27:27,906 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.969e+02 2.488e+02 2.861e+02 3.723e+02 5.926e+02, threshold=5.722e+02, percent-clipped=3.0 2024-09-13 23:27:28,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=41117.333333333336, ans=0.125 2024-09-13 23:27:39,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=41145.666666666664, ans=0.0 2024-09-13 23:27:43,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=41145.666666666664, ans=0.125 2024-09-13 23:27:49,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=41145.666666666664, 
ans=0.2 2024-09-13 23:27:52,432 INFO [train.py:1198] (0/2) Epoch 3, batch 1750, loss[loss=0.3669, ctc_loss=0.2722, cr_loss=0.4731, over 20726.00 frames. ], tot_loss[loss=0.3544, ctc_loss=0.2686, cr_loss=0.4288, over 4084465.90 frames. ], batch size: 71, lr: 2.52e-02, grad_scale: 32.0 2024-09-13 23:28:04,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=41174.0, ans=0.125 2024-09-13 23:29:08,214 INFO [train.py:1198] (0/2) Epoch 3, batch 1800, loss[loss=0.354, ctc_loss=0.2701, cr_loss=0.4193, over 20685.00 frames. ], tot_loss[loss=0.3532, ctc_loss=0.2675, cr_loss=0.4282, over 4094205.81 frames. ], batch size: 68, lr: 2.52e-02, grad_scale: 32.0 2024-09-13 23:29:13,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=41315.666666666664, ans=0.125 2024-09-13 23:29:17,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=41315.666666666664, ans=0.125 2024-09-13 23:29:51,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41372.333333333336, ans=0.1 2024-09-13 23:29:53,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.19 vs. 
limit=15.0 2024-09-13 23:30:00,297 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.573e+02 3.061e+02 3.744e+02 6.005e+02, threshold=6.123e+02, percent-clipped=3.0 2024-09-13 23:30:17,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41429.0, ans=0.1 2024-09-13 23:30:18,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=41429.0, ans=0.0 2024-09-13 23:30:27,776 INFO [train.py:1198] (0/2) Epoch 3, batch 1850, loss[loss=0.3015, ctc_loss=0.2274, cr_loss=0.3704, over 20986.00 frames. ], tot_loss[loss=0.3531, ctc_loss=0.2674, cr_loss=0.4281, over 4086247.42 frames. ], batch size: 51, lr: 2.52e-02, grad_scale: 32.0 2024-09-13 23:31:02,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=41514.0, ans=0.0 2024-09-13 23:31:10,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=41514.0, ans=0.2 2024-09-13 23:31:21,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=22.5 2024-09-13 23:31:23,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=41542.333333333336, ans=0.125 2024-09-13 23:31:27,460 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.01 vs. limit=15.0 2024-09-13 23:31:46,921 INFO [train.py:1198] (0/2) Epoch 3, batch 1900, loss[loss=0.2833, ctc_loss=0.2079, cr_loss=0.3769, over 21062.00 frames. ], tot_loss[loss=0.3523, ctc_loss=0.2667, cr_loss=0.4278, over 4089281.57 frames. 
], batch size: 53, lr: 2.51e-02, grad_scale: 32.0 2024-09-13 23:32:20,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=41655.666666666664, ans=0.04949747468305833 2024-09-13 23:32:26,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=41655.666666666664, ans=0.05 2024-09-13 23:32:38,077 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.418e+02 2.994e+02 3.662e+02 6.504e+02, threshold=5.988e+02, percent-clipped=1.0 2024-09-13 23:33:02,497 INFO [train.py:1198] (0/2) Epoch 3, batch 1950, loss[loss=0.3092, ctc_loss=0.2278, cr_loss=0.407, over 20875.00 frames. ], tot_loss[loss=0.3513, ctc_loss=0.266, cr_loss=0.4265, over 4091781.44 frames. ], batch size: 57, lr: 2.51e-02, grad_scale: 32.0 2024-09-13 23:33:32,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=41797.333333333336, ans=0.125 2024-09-13 23:34:03,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=41854.0, ans=0.05 2024-09-13 23:34:09,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=41854.0, ans=15.0 2024-09-13 23:34:16,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0 2024-09-13 23:34:17,743 INFO [train.py:1198] (0/2) Epoch 3, batch 2000, loss[loss=0.3584, ctc_loss=0.2697, cr_loss=0.4432, over 21008.00 frames. ], tot_loss[loss=0.3512, ctc_loss=0.2659, cr_loss=0.4267, over 4091128.73 frames. 
], batch size: 63, lr: 2.51e-02, grad_scale: 32.0 2024-09-13 23:34:48,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=41939.0, ans=0.0 2024-09-13 23:35:09,862 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.358e+02 2.581e+02 2.979e+02 5.095e+02, threshold=5.161e+02, percent-clipped=0.0 2024-09-13 23:35:10,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41967.333333333336, ans=0.1 2024-09-13 23:35:30,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=41995.666666666664, ans=0.2 2024-09-13 23:35:33,509 INFO [train.py:1198] (0/2) Epoch 3, batch 2050, loss[loss=0.318, ctc_loss=0.2381, cr_loss=0.3993, over 20790.00 frames. ], tot_loss[loss=0.3528, ctc_loss=0.2672, cr_loss=0.4279, over 4082083.84 frames. ], batch size: 53, lr: 2.50e-02, grad_scale: 32.0 2024-09-13 23:35:52,126 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 23:36:50,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=42165.666666666664, ans=0.95 2024-09-13 23:36:51,974 INFO [train.py:1198] (0/2) Epoch 3, batch 2100, loss[loss=0.3638, ctc_loss=0.2757, cr_loss=0.4409, over 20964.00 frames. ], tot_loss[loss=0.3527, ctc_loss=0.267, cr_loss=0.4283, over 4091503.60 frames. ], batch size: 64, lr: 2.50e-02, grad_scale: 64.0 2024-09-13 23:36:53,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=42165.666666666664, ans=0.2 2024-09-13 23:37:13,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.86 vs. 
limit=22.5 2024-09-13 23:37:40,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=12.0 2024-09-13 23:37:41,215 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 23:37:42,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=42250.666666666664, ans=0.025 2024-09-13 23:37:46,661 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.498e+02 2.992e+02 3.605e+02 7.205e+02, threshold=5.983e+02, percent-clipped=5.0 2024-09-13 23:37:50,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=42250.666666666664, ans=0.125 2024-09-13 23:38:10,910 INFO [train.py:1198] (0/2) Epoch 3, batch 2150, loss[loss=0.3134, ctc_loss=0.2363, cr_loss=0.3857, over 20332.00 frames. ], tot_loss[loss=0.3514, ctc_loss=0.2659, cr_loss=0.4278, over 4097721.55 frames. ], batch size: 45, lr: 2.50e-02, grad_scale: 64.0 2024-09-13 23:38:23,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=42307.333333333336, ans=0.125 2024-09-13 23:38:29,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=42335.666666666664, ans=0.07 2024-09-13 23:39:13,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=42420.666666666664, ans=0.125 2024-09-13 23:39:16,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=42420.666666666664, ans=0.0 2024-09-13 23:39:27,157 INFO [train.py:1198] (0/2) Epoch 3, batch 2200, loss[loss=0.4173, ctc_loss=0.3213, cr_loss=0.4799, over 18598.00 frames. 
], tot_loss[loss=0.3501, ctc_loss=0.2648, cr_loss=0.4264, over 4106409.71 frames. ], batch size: 108, lr: 2.49e-02, grad_scale: 16.0 2024-09-13 23:39:33,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=42449.0, ans=0.125 2024-09-13 23:39:50,451 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 23:40:04,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=42505.666666666664, ans=0.035 2024-09-13 23:40:19,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=42534.0, ans=0.125 2024-09-13 23:40:22,341 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.395e+02 2.642e+02 3.226e+02 6.726e+02, threshold=5.285e+02, percent-clipped=3.0 2024-09-13 23:40:43,301 INFO [train.py:1198] (0/2) Epoch 3, batch 2250, loss[loss=0.3684, ctc_loss=0.2812, cr_loss=0.4359, over 20965.00 frames. ], tot_loss[loss=0.3484, ctc_loss=0.2634, cr_loss=0.4252, over 4108055.51 frames. ], batch size: 64, lr: 2.49e-02, grad_scale: 16.0 2024-09-13 23:41:30,712 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5 2024-09-13 23:41:45,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=42704.0, ans=0.001586086956521739 2024-09-13 23:41:58,911 INFO [train.py:1198] (0/2) Epoch 3, batch 2300, loss[loss=0.3208, ctc_loss=0.2338, cr_loss=0.435, over 21081.00 frames. ], tot_loss[loss=0.349, ctc_loss=0.2637, cr_loss=0.4264, over 4116468.24 frames. 
], batch size: 53, lr: 2.49e-02, grad_scale: 16.0 2024-09-13 23:42:19,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=42760.666666666664, ans=0.125 2024-09-13 23:42:41,654 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-13 23:42:56,076 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.338e+02 2.519e+02 3.097e+02 4.952e+02, threshold=5.037e+02, percent-clipped=0.0 2024-09-13 23:43:05,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=42845.666666666664, ans=0.125 2024-09-13 23:43:13,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=22.5 2024-09-13 23:43:15,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=42845.666666666664, ans=0.125 2024-09-13 23:43:19,973 INFO [train.py:1198] (0/2) Epoch 3, batch 2350, loss[loss=0.4009, ctc_loss=0.308, cr_loss=0.4647, over 19291.00 frames. ], tot_loss[loss=0.3488, ctc_loss=0.2637, cr_loss=0.4258, over 4124051.35 frames. ], batch size: 90, lr: 2.48e-02, grad_scale: 16.0 2024-09-13 23:44:12,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=42959.0, ans=0.0 2024-09-13 23:44:13,260 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. 
limit=6.0 2024-09-13 23:44:21,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=42987.333333333336, ans=0.0 2024-09-13 23:44:35,186 INFO [train.py:1198] (0/2) Epoch 3, batch 2400, loss[loss=0.3334, ctc_loss=0.25, cr_loss=0.4171, over 20943.00 frames. ], tot_loss[loss=0.3486, ctc_loss=0.2634, cr_loss=0.4262, over 4124382.28 frames. ], batch size: 49, lr: 2.48e-02, grad_scale: 32.0 2024-09-13 23:45:09,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=43072.333333333336, ans=0.2 2024-09-13 23:45:20,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5 2024-09-13 23:45:29,085 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.391e+02 2.830e+02 3.681e+02 8.924e+02, threshold=5.660e+02, percent-clipped=3.0 2024-09-13 23:45:40,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=43129.0, ans=0.125 2024-09-13 23:45:50,509 INFO [train.py:1198] (0/2) Epoch 3, batch 2450, loss[loss=0.3386, ctc_loss=0.2578, cr_loss=0.4041, over 20677.00 frames. ], tot_loss[loss=0.3504, ctc_loss=0.2649, cr_loss=0.4274, over 4113209.73 frames. ], batch size: 71, lr: 2.48e-02, grad_scale: 32.0 2024-09-13 23:46:26,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43214.0, ans=0.1 2024-09-13 23:46:42,538 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.92 vs. limit=15.0 2024-09-13 23:47:05,288 INFO [train.py:1198] (0/2) Epoch 3, batch 2500, loss[loss=0.2875, ctc_loss=0.2176, cr_loss=0.3495, over 20936.00 frames. 
], tot_loss[loss=0.3493, ctc_loss=0.2642, cr_loss=0.4255, over 4103283.58 frames. ], batch size: 49, lr: 2.47e-02, grad_scale: 32.0 2024-09-13 23:47:05,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43299.0, ans=0.1 2024-09-13 23:48:02,096 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.453e+02 3.011e+02 3.504e+02 5.504e+02, threshold=6.022e+02, percent-clipped=0.0 2024-09-13 23:48:23,056 INFO [train.py:1198] (0/2) Epoch 3, batch 2550, loss[loss=0.3635, ctc_loss=0.2786, cr_loss=0.4248, over 20967.00 frames. ], tot_loss[loss=0.3485, ctc_loss=0.2635, cr_loss=0.4251, over 4107460.65 frames. ], batch size: 64, lr: 2.47e-02, grad_scale: 32.0 2024-09-13 23:48:43,496 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 23:49:10,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=43525.666666666664, ans=0.125 2024-09-13 23:49:41,878 INFO [train.py:1198] (0/2) Epoch 3, batch 2600, loss[loss=0.3895, ctc_loss=0.2967, cr_loss=0.4639, over 19408.00 frames. ], tot_loss[loss=0.347, ctc_loss=0.2621, cr_loss=0.4248, over 4111061.01 frames. 
], batch size: 90, lr: 2.47e-02, grad_scale: 32.0 2024-09-13 23:50:01,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=43610.666666666664, ans=0.025 2024-09-13 23:50:01,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=43610.666666666664, ans=0.1 2024-09-13 23:50:18,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=43639.0, ans=0.125 2024-09-13 23:50:26,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2024-09-13 23:50:35,643 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.359e+02 2.647e+02 3.183e+02 6.616e+02, threshold=5.295e+02, percent-clipped=1.0 2024-09-13 23:50:56,896 INFO [train.py:1198] (0/2) Epoch 3, batch 2650, loss[loss=0.3489, ctc_loss=0.2589, cr_loss=0.45, over 21044.00 frames. ], tot_loss[loss=0.3489, ctc_loss=0.2636, cr_loss=0.4265, over 4103998.73 frames. ], batch size: 62, lr: 2.46e-02, grad_scale: 32.0 2024-09-13 23:51:06,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=43724.0, ans=0.125 2024-09-13 23:51:22,133 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0 2024-09-13 23:51:48,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=43809.0, ans=15.0 2024-09-13 23:52:12,180 INFO [train.py:1198] (0/2) Epoch 3, batch 2700, loss[loss=0.3627, ctc_loss=0.275, cr_loss=0.4385, over 20835.00 frames. ], tot_loss[loss=0.3483, ctc_loss=0.2632, cr_loss=0.4258, over 4107965.85 frames. 
], batch size: 65, lr: 2.46e-02, grad_scale: 32.0 2024-09-13 23:52:35,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0 2024-09-13 23:53:06,146 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.12 vs. limit=5.0 2024-09-13 23:53:06,399 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.329e+02 2.737e+02 3.277e+02 1.023e+03, threshold=5.475e+02, percent-clipped=1.0 2024-09-13 23:53:19,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.08 vs. limit=15.0 2024-09-13 23:53:27,733 INFO [train.py:1198] (0/2) Epoch 3, batch 2750, loss[loss=0.3942, ctc_loss=0.2998, cr_loss=0.4718, over 20670.00 frames. ], tot_loss[loss=0.35, ctc_loss=0.2646, cr_loss=0.4273, over 4103173.06 frames. ], batch size: 66, lr: 2.46e-02, grad_scale: 32.0 2024-09-13 23:54:18,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=22.5 2024-09-13 23:54:32,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=44120.666666666664, ans=0.125 2024-09-13 23:54:49,033 INFO [train.py:1198] (0/2) Epoch 3, batch 2800, loss[loss=0.3447, ctc_loss=0.2591, cr_loss=0.4281, over 20767.00 frames. ], tot_loss[loss=0.3491, ctc_loss=0.2638, cr_loss=0.4268, over 4107104.32 frames. 
], batch size: 56, lr: 2.45e-02, grad_scale: 32.0 2024-09-13 23:55:43,113 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.473e+02 2.845e+02 3.511e+02 5.408e+02, threshold=5.690e+02, percent-clipped=0.0 2024-09-13 23:55:48,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=15.0 2024-09-13 23:55:52,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=44262.333333333336, ans=0.0 2024-09-13 23:56:04,549 INFO [train.py:1198] (0/2) Epoch 3, batch 2850, loss[loss=0.3257, ctc_loss=0.246, cr_loss=0.3989, over 20779.00 frames. ], tot_loss[loss=0.3491, ctc_loss=0.2636, cr_loss=0.4272, over 4103227.13 frames. ], batch size: 53, lr: 2.45e-02, grad_scale: 32.0 2024-09-13 23:56:25,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=15.0 2024-09-13 23:56:31,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=44319.0, ans=0.001235 2024-09-13 23:56:36,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=44347.333333333336, ans=0.125 2024-09-13 23:56:49,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=44375.666666666664, ans=0.025 2024-09-13 23:56:51,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=44375.666666666664, ans=0.125 2024-09-13 23:57:02,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. 
limit=15.0 2024-09-13 23:57:19,503 INFO [train.py:1198] (0/2) Epoch 3, batch 2900, loss[loss=0.3726, ctc_loss=0.284, cr_loss=0.4427, over 20713.00 frames. ], tot_loss[loss=0.3502, ctc_loss=0.2645, cr_loss=0.4283, over 4099182.92 frames. ], batch size: 71, lr: 2.45e-02, grad_scale: 32.0 2024-09-13 23:58:05,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=44517.333333333336, ans=0.125 2024-09-13 23:58:13,695 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.437e+02 2.698e+02 3.177e+02 4.737e+02, threshold=5.396e+02, percent-clipped=0.0 2024-09-13 23:58:30,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=44545.666666666664, ans=0.125 2024-09-13 23:58:34,835 INFO [train.py:1198] (0/2) Epoch 3, batch 2950, loss[loss=0.4044, ctc_loss=0.3069, cr_loss=0.4876, over 19443.00 frames. ], tot_loss[loss=0.3492, ctc_loss=0.2637, cr_loss=0.4275, over 4094495.21 frames. ], batch size: 90, lr: 2.44e-02, grad_scale: 32.0 2024-09-13 23:58:37,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.08 vs. limit=15.0 2024-09-13 23:58:48,932 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-13 23:58:51,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=44602.333333333336, ans=0.0 2024-09-13 23:58:52,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.84 vs. 
limit=12.0 2024-09-13 23:59:05,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=44630.666666666664, ans=0.125 2024-09-13 23:59:12,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2024-09-13 23:59:17,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=44630.666666666664, ans=0.125 2024-09-13 23:59:33,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=44659.0, ans=0.025 2024-09-13 23:59:48,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=44687.333333333336, ans=0.0 2024-09-13 23:59:54,101 INFO [train.py:1198] (0/2) Epoch 3, batch 3000, loss[loss=0.2896, ctc_loss=0.2098, cr_loss=0.3992, over 20962.00 frames. ], tot_loss[loss=0.3503, ctc_loss=0.2648, cr_loss=0.4274, over 4084077.15 frames. ], batch size: 50, lr: 2.44e-02, grad_scale: 32.0 2024-09-13 23:59:54,102 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 00:00:14,193 INFO [train.py:1230] (0/2) Epoch 3, validation: loss=0.09133, ctc_loss=0.09133, cr_loss=9.464e-15, over 944034.00 frames. 2024-09-14 00:00:14,193 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 00:00:22,604 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.40 vs. 
limit=22.5 2024-09-14 00:00:23,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=44715.666666666664, ans=0.2 2024-09-14 00:00:31,507 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0 2024-09-14 00:01:08,305 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.615e+02 2.994e+02 3.669e+02 6.655e+02, threshold=5.989e+02, percent-clipped=5.0 2024-09-14 00:01:29,339 INFO [train.py:1198] (0/2) Epoch 3, batch 3050, loss[loss=0.368, ctc_loss=0.279, cr_loss=0.445, over 20975.00 frames. ], tot_loss[loss=0.3509, ctc_loss=0.2654, cr_loss=0.4276, over 4080020.43 frames. ], batch size: 58, lr: 2.44e-02, grad_scale: 32.0 2024-09-14 00:01:38,022 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.67 vs. 
limit=10.0 2024-09-14 00:01:43,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=44885.666666666664, ans=0.0011118115942028994 2024-09-14 00:01:46,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44885.666666666664, ans=0.1 2024-09-14 00:02:05,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=44914.0, ans=0.125 2024-09-14 00:02:13,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=44942.333333333336, ans=0.1 2024-09-14 00:02:26,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=44942.333333333336, ans=0.07 2024-09-14 00:02:37,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=44970.666666666664, ans=0.125 2024-09-14 00:02:40,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=44970.666666666664, ans=0.1 2024-09-14 00:02:44,436 INFO [train.py:1198] (0/2) Epoch 3, batch 3100, loss[loss=0.3373, ctc_loss=0.254, cr_loss=0.4167, over 20862.00 frames. ], tot_loss[loss=0.3488, ctc_loss=0.2635, cr_loss=0.4264, over 4084969.99 frames. 
], batch size: 65, lr: 2.44e-02, grad_scale: 32.0 2024-09-14 00:03:00,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=45027.333333333336, ans=0.125 2024-09-14 00:03:22,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=45055.666666666664, ans=0.0 2024-09-14 00:03:26,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=12.0 2024-09-14 00:03:38,684 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.421e+02 2.946e+02 3.449e+02 6.819e+02, threshold=5.892e+02, percent-clipped=1.0 2024-09-14 00:03:59,642 INFO [train.py:1198] (0/2) Epoch 3, batch 3150, loss[loss=0.368, ctc_loss=0.2784, cr_loss=0.4478, over 20299.00 frames. ], tot_loss[loss=0.35, ctc_loss=0.2644, cr_loss=0.428, over 4081436.46 frames. ], batch size: 74, lr: 2.43e-02, grad_scale: 32.0 2024-09-14 00:04:01,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=45140.666666666664, ans=0.5 2024-09-14 00:05:03,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=45254.0, ans=0.125 2024-09-14 00:05:18,117 INFO [train.py:1198] (0/2) Epoch 3, batch 3200, loss[loss=0.3635, ctc_loss=0.2752, cr_loss=0.4416, over 21012.00 frames. ], tot_loss[loss=0.3502, ctc_loss=0.2646, cr_loss=0.428, over 4083607.41 frames. 
], batch size: 63, lr: 2.43e-02, grad_scale: 32.0 2024-09-14 00:05:21,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45282.333333333336, ans=0.1 2024-09-14 00:05:46,624 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-16000.pt 2024-09-14 00:06:08,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=45367.333333333336, ans=0.125 2024-09-14 00:06:15,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=45367.333333333336, ans=0.125 2024-09-14 00:06:16,198 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.452e+02 2.785e+02 3.417e+02 6.516e+02, threshold=5.570e+02, percent-clipped=1.0 2024-09-14 00:06:37,167 INFO [train.py:1198] (0/2) Epoch 3, batch 3250, loss[loss=0.3969, ctc_loss=0.3028, cr_loss=0.4704, over 20727.00 frames. ], tot_loss[loss=0.3506, ctc_loss=0.2649, cr_loss=0.4288, over 4090378.12 frames. 
], batch size: 71, lr: 2.43e-02, grad_scale: 32.0 2024-09-14 00:06:47,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=45424.0, ans=0.125 2024-09-14 00:06:52,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=45452.333333333336, ans=0.0009886231884057971 2024-09-14 00:07:17,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=45480.666666666664, ans=0.125 2024-09-14 00:07:21,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=45509.0, ans=0.125 2024-09-14 00:07:21,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=45509.0, ans=0.125 2024-09-14 00:07:28,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45509.0, ans=0.1 2024-09-14 00:07:30,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=45509.0, ans=0.125 2024-09-14 00:07:52,700 INFO [train.py:1198] (0/2) Epoch 3, batch 3300, loss[loss=0.3182, ctc_loss=0.2435, cr_loss=0.3734, over 19961.00 frames. ], tot_loss[loss=0.348, ctc_loss=0.2625, cr_loss=0.4272, over 4100881.06 frames. ], batch size: 44, lr: 2.42e-02, grad_scale: 32.0 2024-09-14 00:08:05,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.13 vs. 
limit=12.0 2024-09-14 00:08:11,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=45594.0, ans=0.0 2024-09-14 00:08:15,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=45594.0, ans=0.125 2024-09-14 00:08:33,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=22.5 2024-09-14 00:08:46,857 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.285e+02 2.532e+02 2.921e+02 4.809e+02, threshold=5.064e+02, percent-clipped=0.0 2024-09-14 00:08:47,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=45650.666666666664, ans=0.0 2024-09-14 00:09:07,793 INFO [train.py:1198] (0/2) Epoch 3, batch 3350, loss[loss=0.3691, ctc_loss=0.2845, cr_loss=0.4228, over 20988.00 frames. ], tot_loss[loss=0.3489, ctc_loss=0.2635, cr_loss=0.4268, over 4086697.36 frames. ], batch size: 63, lr: 2.42e-02, grad_scale: 32.0 2024-09-14 00:09:20,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=12.0 2024-09-14 00:09:32,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.06 vs. 
limit=22.5 2024-09-14 00:09:37,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=45764.0, ans=0.125 2024-09-14 00:09:47,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45764.0, ans=0.1 2024-09-14 00:09:52,007 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.18 vs. limit=12.0 2024-09-14 00:09:53,818 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2024-09-14 00:10:08,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=45820.666666666664, ans=0.125 2024-09-14 00:10:09,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=45820.666666666664, ans=0.0009085507246376825 2024-09-14 00:10:23,221 INFO [train.py:1198] (0/2) Epoch 3, batch 3400, loss[loss=0.3851, ctc_loss=0.2909, cr_loss=0.4712, over 20982.00 frames. ], tot_loss[loss=0.3477, ctc_loss=0.2625, cr_loss=0.4262, over 4096777.42 frames. ], batch size: 64, lr: 2.42e-02, grad_scale: 32.0 2024-09-14 00:10:41,393 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. 
limit=15.0 2024-09-14 00:10:45,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45877.333333333336, ans=0.1 2024-09-14 00:10:47,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=45877.333333333336, ans=0.00089623188405797 2024-09-14 00:11:04,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=45905.666666666664, ans=0.125 2024-09-14 00:11:09,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=45934.0, ans=0.025 2024-09-14 00:11:22,864 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.330e+02 2.613e+02 3.141e+02 5.836e+02, threshold=5.227e+02, percent-clipped=3.0 2024-09-14 00:11:44,121 INFO [train.py:1198] (0/2) Epoch 3, batch 3450, loss[loss=0.375, ctc_loss=0.2816, cr_loss=0.4672, over 20864.00 frames. ], tot_loss[loss=0.3483, ctc_loss=0.2628, cr_loss=0.4275, over 4103748.16 frames. ], batch size: 65, lr: 2.41e-02, grad_scale: 32.0 2024-09-14 00:11:44,743 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2024-09-14 00:12:53,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=46104.0, ans=0.0 2024-09-14 00:12:53,895 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0 2024-09-14 00:12:59,159 INFO [train.py:1198] (0/2) Epoch 3, batch 3500, loss[loss=0.2826, ctc_loss=0.2098, cr_loss=0.364, over 20970.00 frames. ], tot_loss[loss=0.3487, ctc_loss=0.2631, cr_loss=0.4281, over 4103726.31 frames. 
], batch size: 51, lr: 2.41e-02, grad_scale: 32.0 2024-09-14 00:13:49,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=46217.333333333336, ans=0.0 2024-09-14 00:13:53,640 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.367e+02 2.706e+02 3.266e+02 5.266e+02, threshold=5.411e+02, percent-clipped=2.0 2024-09-14 00:14:01,904 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2024-09-14 00:14:14,913 INFO [train.py:1198] (0/2) Epoch 3, batch 3550, loss[loss=0.3526, ctc_loss=0.2667, cr_loss=0.4295, over 21055.00 frames. ], tot_loss[loss=0.3479, ctc_loss=0.2624, cr_loss=0.4274, over 4104759.25 frames. ], batch size: 62, lr: 2.41e-02, grad_scale: 32.0 2024-09-14 00:14:30,358 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=12.0 2024-09-14 00:14:30,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.00 vs. limit=15.0 2024-09-14 00:14:31,944 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=12.0 2024-09-14 00:14:43,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=46330.666666666664, ans=0.025 2024-09-14 00:15:07,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=46359.0, ans=0.0 2024-09-14 00:15:29,883 INFO [train.py:1198] (0/2) Epoch 3, batch 3600, loss[loss=0.3625, ctc_loss=0.2756, cr_loss=0.4345, over 20872.00 frames. ], tot_loss[loss=0.3488, ctc_loss=0.2632, cr_loss=0.4279, over 4096358.59 frames. 
], batch size: 57, lr: 2.40e-02, grad_scale: 32.0 2024-09-14 00:16:13,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=46472.333333333336, ans=0.0007668840579710141 2024-09-14 00:16:26,765 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.054e+02 2.371e+02 2.596e+02 2.997e+02 4.705e+02, threshold=5.193e+02, percent-clipped=0.0 2024-09-14 00:16:35,088 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0 2024-09-14 00:16:40,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=46529.0, ans=0.125 2024-09-14 00:16:50,489 INFO [train.py:1198] (0/2) Epoch 3, batch 3650, loss[loss=0.3275, ctc_loss=0.2456, cr_loss=0.4095, over 20786.00 frames. ], tot_loss[loss=0.3486, ctc_loss=0.2629, cr_loss=0.4283, over 4099595.28 frames. ], batch size: 53, lr: 2.40e-02, grad_scale: 32.0 2024-09-14 00:16:58,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=46557.333333333336, ans=0.2 2024-09-14 00:17:36,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=46642.333333333336, ans=0.0007299275362318836 2024-09-14 00:17:54,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=46670.666666666664, ans=0.1 2024-09-14 00:17:58,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=46670.666666666664, ans=0.0007237681159420299 2024-09-14 00:17:59,718 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. 
limit=6.0 2024-09-14 00:18:06,431 INFO [train.py:1198] (0/2) Epoch 3, batch 3700, loss[loss=0.3742, ctc_loss=0.2751, cr_loss=0.4958, over 20722.00 frames. ], tot_loss[loss=0.3479, ctc_loss=0.2622, cr_loss=0.4287, over 4106687.26 frames. ], batch size: 68, lr: 2.40e-02, grad_scale: 32.0 2024-09-14 00:18:57,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=46784.0, ans=0.2 2024-09-14 00:19:01,351 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.546e+02 2.834e+02 3.481e+02 5.195e+02, threshold=5.668e+02, percent-clipped=1.0 2024-09-14 00:19:22,637 INFO [train.py:1198] (0/2) Epoch 3, batch 3750, loss[loss=0.4116, ctc_loss=0.3167, cr_loss=0.474, over 18232.00 frames. ], tot_loss[loss=0.3475, ctc_loss=0.2618, cr_loss=0.4283, over 4113404.38 frames. ], batch size: 108, lr: 2.40e-02, grad_scale: 32.0 2024-09-14 00:19:24,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=46840.666666666664, ans=0.0 2024-09-14 00:19:39,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2024-09-14 00:19:55,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=46897.333333333336, ans=0.04949747468305833 2024-09-14 00:20:27,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=46954.0, ans=0.125 2024-09-14 00:20:37,760 INFO [train.py:1198] (0/2) Epoch 3, batch 3800, loss[loss=0.4278, ctc_loss=0.3423, cr_loss=0.4275, over 13832.00 frames. ], tot_loss[loss=0.3467, ctc_loss=0.2613, cr_loss=0.4269, over 4096013.83 frames. 
], batch size: 150, lr: 2.39e-02, grad_scale: 32.0 2024-09-14 00:20:54,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=47010.666666666664, ans=0.125 2024-09-14 00:20:59,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=47010.666666666664, ans=0.5 2024-09-14 00:21:31,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 2.350e+02 2.718e+02 3.478e+02 5.662e+02, threshold=5.437e+02, percent-clipped=0.0 2024-09-14 00:21:33,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=47067.333333333336, ans=0.025 2024-09-14 00:21:55,832 INFO [train.py:1198] (0/2) Epoch 3, batch 3850, loss[loss=0.3159, ctc_loss=0.238, cr_loss=0.3895, over 20997.00 frames. ], tot_loss[loss=0.3455, ctc_loss=0.2604, cr_loss=0.4259, over 4103746.77 frames. ], batch size: 52, lr: 2.39e-02, grad_scale: 32.0 2024-09-14 00:22:32,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=47180.666666666664, ans=0.2 2024-09-14 00:23:14,315 INFO [train.py:1198] (0/2) Epoch 3, batch 3900, loss[loss=0.3367, ctc_loss=0.2574, cr_loss=0.3964, over 20965.00 frames. ], tot_loss[loss=0.3456, ctc_loss=0.2603, cr_loss=0.4264, over 4107370.84 frames. 
], batch size: 51, lr: 2.39e-02, grad_scale: 32.0 2024-09-14 00:23:46,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=47322.333333333336, ans=0.125 2024-09-14 00:23:49,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=47322.333333333336, ans=0.125 2024-09-14 00:23:54,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47322.333333333336, ans=0.1 2024-09-14 00:23:58,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47350.666666666664, ans=0.1 2024-09-14 00:24:08,727 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.288e+02 2.504e+02 2.863e+02 3.836e+02, threshold=5.007e+02, percent-clipped=0.0 2024-09-14 00:24:29,638 INFO [train.py:1198] (0/2) Epoch 3, batch 3950, loss[loss=0.2976, ctc_loss=0.2174, cr_loss=0.401, over 20978.00 frames. ], tot_loss[loss=0.3471, ctc_loss=0.2616, cr_loss=0.4274, over 4095360.08 frames. ], batch size: 50, lr: 2.38e-02, grad_scale: 32.0 2024-09-14 00:24:55,545 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.99 vs. limit=10.0 2024-09-14 00:25:10,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=47464.0, ans=0.025 2024-09-14 00:25:19,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=47492.333333333336, ans=0.125 2024-09-14 00:25:44,700 INFO [train.py:1198] (0/2) Epoch 3, batch 4000, loss[loss=0.3725, ctc_loss=0.2731, cr_loss=0.4969, over 20671.00 frames. 
], tot_loss[loss=0.3475, ctc_loss=0.2619, cr_loss=0.4279, over 4085827.21 frames. ], batch size: 68, lr: 2.38e-02, grad_scale: 32.0 2024-09-14 00:25:46,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47549.0, ans=0.1 2024-09-14 00:26:40,355 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.380e+02 2.715e+02 3.768e+02 6.856e+02, threshold=5.431e+02, percent-clipped=7.0 2024-09-14 00:26:42,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=47634.0, ans=0.125 2024-09-14 00:26:59,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.48 vs. limit=10.0 2024-09-14 00:27:00,138 INFO [train.py:1198] (0/2) Epoch 3, batch 4050, loss[loss=0.3145, ctc_loss=0.2357, cr_loss=0.394, over 20961.00 frames. ], tot_loss[loss=0.3455, ctc_loss=0.2603, cr_loss=0.4263, over 4077834.82 frames. ], batch size: 49, lr: 2.38e-02, grad_scale: 32.0 2024-09-14 00:27:26,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=47719.0, ans=0.0 2024-09-14 00:27:26,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=47719.0, ans=0.0 2024-09-14 00:27:35,560 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.68 vs. 
limit=12.0 2024-09-14 00:27:45,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=47747.333333333336, ans=0.125 2024-09-14 00:27:47,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=47775.666666666664, ans=0.125 2024-09-14 00:27:51,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=47775.666666666664, ans=0.0 2024-09-14 00:28:21,480 INFO [train.py:1198] (0/2) Epoch 3, batch 4100, loss[loss=0.3202, ctc_loss=0.2359, cr_loss=0.4211, over 20776.00 frames. ], tot_loss[loss=0.3438, ctc_loss=0.2588, cr_loss=0.4247, over 4083431.16 frames. ], batch size: 53, lr: 2.37e-02, grad_scale: 32.0 2024-09-14 00:28:43,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0 2024-09-14 00:28:44,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=47860.666666666664, ans=0.0 2024-09-14 00:28:54,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=47889.0, ans=0.125 2024-09-14 00:28:59,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47889.0, ans=0.1 2024-09-14 00:29:16,806 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.316e+02 2.497e+02 2.917e+02 4.511e+02, threshold=4.994e+02, percent-clipped=0.0 2024-09-14 00:29:36,585 INFO [train.py:1198] (0/2) Epoch 3, batch 4150, loss[loss=0.3519, ctc_loss=0.2593, cr_loss=0.463, over 21007.00 frames. ], tot_loss[loss=0.3454, ctc_loss=0.2601, cr_loss=0.4265, over 4072257.89 frames. 
], batch size: 63, lr: 2.37e-02, grad_scale: 32.0 2024-09-14 00:30:08,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=48030.666666666664, ans=0.2 2024-09-14 00:30:24,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=48059.0, ans=0.1 2024-09-14 00:30:33,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=48059.0, ans=0.125 2024-09-14 00:30:51,557 INFO [train.py:1198] (0/2) Epoch 3, batch 4200, loss[loss=0.3436, ctc_loss=0.256, cr_loss=0.4379, over 21033.00 frames. ], tot_loss[loss=0.3464, ctc_loss=0.2608, cr_loss=0.4279, over 4067302.99 frames. ], batch size: 62, lr: 2.37e-02, grad_scale: 32.0 2024-09-14 00:30:53,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48115.666666666664, ans=0.1 2024-09-14 00:31:03,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=48115.666666666664, ans=0.0004096376811594214 2024-09-14 00:31:17,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48144.0, ans=0.1 2024-09-14 00:31:25,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=48172.333333333336, ans=0.125 2024-09-14 00:31:38,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=48200.666666666664, ans=0.125 2024-09-14 00:31:47,396 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.438e+02 2.954e+02 3.531e+02 6.549e+02, threshold=5.908e+02, percent-clipped=3.0 2024-09-14 00:32:06,723 INFO [train.py:1198] (0/2) Epoch 3, batch 4250, loss[loss=0.3708, ctc_loss=0.2794, 
cr_loss=0.4572, over 20854.00 frames. ], tot_loss[loss=0.3465, ctc_loss=0.2609, cr_loss=0.4282, over 4075133.76 frames. ], batch size: 65, lr: 2.37e-02, grad_scale: 32.0 2024-09-14 00:32:10,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=48257.333333333336, ans=0.125 2024-09-14 00:32:28,876 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=15.0 2024-09-14 00:32:41,170 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.29 vs. limit=22.5 2024-09-14 00:32:45,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=15.0 2024-09-14 00:32:55,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0 2024-09-14 00:33:25,004 INFO [train.py:1198] (0/2) Epoch 3, batch 4300, loss[loss=0.3299, ctc_loss=0.2475, cr_loss=0.4124, over 20966.00 frames. ], tot_loss[loss=0.3465, ctc_loss=0.2609, cr_loss=0.428, over 4080826.91 frames. ], batch size: 51, lr: 2.36e-02, grad_scale: 32.0 2024-09-14 00:33:37,411 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 00:33:46,588 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=22.5 2024-09-14 00:34:06,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.37 vs. 
limit=12.0 2024-09-14 00:34:20,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=48484.0, ans=0.2 2024-09-14 00:34:23,530 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.438e+02 2.813e+02 3.513e+02 6.461e+02, threshold=5.625e+02, percent-clipped=1.0 2024-09-14 00:34:25,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=48484.0, ans=0.125 2024-09-14 00:34:43,242 INFO [train.py:1198] (0/2) Epoch 3, batch 4350, loss[loss=0.362, ctc_loss=0.2711, cr_loss=0.4544, over 20674.00 frames. ], tot_loss[loss=0.3469, ctc_loss=0.2614, cr_loss=0.4278, over 4075123.98 frames. ], batch size: 71, lr: 2.36e-02, grad_scale: 32.0 2024-09-14 00:34:49,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=48540.666666666664, ans=0.125 2024-09-14 00:35:03,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=48569.0, ans=0.125 2024-09-14 00:35:23,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=48597.333333333336, ans=0.125 2024-09-14 00:35:58,846 INFO [train.py:1198] (0/2) Epoch 3, batch 4400, loss[loss=0.3452, ctc_loss=0.256, cr_loss=0.446, over 20734.00 frames. ], tot_loss[loss=0.3458, ctc_loss=0.2603, cr_loss=0.4277, over 4085570.73 frames. 
], batch size: 71, lr: 2.36e-02, grad_scale: 32.0
2024-09-14 00:36:23,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=48710.666666666664, ans=0.025
2024-09-14 00:36:31,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=48739.0, ans=0.0002741304347826084
2024-09-14 00:36:54,845 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.431e+02 2.704e+02 3.193e+02 5.692e+02, threshold=5.407e+02, percent-clipped=1.0
2024-09-14 00:37:14,427 INFO [train.py:1198] (0/2) Epoch 3, batch 4450, loss[loss=0.3281, ctc_loss=0.2414, cr_loss=0.4338, over 21018.00 frames. ], tot_loss[loss=0.3462, ctc_loss=0.2607, cr_loss=0.4278, over 4079322.26 frames. ], batch size: 52, lr: 2.35e-02, grad_scale: 32.0
2024-09-14 00:37:35,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=48852.333333333336, ans=0.125
2024-09-14 00:37:36,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=48852.333333333336, ans=0.125
2024-09-14 00:37:47,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=22.5
2024-09-14 00:37:50,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=48880.666666666664, ans=0.2
2024-09-14 00:38:24,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=48937.333333333336, ans=0.125
2024-09-14 00:38:28,721 INFO [train.py:1198] (0/2) Epoch 3, batch 4500, loss[loss=0.3812, ctc_loss=0.287, cr_loss=0.4711, over 20948.00 frames. ], tot_loss[loss=0.3464, ctc_loss=0.2607, cr_loss=0.4286, over 4082988.80 frames. ], batch size: 60, lr: 2.35e-02, grad_scale: 32.0
2024-09-14 00:38:44,506 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0
2024-09-14 00:38:47,320 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0
2024-09-14 00:39:03,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=49022.333333333336, ans=0.125
2024-09-14 00:39:09,707 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0
2024-09-14 00:39:17,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.71 vs. limit=10.0
2024-09-14 00:39:27,294 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.397e+02 2.714e+02 3.183e+02 5.265e+02, threshold=5.429e+02, percent-clipped=0.0
2024-09-14 00:39:49,974 INFO [train.py:1198] (0/2) Epoch 3, batch 4550, loss[loss=0.3981, ctc_loss=0.306, cr_loss=0.4608, over 20852.00 frames. ], tot_loss[loss=0.3464, ctc_loss=0.2607, cr_loss=0.4283, over 4081261.76 frames. ], batch size: 65, lr: 2.35e-02, grad_scale: 32.0
2024-09-14 00:40:02,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=49107.333333333336, ans=0.0
2024-09-14 00:41:05,067 INFO [train.py:1198] (0/2) Epoch 3, batch 4600, loss[loss=0.3501, ctc_loss=0.2639, cr_loss=0.4309, over 21029.00 frames. ], tot_loss[loss=0.3469, ctc_loss=0.2612, cr_loss=0.4283, over 4068843.23 frames. ], batch size: 63, lr: 2.35e-02, grad_scale: 16.0
2024-09-14 00:41:08,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49249.0, ans=0.1
2024-09-14 00:41:08,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=49249.0, ans=0.2
2024-09-14 00:41:20,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=49277.333333333336, ans=0.5
2024-09-14 00:41:35,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=49305.666666666664, ans=0.125
2024-09-14 00:41:45,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.24 vs. limit=15.0
2024-09-14 00:41:49,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0
2024-09-14 00:42:01,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=49334.0, ans=0.00014478260869565246
2024-09-14 00:42:02,451 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.427e+02 2.669e+02 3.167e+02 5.749e+02, threshold=5.338e+02, percent-clipped=1.0
2024-09-14 00:42:14,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=49362.333333333336, ans=0.0
2024-09-14 00:42:20,193 INFO [train.py:1198] (0/2) Epoch 3, batch 4650, loss[loss=0.3657, ctc_loss=0.2744, cr_loss=0.4561, over 20978.00 frames. ], tot_loss[loss=0.3461, ctc_loss=0.2606, cr_loss=0.4277, over 4062469.77 frames. ], batch size: 58, lr: 2.34e-02, grad_scale: 16.0
2024-09-14 00:43:18,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=15.0
2024-09-14 00:43:35,607 INFO [train.py:1198] (0/2) Epoch 3, batch 4700, loss[loss=0.2754, ctc_loss=0.2021, cr_loss=0.3665, over 20969.00 frames. ], tot_loss[loss=0.3443, ctc_loss=0.259, cr_loss=0.4264, over 4073693.13 frames. ], batch size: 49, lr: 2.34e-02, grad_scale: 16.0
2024-09-14 00:43:43,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=49532.333333333336, ans=0.2
2024-09-14 00:43:56,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=49560.666666666664, ans=0.125
2024-09-14 00:44:03,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=49560.666666666664, ans=0.0
2024-09-14 00:44:35,818 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.377e+02 2.710e+02 3.190e+02 4.745e+02, threshold=5.421e+02, percent-clipped=0.0
2024-09-14 00:44:39,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=49645.666666666664, ans=10.0
2024-09-14 00:44:47,189 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0
2024-09-14 00:44:53,859 INFO [train.py:1198] (0/2) Epoch 3, batch 4750, loss[loss=0.355, ctc_loss=0.2658, cr_loss=0.4459, over 20330.00 frames. ], tot_loss[loss=0.3472, ctc_loss=0.2614, cr_loss=0.4292, over 4064495.02 frames. ], batch size: 74, lr: 2.34e-02, grad_scale: 16.0
2024-09-14 00:45:40,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=49759.0, ans=0.125
2024-09-14 00:45:48,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49759.0, ans=0.1
2024-09-14 00:46:06,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=49787.333333333336, ans=4.6231884057969935e-05
2024-09-14 00:46:08,680 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=12.0
2024-09-14 00:46:11,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=49815.666666666664, ans=0.025
2024-09-14 00:46:12,307 INFO [train.py:1198] (0/2) Epoch 3, batch 4800, loss[loss=0.3332, ctc_loss=0.25, cr_loss=0.4156, over 20951.00 frames. ], tot_loss[loss=0.3467, ctc_loss=0.261, cr_loss=0.4283, over 4055604.24 frames. ], batch size: 60, lr: 2.33e-02, grad_scale: 32.0
2024-09-14 00:46:18,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=49815.666666666664, ans=0.125
2024-09-14 00:46:29,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=49844.0, ans=0.2
2024-09-14 00:47:06,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=49900.666666666664, ans=0.0
2024-09-14 00:47:09,065 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 2.313e+02 2.745e+02 3.230e+02 4.598e+02, threshold=5.490e+02, percent-clipped=0.0
2024-09-14 00:47:24,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.51 vs. limit=22.5
2024-09-14 00:47:26,886 INFO [train.py:1198] (0/2) Epoch 3, batch 4850, loss[loss=0.3715, ctc_loss=0.2828, cr_loss=0.4436, over 20096.00 frames. ], tot_loss[loss=0.3466, ctc_loss=0.261, cr_loss=0.428, over 4060046.32 frames. ], batch size: 80, lr: 2.33e-02, grad_scale: 32.0
2024-09-14 00:47:51,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0
2024-09-14 00:48:24,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=50042.333333333336, ans=0.125
2024-09-14 00:48:42,300 INFO [train.py:1198] (0/2) Epoch 3, batch 4900, loss[loss=0.3995, ctc_loss=0.3192, cr_loss=0.4017, over 14372.00 frames. ], tot_loss[loss=0.3454, ctc_loss=0.2599, cr_loss=0.4273, over 4068253.06 frames. ], batch size: 149, lr: 2.33e-02, grad_scale: 32.0
2024-09-14 00:49:09,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=50127.333333333336, ans=0.125
2024-09-14 00:49:15,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=50155.666666666664, ans=0.125
2024-09-14 00:49:34,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=50184.0, ans=0.0
2024-09-14 00:49:38,788 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.313e+02 2.593e+02 2.985e+02 4.393e+02, threshold=5.187e+02, percent-clipped=0.0
2024-09-14 00:49:45,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=50212.333333333336, ans=0.125
2024-09-14 00:49:56,993 INFO [train.py:1198] (0/2) Epoch 3, batch 4950, loss[loss=0.2938, ctc_loss=0.2182, cr_loss=0.378, over 20972.00 frames. ], tot_loss[loss=0.3438, ctc_loss=0.2586, cr_loss=0.4262, over 4081282.94 frames. ], batch size: 50, lr: 2.33e-02, grad_scale: 32.0
2024-09-14 00:50:01,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=50240.666666666664, ans=0.2
2024-09-14 00:50:05,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50240.666666666664, ans=0.1
2024-09-14 00:50:07,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=50240.666666666664, ans=0.125
2024-09-14 00:50:35,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=50297.333333333336, ans=0.125
2024-09-14 00:51:01,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.44 vs. limit=10.0
2024-09-14 00:51:11,218 INFO [train.py:1198] (0/2) Epoch 3, batch 5000, loss[loss=0.3789, ctc_loss=0.2844, cr_loss=0.4727, over 21007.00 frames. ], tot_loss[loss=0.3443, ctc_loss=0.2587, cr_loss=0.4276, over 4096871.47 frames. ], batch size: 61, lr: 2.32e-02, grad_scale: 32.0
2024-09-14 00:51:57,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=50467.333333333336, ans=0.0
2024-09-14 00:52:03,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50467.333333333336, ans=0.1
2024-09-14 00:52:09,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0
2024-09-14 00:52:10,048 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.059e+02 2.312e+02 2.677e+02 3.290e+02 7.428e+02, threshold=5.354e+02, percent-clipped=2.0
2024-09-14 00:52:10,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0
2024-09-14 00:52:17,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=50495.666666666664, ans=0.0
2024-09-14 00:52:27,706 INFO [train.py:1198] (0/2) Epoch 3, batch 5050, loss[loss=0.3711, ctc_loss=0.2805, cr_loss=0.4528, over 20638.00 frames. ], tot_loss[loss=0.3442, ctc_loss=0.2586, cr_loss=0.428, over 4096941.70 frames. ], batch size: 71, lr: 2.32e-02, grad_scale: 32.0
2024-09-14 00:52:40,266 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0
2024-09-14 00:52:41,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=50552.333333333336, ans=0.125
2024-09-14 00:53:10,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=50580.666666666664, ans=0.0
2024-09-14 00:53:38,952 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=12.0
2024-09-14 00:53:44,030 INFO [train.py:1198] (0/2) Epoch 3, batch 5100, loss[loss=0.3418, ctc_loss=0.2606, cr_loss=0.4059, over 20667.00 frames. ], tot_loss[loss=0.3456, ctc_loss=0.2599, cr_loss=0.4285, over 4090044.68 frames. ], batch size: 66, lr: 2.32e-02, grad_scale: 32.0
2024-09-14 00:53:44,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=50665.666666666664, ans=0.125
2024-09-14 00:53:52,117 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0
2024-09-14 00:54:29,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0
2024-09-14 00:54:36,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=50750.666666666664, ans=0.125
2024-09-14 00:54:41,873 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.468e+02 2.944e+02 3.691e+02 6.014e+02, threshold=5.889e+02, percent-clipped=1.0
2024-09-14 00:54:46,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=50779.0, ans=0.0
2024-09-14 00:54:58,508 INFO [train.py:1198] (0/2) Epoch 3, batch 5150, loss[loss=0.4297, ctc_loss=0.3382, cr_loss=0.4577, over 14005.00 frames. ], tot_loss[loss=0.3448, ctc_loss=0.2594, cr_loss=0.427, over 4081627.56 frames. ], batch size: 151, lr: 2.32e-02, grad_scale: 16.0
2024-09-14 00:55:16,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=50835.666666666664, ans=0.125
2024-09-14 00:55:22,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=50835.666666666664, ans=0.125
2024-09-14 00:55:34,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=50864.0, ans=0.125
2024-09-14 00:55:42,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=50892.333333333336, ans=0.0
2024-09-14 00:56:08,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=50920.666666666664, ans=0.0
2024-09-14 00:56:13,011 INFO [train.py:1198] (0/2) Epoch 3, batch 5200, loss[loss=0.362, ctc_loss=0.2747, cr_loss=0.4365, over 19940.00 frames. ], tot_loss[loss=0.3452, ctc_loss=0.2597, cr_loss=0.4277, over 4076830.08 frames. ], batch size: 80, lr: 2.31e-02, grad_scale: 32.0
2024-09-14 00:56:34,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50977.333333333336, ans=0.1
2024-09-14 00:56:46,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=51005.666666666664, ans=0.2
2024-09-14 00:56:52,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=51005.666666666664, ans=0.0
2024-09-14 00:57:06,890 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 00:57:12,420 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.592e+02 2.910e+02 3.799e+02 5.620e+02, threshold=5.820e+02, percent-clipped=0.0
2024-09-14 00:57:20,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=51062.333333333336, ans=0.2
2024-09-14 00:57:27,222 INFO [train.py:1198] (0/2) Epoch 3, batch 5250, loss[loss=0.3707, ctc_loss=0.2776, cr_loss=0.4654, over 20962.00 frames. ], tot_loss[loss=0.3458, ctc_loss=0.2602, cr_loss=0.4282, over 4077772.51 frames. ], batch size: 58, lr: 2.31e-02, grad_scale: 16.0
2024-09-14 00:57:36,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=51090.666666666664, ans=0.0
2024-09-14 00:58:33,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=51204.0, ans=0.1
2024-09-14 00:58:33,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=51204.0, ans=0.0
2024-09-14 00:58:33,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=51204.0, ans=0.0
2024-09-14 00:58:33,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51204.0, ans=0.1
2024-09-14 00:58:40,585 INFO [train.py:1198] (0/2) Epoch 3, batch 5300, loss[loss=0.3404, ctc_loss=0.2509, cr_loss=0.4478, over 21030.00 frames. ], tot_loss[loss=0.3451, ctc_loss=0.2595, cr_loss=0.428, over 4075779.14 frames. ], batch size: 62, lr: 2.31e-02, grad_scale: 16.0
2024-09-14 00:58:42,865 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0
2024-09-14 00:58:43,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=51232.333333333336, ans=10.0
2024-09-14 00:59:22,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=51289.0, ans=0.125
2024-09-14 00:59:39,995 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.292e+02 2.469e+02 2.814e+02 4.070e+02, threshold=4.939e+02, percent-clipped=0.0
2024-09-14 00:59:55,189 INFO [train.py:1198] (0/2) Epoch 3, batch 5350, loss[loss=0.3107, ctc_loss=0.227, cr_loss=0.4185, over 20954.00 frames. ], tot_loss[loss=0.3428, ctc_loss=0.2576, cr_loss=0.4262, over 4092814.82 frames. ], batch size: 49, lr: 2.30e-02, grad_scale: 16.0
2024-09-14 00:59:57,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=51374.0, ans=0.125
2024-09-14 01:00:18,605 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0
2024-09-14 01:00:25,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=51430.666666666664, ans=0.0
2024-09-14 01:00:33,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=51430.666666666664, ans=0.0
2024-09-14 01:00:51,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=51459.0, ans=0.0
2024-09-14 01:01:02,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=51487.333333333336, ans=0.025
2024-09-14 01:01:11,363 INFO [train.py:1198] (0/2) Epoch 3, batch 5400, loss[loss=0.3492, ctc_loss=0.2566, cr_loss=0.4631, over 21030.00 frames. ], tot_loss[loss=0.3429, ctc_loss=0.2577, cr_loss=0.4263, over 4085859.29 frames. ], batch size: 62, lr: 2.30e-02, grad_scale: 16.0
2024-09-14 01:01:22,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=51515.666666666664, ans=0.0
2024-09-14 01:01:30,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=51544.0, ans=0.125
2024-09-14 01:01:34,062 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=22.5
2024-09-14 01:02:08,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.92 vs. limit=15.0
2024-09-14 01:02:10,252 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.396e+02 2.735e+02 3.180e+02 5.536e+02, threshold=5.470e+02, percent-clipped=4.0
2024-09-14 01:02:13,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=51629.0, ans=0.1
2024-09-14 01:02:16,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=51629.0, ans=0.125
2024-09-14 01:02:27,334 INFO [train.py:1198] (0/2) Epoch 3, batch 5450, loss[loss=0.3081, ctc_loss=0.2264, cr_loss=0.4087, over 21070.00 frames. ], tot_loss[loss=0.3442, ctc_loss=0.2586, cr_loss=0.4277, over 4080735.52 frames. ], batch size: 56, lr: 2.30e-02, grad_scale: 16.0
2024-09-14 01:02:39,563 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 01:02:57,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=51714.0, ans=0.125
2024-09-14 01:03:04,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=51714.0, ans=0.125
2024-09-14 01:03:13,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=51742.333333333336, ans=0.0
2024-09-14 01:03:18,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=51742.333333333336, ans=0.125
2024-09-14 01:03:22,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=51742.333333333336, ans=0.125
2024-09-14 01:03:41,439 INFO [train.py:1198] (0/2) Epoch 3, batch 5500, loss[loss=0.3346, ctc_loss=0.2544, cr_loss=0.4006, over 20882.00 frames. ], tot_loss[loss=0.3425, ctc_loss=0.2573, cr_loss=0.426, over 4088879.12 frames. ], batch size: 57, lr: 2.30e-02, grad_scale: 16.0
2024-09-14 01:04:01,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0
2024-09-14 01:04:40,734 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.225e+02 2.487e+02 2.765e+02 5.183e+02, threshold=4.975e+02, percent-clipped=0.0
2024-09-14 01:04:55,293 INFO [train.py:1198] (0/2) Epoch 3, batch 5550, loss[loss=0.301, ctc_loss=0.2238, cr_loss=0.3859, over 20894.00 frames. ], tot_loss[loss=0.3419, ctc_loss=0.2568, cr_loss=0.4255, over 4095249.99 frames. ], batch size: 54, lr: 2.29e-02, grad_scale: 16.0
2024-09-14 01:05:18,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=51969.0, ans=0.125
2024-09-14 01:05:20,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=51969.0, ans=0.125
2024-09-14 01:05:20,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=51969.0, ans=0.0
2024-09-14 01:06:09,017 INFO [train.py:1198] (0/2) Epoch 3, batch 5600, loss[loss=0.3922, ctc_loss=0.2992, cr_loss=0.4649, over 18243.00 frames. ], tot_loss[loss=0.3432, ctc_loss=0.2579, cr_loss=0.4267, over 4093265.62 frames. ], batch size: 108, lr: 2.29e-02, grad_scale: 32.0
2024-09-14 01:06:21,874 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.87 vs. limit=15.0
2024-09-14 01:06:22,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=52110.666666666664, ans=0.0
2024-09-14 01:06:55,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=52167.333333333336, ans=0.0
2024-09-14 01:07:05,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=52167.333333333336, ans=0.0
2024-09-14 01:07:08,546 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.606e+02 2.917e+02 3.457e+02 7.046e+02, threshold=5.834e+02, percent-clipped=6.0
2024-09-14 01:07:20,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=52195.666666666664, ans=0.025
2024-09-14 01:07:23,323 INFO [train.py:1198] (0/2) Epoch 3, batch 5650, loss[loss=0.4254, ctc_loss=0.3335, cr_loss=0.4594, over 14681.00 frames. ], tot_loss[loss=0.3431, ctc_loss=0.2576, cr_loss=0.4272, over 4098404.98 frames. ], batch size: 151, lr: 2.29e-02, grad_scale: 32.0
2024-09-14 01:07:38,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=52252.333333333336, ans=0.125
2024-09-14 01:08:10,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=52309.0, ans=0.125
2024-09-14 01:08:27,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=52337.333333333336, ans=0.125
2024-09-14 01:08:37,539 INFO [train.py:1198] (0/2) Epoch 3, batch 5700, loss[loss=0.2743, ctc_loss=0.2054, cr_loss=0.3442, over 20960.00 frames. ], tot_loss[loss=0.3413, ctc_loss=0.2562, cr_loss=0.4255, over 4105596.18 frames. ], batch size: 49, lr: 2.29e-02, grad_scale: 32.0
2024-09-14 01:08:42,989 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0
2024-09-14 01:09:04,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=52394.0, ans=0.0
2024-09-14 01:09:10,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=52422.333333333336, ans=0.07
2024-09-14 01:09:27,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=52450.666666666664, ans=0.125
2024-09-14 01:09:39,691 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.391e+02 2.677e+02 3.083e+02 5.761e+02, threshold=5.355e+02, percent-clipped=1.0
2024-09-14 01:09:54,684 INFO [train.py:1198] (0/2) Epoch 3, batch 5750, loss[loss=0.3588, ctc_loss=0.2693, cr_loss=0.4479, over 21048.00 frames. ], tot_loss[loss=0.3419, ctc_loss=0.2567, cr_loss=0.4256, over 4106602.28 frames. ], batch size: 61, lr: 2.28e-02, grad_scale: 32.0
2024-09-14 01:10:10,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=52535.666666666664, ans=0.125
2024-09-14 01:10:37,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=52592.333333333336, ans=0.05
2024-09-14 01:10:40,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=52592.333333333336, ans=0.2
2024-09-14 01:11:00,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=52620.666666666664, ans=0.2
2024-09-14 01:11:10,217 INFO [train.py:1198] (0/2) Epoch 3, batch 5800, loss[loss=0.3358, ctc_loss=0.2508, cr_loss=0.4251, over 21022.00 frames. ], tot_loss[loss=0.3425, ctc_loss=0.2573, cr_loss=0.4261, over 4093177.48 frames. ], batch size: 61, lr: 2.28e-02, grad_scale: 32.0
2024-09-14 01:11:38,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52705.666666666664, ans=0.1
2024-09-14 01:11:45,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52705.666666666664, ans=0.1
2024-09-14 01:11:49,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=52705.666666666664, ans=0.025
2024-09-14 01:12:09,618 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.414e+02 2.743e+02 3.288e+02 5.794e+02, threshold=5.486e+02, percent-clipped=3.0
2024-09-14 01:12:15,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=52762.333333333336, ans=0.0
2024-09-14 01:12:23,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=52790.666666666664, ans=0.125
2024-09-14 01:12:23,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=15.0
2024-09-14 01:12:24,258 INFO [train.py:1198] (0/2) Epoch 3, batch 5850, loss[loss=0.4542, ctc_loss=0.3647, cr_loss=0.4477, over 14252.00 frames. ], tot_loss[loss=0.3438, ctc_loss=0.2584, cr_loss=0.4268, over 4069737.54 frames. ], batch size: 149, lr: 2.28e-02, grad_scale: 32.0
2024-09-14 01:12:31,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=52790.666666666664, ans=10.0
2024-09-14 01:12:48,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=52819.0, ans=0.125
2024-09-14 01:12:54,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=52847.333333333336, ans=0.125
2024-09-14 01:13:04,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52847.333333333336, ans=0.1
2024-09-14 01:13:10,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=52875.666666666664, ans=0.0
2024-09-14 01:13:29,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=52904.0, ans=0.2
2024-09-14 01:13:37,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.70 vs. limit=22.5
2024-09-14 01:13:37,684 INFO [train.py:1198] (0/2) Epoch 3, batch 5900, loss[loss=0.4361, ctc_loss=0.3374, cr_loss=0.4931, over 20354.00 frames. ], tot_loss[loss=0.3433, ctc_loss=0.2579, cr_loss=0.4267, over 4077667.79 frames. ], batch size: 74, lr: 2.27e-02, grad_scale: 32.0
2024-09-14 01:13:54,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.31 vs. limit=15.0
2024-09-14 01:14:13,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=52989.0, ans=0.0
2024-09-14 01:14:21,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=53017.333333333336, ans=0.125
2024-09-14 01:14:37,286 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.295e+02 2.684e+02 3.322e+02 6.255e+02, threshold=5.369e+02, percent-clipped=1.0
2024-09-14 01:14:43,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=53045.666666666664, ans=0.0
2024-09-14 01:14:49,652 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 01:14:52,310 INFO [train.py:1198] (0/2) Epoch 3, batch 5950, loss[loss=0.3615, ctc_loss=0.2813, cr_loss=0.4006, over 19457.00 frames. ], tot_loss[loss=0.3404, ctc_loss=0.2555, cr_loss=0.4244, over 4091302.82 frames. ], batch size: 90, lr: 2.27e-02, grad_scale: 32.0
2024-09-14 01:14:58,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=53074.0, ans=0.125
2024-09-14 01:15:12,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0
2024-09-14 01:15:18,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=53102.333333333336, ans=0.125
2024-09-14 01:16:04,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=53215.666666666664, ans=0.125
2024-09-14 01:16:05,318 INFO [train.py:1198] (0/2) Epoch 3, batch 6000, loss[loss=0.3507, ctc_loss=0.2632, cr_loss=0.4371, over 20681.00 frames. ], tot_loss[loss=0.3404, ctc_loss=0.2554, cr_loss=0.425, over 4098661.01 frames. ], batch size: 66, lr: 2.27e-02, grad_scale: 32.0
2024-09-14 01:16:05,318 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-14 01:16:23,877 INFO [train.py:1230] (0/2) Epoch 3, validation: loss=0.08739, ctc_loss=0.08739, cr_loss=9.228e-15, over 944034.00 frames.
2024-09-14 01:16:23,878 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-14 01:16:24,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=53215.666666666664, ans=0.125
2024-09-14 01:16:39,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0
2024-09-14 01:16:42,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=53244.0, ans=0.0
2024-09-14 01:16:56,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=12.0
2024-09-14 01:17:06,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=53272.333333333336, ans=0.04949747468305833
2024-09-14 01:17:12,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=8.0
2024-09-14 01:17:25,806 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.436e+02 2.868e+02 3.699e+02 6.884e+02, threshold=5.737e+02, percent-clipped=5.0
2024-09-14 01:17:41,300 INFO [train.py:1198] (0/2) Epoch 3, batch 6050, loss[loss=0.3949, ctc_loss=0.3107, cr_loss=0.4209, over 14751.00 frames. ], tot_loss[loss=0.3374, ctc_loss=0.2528, cr_loss=0.4229, over 4108079.08 frames. ], batch size: 150, lr: 2.27e-02, grad_scale: 32.0
2024-09-14 01:18:37,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=53442.333333333336, ans=0.0
2024-09-14 01:18:56,684 INFO [train.py:1198] (0/2) Epoch 3, batch 6100, loss[loss=0.352, ctc_loss=0.2654, cr_loss=0.4327, over 20826.00 frames. ], tot_loss[loss=0.3381, ctc_loss=0.2532, cr_loss=0.4244, over 4113677.68 frames. ], batch size: 59, lr: 2.26e-02, grad_scale: 32.0
2024-09-14 01:19:36,661 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 01:19:38,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=22.5
2024-09-14 01:19:41,093 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 01:19:47,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0
2024-09-14 01:19:55,696 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.271e+02 2.566e+02 3.060e+02 5.343e+02, threshold=5.133e+02, percent-clipped=0.0
2024-09-14 01:19:56,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0
2024-09-14 01:20:10,709 INFO [train.py:1198] (0/2) Epoch 3, batch 6150, loss[loss=0.3396, ctc_loss=0.253, cr_loss=0.4331, over 21033.00 frames. ], tot_loss[loss=0.3383, ctc_loss=0.2537, cr_loss=0.4234, over 4098610.61 frames.
], batch size: 62, lr: 2.26e-02, grad_scale: 32.0 2024-09-14 01:20:59,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=53725.666666666664, ans=0.125 2024-09-14 01:21:23,992 INFO [train.py:1198] (0/2) Epoch 3, batch 6200, loss[loss=0.3387, ctc_loss=0.2555, cr_loss=0.4162, over 21035.00 frames. ], tot_loss[loss=0.3403, ctc_loss=0.2554, cr_loss=0.4244, over 4070875.62 frames. ], batch size: 56, lr: 2.26e-02, grad_scale: 32.0 2024-09-14 01:21:37,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=12.0 2024-09-14 01:21:52,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=53839.0, ans=0.125 2024-09-14 01:22:09,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=53867.333333333336, ans=0.1 2024-09-14 01:22:15,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=53867.333333333336, ans=0.2 2024-09-14 01:22:22,651 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.487e+02 3.106e+02 3.866e+02 8.162e+02, threshold=6.213e+02, percent-clipped=4.0 2024-09-14 01:22:38,054 INFO [train.py:1198] (0/2) Epoch 3, batch 6250, loss[loss=0.35, ctc_loss=0.2695, cr_loss=0.4021, over 19293.00 frames. ], tot_loss[loss=0.3412, ctc_loss=0.2562, cr_loss=0.4248, over 4053661.97 frames. 
], batch size: 90, lr: 2.26e-02, grad_scale: 32.0 2024-09-14 01:22:44,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=53924.0, ans=0.125 2024-09-14 01:22:45,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=53924.0, ans=0.125 2024-09-14 01:23:13,841 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2024-09-14 01:23:33,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=54009.0, ans=0.2 2024-09-14 01:23:39,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=54037.333333333336, ans=0.035 2024-09-14 01:23:50,346 INFO [train.py:1198] (0/2) Epoch 3, batch 6300, loss[loss=0.4075, ctc_loss=0.3089, cr_loss=0.4931, over 18244.00 frames. ], tot_loss[loss=0.3458, ctc_loss=0.2605, cr_loss=0.4263, over 3976497.64 frames. 
], batch size: 108, lr: 2.25e-02, grad_scale: 32.0 2024-09-14 01:24:12,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=54094.0, ans=0.125 2024-09-14 01:24:12,778 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 01:24:19,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=54122.333333333336, ans=0.0 2024-09-14 01:24:38,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=54150.666666666664, ans=0.125 2024-09-14 01:24:43,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=54150.666666666664, ans=0.125 2024-09-14 01:24:49,005 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.498e+02 2.907e+02 3.648e+02 6.562e+02, threshold=5.813e+02, percent-clipped=1.0 2024-09-14 01:25:03,490 INFO [train.py:1198] (0/2) Epoch 3, batch 6350, loss[loss=0.3767, ctc_loss=0.2928, cr_loss=0.4194, over 20672.00 frames. ], tot_loss[loss=0.352, ctc_loss=0.2662, cr_loss=0.4287, over 3906167.37 frames. ], batch size: 71, lr: 2.25e-02, grad_scale: 32.0 2024-09-14 01:25:05,353 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 01:25:39,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=54264.0, ans=0.0 2024-09-14 01:26:01,693 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-3.pt 2024-09-14 01:26:47,831 INFO [train.py:1198] (0/2) Epoch 4, batch 0, loss[loss=0.374, ctc_loss=0.2838, cr_loss=0.451, over 20662.00 frames. 
], tot_loss[loss=0.374, ctc_loss=0.2838, cr_loss=0.451, over 20662.00 frames. ], batch size: 66, lr: 2.10e-02, grad_scale: 32.0 2024-09-14 01:26:47,832 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 01:27:06,291 INFO [train.py:1230] (0/2) Epoch 4, validation: loss=0.08861, ctc_loss=0.08861, cr_loss=9.595e-15, over 944034.00 frames. 2024-09-14 01:27:06,292 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 01:27:20,460 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=12.0 2024-09-14 01:27:47,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=54380.166666666664, ans=0.125 2024-09-14 01:27:59,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=54408.5, ans=0.2 2024-09-14 01:28:11,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=54436.833333333336, ans=0.0 2024-09-14 01:28:15,882 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-14 01:28:18,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=54436.833333333336, ans=0.125 2024-09-14 01:28:19,819 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.466e+02 2.778e+02 3.325e+02 4.890e+02, threshold=5.557e+02, percent-clipped=0.0 2024-09-14 01:28:21,412 INFO [train.py:1198] (0/2) Epoch 4, batch 50, loss[loss=0.3004, ctc_loss=0.2212, cr_loss=0.3963, over 20977.00 frames. ], tot_loss[loss=0.3403, ctc_loss=0.2552, cr_loss=0.4256, over 919564.38 frames. 
], batch size: 49, lr: 2.10e-02, grad_scale: 32.0 2024-09-14 01:28:35,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=12.0 2024-09-14 01:29:16,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=54550.166666666664, ans=0.125 2024-09-14 01:29:39,509 INFO [train.py:1198] (0/2) Epoch 4, batch 100, loss[loss=0.3518, ctc_loss=0.2632, cr_loss=0.4432, over 20634.00 frames. ], tot_loss[loss=0.3398, ctc_loss=0.2544, cr_loss=0.4267, over 1629838.13 frames. ], batch size: 66, lr: 2.10e-02, grad_scale: 32.0 2024-09-14 01:29:53,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=54606.833333333336, ans=0.1 2024-09-14 01:30:23,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=54663.5, ans=0.015 2024-09-14 01:30:45,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=54720.166666666664, ans=0.125 2024-09-14 01:30:56,044 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.292e+02 2.583e+02 3.089e+02 4.761e+02, threshold=5.167e+02, percent-clipped=0.0 2024-09-14 01:30:57,525 INFO [train.py:1198] (0/2) Epoch 4, batch 150, loss[loss=0.3523, ctc_loss=0.2679, cr_loss=0.422, over 20845.00 frames. ], tot_loss[loss=0.3409, ctc_loss=0.2554, cr_loss=0.4275, over 2165967.56 frames. ], batch size: 65, lr: 2.10e-02, grad_scale: 32.0 2024-09-14 01:31:07,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. 
limit=12.0 2024-09-14 01:31:17,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=54776.833333333336, ans=0.09899494936611666 2024-09-14 01:31:18,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=54776.833333333336, ans=0.2 2024-09-14 01:31:18,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=54776.833333333336, ans=0.0 2024-09-14 01:32:06,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=54861.833333333336, ans=0.125 2024-09-14 01:32:12,448 INFO [train.py:1198] (0/2) Epoch 4, batch 200, loss[loss=0.3308, ctc_loss=0.2473, cr_loss=0.4177, over 20971.00 frames. ], tot_loss[loss=0.3363, ctc_loss=0.2516, cr_loss=0.4233, over 2593558.60 frames. ], batch size: 58, lr: 2.09e-02, grad_scale: 32.0 2024-09-14 01:32:31,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0 2024-09-14 01:32:36,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=54918.5, ans=0.02 2024-09-14 01:32:59,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=54975.166666666664, ans=10.0 2024-09-14 01:33:25,644 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.290e+02 2.532e+02 2.846e+02 4.959e+02, threshold=5.064e+02, percent-clipped=0.0 2024-09-14 01:33:27,154 INFO [train.py:1198] (0/2) Epoch 4, batch 250, loss[loss=0.329, ctc_loss=0.2489, cr_loss=0.4004, over 21064.00 frames. ], tot_loss[loss=0.337, ctc_loss=0.2523, cr_loss=0.4232, over 2918429.68 frames. 
], batch size: 56, lr: 2.09e-02, grad_scale: 32.0 2024-09-14 01:33:53,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.84 vs. limit=15.0 2024-09-14 01:33:53,111 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-09-14 01:33:56,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=55088.5, ans=0.2 2024-09-14 01:33:57,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=55088.5, ans=0.0 2024-09-14 01:34:17,943 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=12.0 2024-09-14 01:34:35,835 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0 2024-09-14 01:34:42,632 INFO [train.py:1198] (0/2) Epoch 4, batch 300, loss[loss=0.3791, ctc_loss=0.2851, cr_loss=0.47, over 18243.00 frames. ], tot_loss[loss=0.3374, ctc_loss=0.2525, cr_loss=0.4243, over 3177739.70 frames. 
], batch size: 108, lr: 2.09e-02, grad_scale: 16.0 2024-09-14 01:34:52,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=55173.5, ans=0.07 2024-09-14 01:34:56,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=55173.5, ans=0.125 2024-09-14 01:35:36,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=55258.5, ans=0.1 2024-09-14 01:35:41,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=55258.5, ans=0.125 2024-09-14 01:35:42,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=55258.5, ans=0.0 2024-09-14 01:35:42,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-09-14 01:35:50,522 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=22.5 2024-09-14 01:35:56,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=55286.833333333336, ans=0.125 2024-09-14 01:36:04,694 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.239e+02 2.447e+02 2.806e+02 5.533e+02, threshold=4.895e+02, percent-clipped=1.0 2024-09-14 01:36:04,713 INFO [train.py:1198] (0/2) Epoch 4, batch 350, loss[loss=0.3652, ctc_loss=0.2737, cr_loss=0.4576, over 20648.00 frames. ], tot_loss[loss=0.3359, ctc_loss=0.2511, cr_loss=0.4239, over 3395681.01 frames. 
], batch size: 68, lr: 2.09e-02, grad_scale: 16.0 2024-09-14 01:36:11,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=55315.166666666664, ans=0.125 2024-09-14 01:36:11,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=55315.166666666664, ans=0.0 2024-09-14 01:37:12,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=55428.5, ans=0.04949747468305833 2024-09-14 01:37:19,889 INFO [train.py:1198] (0/2) Epoch 4, batch 400, loss[loss=0.3365, ctc_loss=0.251, cr_loss=0.4277, over 20975.00 frames. ], tot_loss[loss=0.3358, ctc_loss=0.2509, cr_loss=0.4243, over 3539035.66 frames. ], batch size: 55, lr: 2.08e-02, grad_scale: 32.0 2024-09-14 01:38:00,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=55513.5, ans=0.125 2024-09-14 01:38:26,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=55570.166666666664, ans=0.125 2024-09-14 01:38:35,103 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.264e+02 2.550e+02 3.085e+02 5.220e+02, threshold=5.100e+02, percent-clipped=1.0 2024-09-14 01:38:35,122 INFO [train.py:1198] (0/2) Epoch 4, batch 450, loss[loss=0.3432, ctc_loss=0.2612, cr_loss=0.41, over 21029.00 frames. ], tot_loss[loss=0.3345, ctc_loss=0.2497, cr_loss=0.4236, over 3674103.42 frames. ], batch size: 62, lr: 2.08e-02, grad_scale: 32.0 2024-09-14 01:38:38,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=55598.5, ans=0.2 2024-09-14 01:38:48,439 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.24 vs. 
limit=22.5 2024-09-14 01:38:58,259 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 01:39:11,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=55655.166666666664, ans=0.125 2024-09-14 01:39:47,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=55711.833333333336, ans=0.1 2024-09-14 01:39:50,569 INFO [train.py:1198] (0/2) Epoch 4, batch 500, loss[loss=0.3313, ctc_loss=0.2473, cr_loss=0.42, over 20982.00 frames. ], tot_loss[loss=0.335, ctc_loss=0.25, cr_loss=0.4248, over 3773343.06 frames. ], batch size: 52, lr: 2.08e-02, grad_scale: 32.0 2024-09-14 01:40:01,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=55740.166666666664, ans=0.025 2024-09-14 01:40:31,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=55796.833333333336, ans=0.125 2024-09-14 01:40:39,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=55825.166666666664, ans=0.025 2024-09-14 01:41:09,327 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.255e+02 2.488e+02 2.846e+02 4.809e+02, threshold=4.975e+02, percent-clipped=0.0 2024-09-14 01:41:09,346 INFO [train.py:1198] (0/2) Epoch 4, batch 550, loss[loss=0.3401, ctc_loss=0.2526, cr_loss=0.4373, over 20831.00 frames. ], tot_loss[loss=0.3348, ctc_loss=0.2499, cr_loss=0.4245, over 3834381.99 frames. 
], batch size: 59, lr: 2.08e-02, grad_scale: 32.0 2024-09-14 01:41:34,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=55910.166666666664, ans=0.125 2024-09-14 01:41:48,204 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-09-14 01:42:10,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=55966.833333333336, ans=0.0 2024-09-14 01:42:27,948 INFO [train.py:1198] (0/2) Epoch 4, batch 600, loss[loss=0.3393, ctc_loss=0.2516, cr_loss=0.4381, over 20924.00 frames. ], tot_loss[loss=0.3368, ctc_loss=0.2516, cr_loss=0.4258, over 3874082.23 frames. ], batch size: 60, lr: 2.08e-02, grad_scale: 32.0 2024-09-14 01:42:51,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=56051.833333333336, ans=0.0 2024-09-14 01:42:58,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=56080.166666666664, ans=12.0 2024-09-14 01:43:13,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=56108.5, ans=0.0 2024-09-14 01:43:15,562 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. 
limit=6.0 2024-09-14 01:43:18,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=56108.5, ans=0.125 2024-09-14 01:43:43,445 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.280e+02 2.697e+02 3.155e+02 5.118e+02, threshold=5.393e+02, percent-clipped=2.0 2024-09-14 01:43:43,465 INFO [train.py:1198] (0/2) Epoch 4, batch 650, loss[loss=0.3578, ctc_loss=0.268, cr_loss=0.449, over 20660.00 frames. ], tot_loss[loss=0.3367, ctc_loss=0.2513, cr_loss=0.4265, over 3915313.54 frames. ], batch size: 66, lr: 2.07e-02, grad_scale: 32.0 2024-09-14 01:43:57,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=56193.5, ans=0.1 2024-09-14 01:44:05,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=56193.5, ans=0.0 2024-09-14 01:44:29,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0 2024-09-14 01:44:34,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=56250.166666666664, ans=0.125 2024-09-14 01:44:58,524 INFO [train.py:1198] (0/2) Epoch 4, batch 700, loss[loss=0.3274, ctc_loss=0.2451, cr_loss=0.4116, over 20960.00 frames. ], tot_loss[loss=0.3343, ctc_loss=0.2495, cr_loss=0.4244, over 3952265.41 frames. 
], batch size: 64, lr: 2.07e-02, grad_scale: 32.0 2024-09-14 01:45:15,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=56335.166666666664, ans=0.1 2024-09-14 01:45:27,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=56363.5, ans=0.0 2024-09-14 01:45:53,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=56391.833333333336, ans=0.1 2024-09-14 01:46:16,742 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.228e+02 2.527e+02 3.007e+02 4.463e+02, threshold=5.055e+02, percent-clipped=0.0 2024-09-14 01:46:16,761 INFO [train.py:1198] (0/2) Epoch 4, batch 750, loss[loss=0.342, ctc_loss=0.2544, cr_loss=0.4377, over 20658.00 frames. ], tot_loss[loss=0.3355, ctc_loss=0.2504, cr_loss=0.4259, over 3978442.62 frames. ], batch size: 68, lr: 2.07e-02, grad_scale: 32.0 2024-09-14 01:46:41,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=56476.833333333336, ans=0.2 2024-09-14 01:47:30,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=56561.833333333336, ans=0.0 2024-09-14 01:47:35,711 INFO [train.py:1198] (0/2) Epoch 4, batch 800, loss[loss=0.348, ctc_loss=0.2614, cr_loss=0.4329, over 20702.00 frames. ], tot_loss[loss=0.3364, ctc_loss=0.2512, cr_loss=0.4261, over 3987822.42 frames. 
], batch size: 71, lr: 2.07e-02, grad_scale: 32.0 2024-09-14 01:47:49,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=56618.5, ans=0.0 2024-09-14 01:48:14,899 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-20000.pt 2024-09-14 01:48:51,660 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.334e+02 2.553e+02 2.998e+02 6.483e+02, threshold=5.106e+02, percent-clipped=2.0 2024-09-14 01:48:51,680 INFO [train.py:1198] (0/2) Epoch 4, batch 850, loss[loss=0.3133, ctc_loss=0.2319, cr_loss=0.4072, over 21061.00 frames. ], tot_loss[loss=0.3356, ctc_loss=0.2504, cr_loss=0.4258, over 4014260.42 frames. ], batch size: 59, lr: 2.06e-02, grad_scale: 32.0 2024-09-14 01:48:55,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=56731.833333333336, ans=0.125 2024-09-14 01:49:08,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=56760.166666666664, ans=0.1 2024-09-14 01:49:55,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=56845.166666666664, ans=0.125 2024-09-14 01:50:01,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=56845.166666666664, ans=0.125 2024-09-14 01:50:04,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=56845.166666666664, ans=0.125 2024-09-14 01:50:06,727 INFO [train.py:1198] (0/2) Epoch 4, batch 900, loss[loss=0.3482, ctc_loss=0.2631, cr_loss=0.4256, over 21003.00 frames. ], tot_loss[loss=0.3372, ctc_loss=0.2518, cr_loss=0.4271, over 4033850.62 frames. 
], batch size: 61, lr: 2.06e-02, grad_scale: 32.0 2024-09-14 01:50:09,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=56873.5, ans=0.125 2024-09-14 01:50:36,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=56930.166666666664, ans=0.125 2024-09-14 01:50:57,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=15.0 2024-09-14 01:51:12,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=56986.833333333336, ans=0.0 2024-09-14 01:51:17,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.82 vs. limit=22.5 2024-09-14 01:51:21,095 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.937e+02 2.187e+02 2.451e+02 2.751e+02 5.212e+02, threshold=4.902e+02, percent-clipped=1.0 2024-09-14 01:51:21,113 INFO [train.py:1198] (0/2) Epoch 4, batch 950, loss[loss=0.3191, ctc_loss=0.2438, cr_loss=0.3766, over 21052.00 frames. ], tot_loss[loss=0.3375, ctc_loss=0.2519, cr_loss=0.428, over 4041262.30 frames. ], batch size: 56, lr: 2.06e-02, grad_scale: 32.0 2024-09-14 01:51:27,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=57015.166666666664, ans=0.125 2024-09-14 01:51:47,311 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=22.5 2024-09-14 01:52:42,041 INFO [train.py:1198] (0/2) Epoch 4, batch 1000, loss[loss=0.3282, ctc_loss=0.2416, cr_loss=0.4328, over 21040.00 frames. ], tot_loss[loss=0.336, ctc_loss=0.2506, cr_loss=0.427, over 4069956.31 frames. 
], batch size: 62, lr: 2.06e-02, grad_scale: 32.0 2024-09-14 01:52:58,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=57185.166666666664, ans=0.0 2024-09-14 01:53:38,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=57241.833333333336, ans=0.5 2024-09-14 01:53:56,665 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.263e+02 2.491e+02 2.829e+02 5.827e+02, threshold=4.982e+02, percent-clipped=1.0 2024-09-14 01:53:56,684 INFO [train.py:1198] (0/2) Epoch 4, batch 1050, loss[loss=0.3196, ctc_loss=0.2407, cr_loss=0.3945, over 20933.00 frames. ], tot_loss[loss=0.3338, ctc_loss=0.2489, cr_loss=0.4244, over 4062877.43 frames. ], batch size: 60, lr: 2.06e-02, grad_scale: 32.0 2024-09-14 01:54:01,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=57298.5, ans=0.125 2024-09-14 01:55:10,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=57440.166666666664, ans=0.2 2024-09-14 01:55:11,812 INFO [train.py:1198] (0/2) Epoch 4, batch 1100, loss[loss=0.3394, ctc_loss=0.2522, cr_loss=0.436, over 21015.00 frames. ], tot_loss[loss=0.3328, ctc_loss=0.248, cr_loss=0.4239, over 4078611.28 frames. ], batch size: 61, lr: 2.05e-02, grad_scale: 32.0 2024-09-14 01:55:24,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=57440.166666666664, ans=0.125 2024-09-14 01:55:26,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.20 vs. 
limit=15.0 2024-09-14 01:55:42,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=57496.833333333336, ans=0.025 2024-09-14 01:55:45,553 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2024-09-14 01:55:49,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57496.833333333336, ans=0.1 2024-09-14 01:56:07,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=57525.166666666664, ans=0.125 2024-09-14 01:56:17,098 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=15.0 2024-09-14 01:56:22,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=57553.5, ans=0.125 2024-09-14 01:56:25,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57581.833333333336, ans=0.1 2024-09-14 01:56:26,548 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.230e+02 2.445e+02 2.927e+02 4.253e+02, threshold=4.890e+02, percent-clipped=0.0 2024-09-14 01:56:26,568 INFO [train.py:1198] (0/2) Epoch 4, batch 1150, loss[loss=0.3384, ctc_loss=0.2523, cr_loss=0.4305, over 21070.00 frames. ], tot_loss[loss=0.334, ctc_loss=0.249, cr_loss=0.4249, over 4088795.03 frames. 
], batch size: 56, lr: 2.05e-02, grad_scale: 32.0 2024-09-14 01:56:41,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=57610.166666666664, ans=0.125 2024-09-14 01:56:43,542 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2024-09-14 01:57:23,368 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.95 vs. limit=12.0 2024-09-14 01:57:33,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=57695.166666666664, ans=0.125 2024-09-14 01:57:33,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2024-09-14 01:57:40,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=57695.166666666664, ans=0.125 2024-09-14 01:57:40,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=57695.166666666664, ans=0.125 2024-09-14 01:57:44,774 INFO [train.py:1198] (0/2) Epoch 4, batch 1200, loss[loss=0.3405, ctc_loss=0.2532, cr_loss=0.4365, over 20721.00 frames. ], tot_loss[loss=0.333, ctc_loss=0.2481, cr_loss=0.4246, over 4103020.12 frames. ], batch size: 71, lr: 2.05e-02, grad_scale: 32.0 2024-09-14 01:58:03,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=57751.833333333336, ans=0.125 2024-09-14 01:58:12,968 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.25 vs. 
limit=15.0 2024-09-14 01:59:03,649 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.239e+02 2.446e+02 2.774e+02 4.788e+02, threshold=4.893e+02, percent-clipped=0.0 2024-09-14 01:59:03,669 INFO [train.py:1198] (0/2) Epoch 4, batch 1250, loss[loss=0.2937, ctc_loss=0.215, cr_loss=0.3939, over 21052.00 frames. ], tot_loss[loss=0.3316, ctc_loss=0.2469, cr_loss=0.4235, over 4113893.84 frames. ], batch size: 56, lr: 2.05e-02, grad_scale: 32.0 2024-09-14 01:59:08,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0 2024-09-14 01:59:11,089 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 01:59:20,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=57893.5, ans=0.2 2024-09-14 01:59:26,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=57893.5, ans=0.125 2024-09-14 01:59:35,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=57921.833333333336, ans=0.2 2024-09-14 01:59:53,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=57950.166666666664, ans=0.0 2024-09-14 01:59:55,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=57950.166666666664, ans=0.125 2024-09-14 02:00:14,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=57978.5, ans=0.125 2024-09-14 02:00:18,917 INFO [train.py:1198] (0/2) Epoch 4, batch 1300, loss[loss=0.3321, ctc_loss=0.2476, cr_loss=0.4223, over 20955.00 frames. ], tot_loss[loss=0.3322, ctc_loss=0.2475, cr_loss=0.4235, over 4112343.46 frames. 
], batch size: 58, lr: 2.04e-02, grad_scale: 32.0 2024-09-14 02:00:55,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=58063.5, ans=0.2 2024-09-14 02:00:55,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2024-09-14 02:00:58,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=58063.5, ans=10.0 2024-09-14 02:01:33,793 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.020e+02 2.422e+02 2.848e+02 3.328e+02 7.197e+02, threshold=5.697e+02, percent-clipped=3.0 2024-09-14 02:01:33,812 INFO [train.py:1198] (0/2) Epoch 4, batch 1350, loss[loss=0.2888, ctc_loss=0.2144, cr_loss=0.3722, over 20305.00 frames. ], tot_loss[loss=0.3333, ctc_loss=0.2484, cr_loss=0.4241, over 4096761.40 frames. ], batch size: 45, lr: 2.04e-02, grad_scale: 32.0 2024-09-14 02:01:37,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=58148.5, ans=0.125 2024-09-14 02:02:10,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=58205.166666666664, ans=0.025 2024-09-14 02:02:46,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=58261.833333333336, ans=0.0 2024-09-14 02:02:48,642 INFO [train.py:1198] (0/2) Epoch 4, batch 1400, loss[loss=0.3116, ctc_loss=0.233, cr_loss=0.3929, over 21063.00 frames. ], tot_loss[loss=0.3331, ctc_loss=0.2483, cr_loss=0.4244, over 4104731.01 frames. 
], batch size: 56, lr: 2.04e-02, grad_scale: 32.0 2024-09-14 02:03:15,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=58318.5, ans=0.1 2024-09-14 02:03:15,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=58318.5, ans=0.0 2024-09-14 02:03:26,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=58346.833333333336, ans=0.035 2024-09-14 02:03:38,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=58375.166666666664, ans=0.2 2024-09-14 02:04:03,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2024-09-14 02:04:09,991 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.266e+02 2.530e+02 3.055e+02 6.181e+02, threshold=5.060e+02, percent-clipped=1.0 2024-09-14 02:04:10,011 INFO [train.py:1198] (0/2) Epoch 4, batch 1450, loss[loss=0.3721, ctc_loss=0.2791, cr_loss=0.4648, over 19364.00 frames. ], tot_loss[loss=0.3328, ctc_loss=0.2479, cr_loss=0.4243, over 4102799.60 frames. ], batch size: 90, lr: 2.04e-02, grad_scale: 32.0 2024-09-14 02:05:19,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=58545.166666666664, ans=0.125 2024-09-14 02:05:25,169 INFO [train.py:1198] (0/2) Epoch 4, batch 1500, loss[loss=0.3369, ctc_loss=0.2498, cr_loss=0.4358, over 21079.00 frames. ], tot_loss[loss=0.3337, ctc_loss=0.2486, cr_loss=0.4251, over 4103589.39 frames. ], batch size: 59, lr: 2.04e-02, grad_scale: 32.0 2024-09-14 02:05:51,743 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.92 vs. 
limit=22.5 2024-09-14 02:05:55,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=58630.166666666664, ans=0.2 2024-09-14 02:06:23,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=22.5 2024-09-14 02:06:40,823 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.302e+02 2.552e+02 2.878e+02 4.776e+02, threshold=5.103e+02, percent-clipped=0.0 2024-09-14 02:06:40,843 INFO [train.py:1198] (0/2) Epoch 4, batch 1550, loss[loss=0.3649, ctc_loss=0.2694, cr_loss=0.4777, over 20640.00 frames. ], tot_loss[loss=0.335, ctc_loss=0.2499, cr_loss=0.4251, over 4072515.55 frames. ], batch size: 68, lr: 2.03e-02, grad_scale: 32.0 2024-09-14 02:06:46,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=22.5 2024-09-14 02:06:59,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=58743.5, ans=0.0 2024-09-14 02:07:42,040 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2024-09-14 02:07:56,497 INFO [train.py:1198] (0/2) Epoch 4, batch 1600, loss[loss=0.3251, ctc_loss=0.2408, cr_loss=0.4217, over 20802.00 frames. ], tot_loss[loss=0.3335, ctc_loss=0.2485, cr_loss=0.4245, over 4082090.83 frames. ], batch size: 53, lr: 2.03e-02, grad_scale: 32.0 2024-09-14 02:08:03,569 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.49 vs. 
limit=15.0 2024-09-14 02:08:16,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=58885.166666666664, ans=0.0 2024-09-14 02:08:19,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=58885.166666666664, ans=0.125 2024-09-14 02:09:14,886 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.305e+02 2.521e+02 2.941e+02 4.741e+02, threshold=5.041e+02, percent-clipped=0.0 2024-09-14 02:09:14,906 INFO [train.py:1198] (0/2) Epoch 4, batch 1650, loss[loss=0.3509, ctc_loss=0.2606, cr_loss=0.4515, over 20626.00 frames. ], tot_loss[loss=0.3333, ctc_loss=0.2484, cr_loss=0.4245, over 4079484.14 frames. ], batch size: 66, lr: 2.03e-02, grad_scale: 32.0 2024-09-14 02:09:21,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=58998.5, ans=0.125 2024-09-14 02:09:22,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=58998.5, ans=0.0 2024-09-14 02:09:33,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=59026.833333333336, ans=0.0 2024-09-14 02:09:33,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59026.833333333336, ans=0.1 2024-09-14 02:09:53,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=59055.166666666664, ans=0.125 2024-09-14 02:09:59,356 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 02:10:02,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=59083.5, ans=15.0 2024-09-14 02:10:33,896 INFO 
[train.py:1198] (0/2) Epoch 4, batch 1700, loss[loss=0.2675, ctc_loss=0.1936, cr_loss=0.3696, over 20969.00 frames. ], tot_loss[loss=0.331, ctc_loss=0.2464, cr_loss=0.423, over 4083205.24 frames. ], batch size: 48, lr: 2.03e-02, grad_scale: 32.0 2024-09-14 02:11:20,890 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 02:11:22,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=59225.166666666664, ans=0.0 2024-09-14 02:11:43,552 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2024-09-14 02:11:48,872 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.403e+02 2.734e+02 3.429e+02 5.395e+02, threshold=5.469e+02, percent-clipped=3.0 2024-09-14 02:11:48,893 INFO [train.py:1198] (0/2) Epoch 4, batch 1750, loss[loss=0.3383, ctc_loss=0.248, cr_loss=0.4516, over 20860.00 frames. ], tot_loss[loss=0.3302, ctc_loss=0.2455, cr_loss=0.4232, over 4096564.78 frames. ], batch size: 65, lr: 2.02e-02, grad_scale: 32.0 2024-09-14 02:12:43,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=59366.833333333336, ans=0.125 2024-09-14 02:12:45,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59366.833333333336, ans=0.1 2024-09-14 02:13:04,079 INFO [train.py:1198] (0/2) Epoch 4, batch 1800, loss[loss=0.3532, ctc_loss=0.2643, cr_loss=0.4445, over 20144.00 frames. ], tot_loss[loss=0.331, ctc_loss=0.2463, cr_loss=0.4237, over 4092849.70 frames. 
], batch size: 80, lr: 2.02e-02, grad_scale: 32.0 2024-09-14 02:13:14,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=59423.5, ans=0.125 2024-09-14 02:13:24,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=59451.833333333336, ans=0.125 2024-09-14 02:14:19,628 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.241e+02 2.476e+02 2.738e+02 4.960e+02, threshold=4.953e+02, percent-clipped=0.0 2024-09-14 02:14:19,648 INFO [train.py:1198] (0/2) Epoch 4, batch 1850, loss[loss=0.2704, ctc_loss=0.2025, cr_loss=0.3398, over 20987.00 frames. ], tot_loss[loss=0.3306, ctc_loss=0.246, cr_loss=0.423, over 4098216.49 frames. ], batch size: 48, lr: 2.02e-02, grad_scale: 32.0 2024-09-14 02:15:03,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=59621.833333333336, ans=0.0 2024-09-14 02:15:41,189 INFO [train.py:1198] (0/2) Epoch 4, batch 1900, loss[loss=0.3078, ctc_loss=0.2273, cr_loss=0.4027, over 20787.00 frames. ], tot_loss[loss=0.3304, ctc_loss=0.246, cr_loss=0.4218, over 4082182.81 frames. 
], batch size: 56, lr: 2.02e-02, grad_scale: 32.0 2024-09-14 02:16:08,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=59735.166666666664, ans=0.125 2024-09-14 02:16:33,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=59791.833333333336, ans=0.125 2024-09-14 02:16:42,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=59820.166666666664, ans=0.0 2024-09-14 02:16:57,007 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.325e+02 2.598e+02 2.989e+02 4.886e+02, threshold=5.196e+02, percent-clipped=0.0 2024-09-14 02:16:57,027 INFO [train.py:1198] (0/2) Epoch 4, batch 1950, loss[loss=0.281, ctc_loss=0.2057, cr_loss=0.3767, over 19883.00 frames. ], tot_loss[loss=0.3293, ctc_loss=0.2449, cr_loss=0.422, over 4092141.02 frames. ], batch size: 44, lr: 2.02e-02, grad_scale: 32.0 2024-09-14 02:17:26,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=59905.166666666664, ans=0.0 2024-09-14 02:17:50,746 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.49 vs. limit=15.0 2024-09-14 02:18:02,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59961.833333333336, ans=0.1 2024-09-14 02:18:09,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.97 vs. limit=12.0 2024-09-14 02:18:12,639 INFO [train.py:1198] (0/2) Epoch 4, batch 2000, loss[loss=0.3105, ctc_loss=0.2307, cr_loss=0.399, over 20993.00 frames. ], tot_loss[loss=0.3302, ctc_loss=0.2457, cr_loss=0.422, over 4074616.14 frames. 
], batch size: 58, lr: 2.01e-02, grad_scale: 32.0 2024-09-14 02:18:12,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=59990.166666666664, ans=0.0 2024-09-14 02:18:13,682 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0 2024-09-14 02:18:26,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60018.5, ans=0.1 2024-09-14 02:19:27,600 INFO [train.py:1198] (0/2) Epoch 4, batch 2050, loss[loss=0.3132, ctc_loss=0.2301, cr_loss=0.4156, over 20894.00 frames. ], tot_loss[loss=0.3295, ctc_loss=0.2452, cr_loss=0.4219, over 4082945.11 frames. ], batch size: 54, lr: 2.01e-02, grad_scale: 32.0 2024-09-14 02:19:29,093 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.164e+02 2.429e+02 2.730e+02 4.906e+02, threshold=4.857e+02, percent-clipped=0.0 2024-09-14 02:19:59,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=60188.5, ans=0.125 2024-09-14 02:20:33,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-09-14 02:20:46,044 INFO [train.py:1198] (0/2) Epoch 4, batch 2100, loss[loss=0.368, ctc_loss=0.2786, cr_loss=0.4469, over 20596.00 frames. ], tot_loss[loss=0.3305, ctc_loss=0.246, cr_loss=0.4224, over 4086506.13 frames. ], batch size: 75, lr: 2.01e-02, grad_scale: 32.0 2024-09-14 02:21:09,468 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. 
limit=6.0 2024-09-14 02:21:25,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=60330.166666666664, ans=0.2 2024-09-14 02:21:31,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=60330.166666666664, ans=0.2 2024-09-14 02:21:34,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=60358.5, ans=0.125 2024-09-14 02:21:54,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=60386.833333333336, ans=0.0 2024-09-14 02:22:04,917 INFO [train.py:1198] (0/2) Epoch 4, batch 2150, loss[loss=0.3491, ctc_loss=0.2613, cr_loss=0.439, over 21088.00 frames. ], tot_loss[loss=0.3302, ctc_loss=0.2458, cr_loss=0.4221, over 4073232.42 frames. ], batch size: 59, lr: 2.01e-02, grad_scale: 16.0 2024-09-14 02:22:07,892 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.354e+02 2.656e+02 3.093e+02 5.783e+02, threshold=5.313e+02, percent-clipped=1.0 2024-09-14 02:22:12,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=60415.166666666664, ans=0.1 2024-09-14 02:22:26,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=60443.5, ans=0.1 2024-09-14 02:22:36,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=60471.833333333336, ans=0.2 2024-09-14 02:22:59,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-09-14 02:23:20,088 INFO [train.py:1198] (0/2) Epoch 4, batch 2200, loss[loss=0.4091, ctc_loss=0.3204, cr_loss=0.4432, over 14354.00 frames. 
], tot_loss[loss=0.3303, ctc_loss=0.2458, cr_loss=0.4226, over 4074704.77 frames. ], batch size: 150, lr: 2.01e-02, grad_scale: 16.0 2024-09-14 02:23:38,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=60585.166666666664, ans=0.025 2024-09-14 02:24:00,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.97 vs. limit=12.0 2024-09-14 02:24:02,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=60613.5, ans=0.0 2024-09-14 02:24:10,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=60641.833333333336, ans=0.1 2024-09-14 02:24:35,051 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=15.0 2024-09-14 02:24:35,704 INFO [train.py:1198] (0/2) Epoch 4, batch 2250, loss[loss=0.3223, ctc_loss=0.2441, cr_loss=0.3908, over 20930.00 frames. ], tot_loss[loss=0.3284, ctc_loss=0.2441, cr_loss=0.4212, over 4079982.46 frames. ], batch size: 49, lr: 2.00e-02, grad_scale: 16.0 2024-09-14 02:24:38,563 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.228e+02 2.504e+02 2.906e+02 4.727e+02, threshold=5.007e+02, percent-clipped=0.0 2024-09-14 02:24:52,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=60726.833333333336, ans=0.125 2024-09-14 02:25:12,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.25 vs. limit=15.0 2024-09-14 02:25:15,526 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.08 vs. 
limit=15.0 2024-09-14 02:25:25,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=60783.5, ans=0.0 2024-09-14 02:25:31,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=60783.5, ans=0.0 2024-09-14 02:25:51,389 INFO [train.py:1198] (0/2) Epoch 4, batch 2300, loss[loss=0.3642, ctc_loss=0.2692, cr_loss=0.4752, over 21077.00 frames. ], tot_loss[loss=0.3284, ctc_loss=0.2442, cr_loss=0.421, over 4084442.53 frames. ], batch size: 59, lr: 2.00e-02, grad_scale: 16.0 2024-09-14 02:26:29,472 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=15.0 2024-09-14 02:26:30,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=60896.833333333336, ans=0.125 2024-09-14 02:26:34,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=60896.833333333336, ans=0.0 2024-09-14 02:27:00,130 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 02:27:11,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=60981.833333333336, ans=0.125 2024-09-14 02:27:12,079 INFO [train.py:1198] (0/2) Epoch 4, batch 2350, loss[loss=0.334, ctc_loss=0.2427, cr_loss=0.4568, over 21052.00 frames. ], tot_loss[loss=0.3276, ctc_loss=0.2434, cr_loss=0.4209, over 4093674.07 frames. 
], batch size: 56, lr: 2.00e-02, grad_scale: 16.0 2024-09-14 02:27:15,060 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.371e+02 2.641e+02 3.214e+02 5.050e+02, threshold=5.283e+02, percent-clipped=1.0 2024-09-14 02:27:21,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=60981.833333333336, ans=0.125 2024-09-14 02:27:25,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=61010.166666666664, ans=0.1 2024-09-14 02:27:51,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=61038.5, ans=0.0 2024-09-14 02:28:05,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=61066.833333333336, ans=0.125 2024-09-14 02:28:17,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=61095.166666666664, ans=0.125 2024-09-14 02:28:23,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=61095.166666666664, ans=0.125 2024-09-14 02:28:24,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61095.166666666664, ans=0.1 2024-09-14 02:28:27,488 INFO [train.py:1198] (0/2) Epoch 4, batch 2400, loss[loss=0.2809, ctc_loss=0.2061, cr_loss=0.3741, over 20978.00 frames. ], tot_loss[loss=0.3273, ctc_loss=0.243, cr_loss=0.4212, over 4092890.71 frames. 
], batch size: 51, lr: 2.00e-02, grad_scale: 32.0 2024-09-14 02:28:29,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=61123.5, ans=0.05 2024-09-14 02:28:48,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=22.5 2024-09-14 02:28:49,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=61151.833333333336, ans=0.0 2024-09-14 02:29:07,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-09-14 02:29:25,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=61208.5, ans=0.125 2024-09-14 02:29:43,077 INFO [train.py:1198] (0/2) Epoch 4, batch 2450, loss[loss=0.3412, ctc_loss=0.2531, cr_loss=0.4403, over 20653.00 frames. ], tot_loss[loss=0.3269, ctc_loss=0.2426, cr_loss=0.4215, over 4099027.53 frames. 
], batch size: 68, lr: 2.00e-02, grad_scale: 32.0 2024-09-14 02:29:46,087 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.320e+02 2.688e+02 3.117e+02 5.115e+02, threshold=5.375e+02, percent-clipped=0.0 2024-09-14 02:30:44,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=61378.5, ans=15.0 2024-09-14 02:30:52,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61378.5, ans=0.1 2024-09-14 02:30:56,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=61378.5, ans=0.125 2024-09-14 02:30:58,634 INFO [train.py:1198] (0/2) Epoch 4, batch 2500, loss[loss=0.3027, ctc_loss=0.2243, cr_loss=0.3922, over 20975.00 frames. ], tot_loss[loss=0.328, ctc_loss=0.2435, cr_loss=0.4223, over 4093354.18 frames. ], batch size: 52, lr: 1.99e-02, grad_scale: 32.0 2024-09-14 02:31:06,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=61406.833333333336, ans=0.125 2024-09-14 02:31:39,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=61463.5, ans=0.0 2024-09-14 02:31:54,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=61491.833333333336, ans=0.125 2024-09-14 02:32:09,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61520.166666666664, ans=0.1 2024-09-14 02:32:16,368 INFO [train.py:1198] (0/2) Epoch 4, batch 2550, loss[loss=0.2914, ctc_loss=0.2138, cr_loss=0.3881, over 21072.00 frames. ], tot_loss[loss=0.3291, ctc_loss=0.2446, cr_loss=0.4224, over 4092764.16 frames. 
], batch size: 56, lr: 1.99e-02, grad_scale: 32.0 2024-09-14 02:32:19,318 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.249e+02 2.600e+02 3.277e+02 5.290e+02, threshold=5.199e+02, percent-clipped=0.0 2024-09-14 02:32:24,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=61548.5, ans=0.0 2024-09-14 02:32:24,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=61548.5, ans=0.125 2024-09-14 02:33:06,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=61633.5, ans=0.0 2024-09-14 02:33:16,067 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 02:33:25,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=61661.833333333336, ans=0.1 2024-09-14 02:33:29,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=61661.833333333336, ans=0.125 2024-09-14 02:33:35,247 INFO [train.py:1198] (0/2) Epoch 4, batch 2600, loss[loss=0.3024, ctc_loss=0.2226, cr_loss=0.399, over 20897.00 frames. ], tot_loss[loss=0.329, ctc_loss=0.2443, cr_loss=0.4233, over 4099554.93 frames. ], batch size: 54, lr: 1.99e-02, grad_scale: 32.0 2024-09-14 02:33:50,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=61718.5, ans=0.07 2024-09-14 02:34:12,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.09 vs. 
limit=15.0 2024-09-14 02:34:14,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=61746.833333333336, ans=0.125 2024-09-14 02:34:50,826 INFO [train.py:1198] (0/2) Epoch 4, batch 2650, loss[loss=0.3337, ctc_loss=0.2487, cr_loss=0.4251, over 21045.00 frames. ], tot_loss[loss=0.3282, ctc_loss=0.2435, cr_loss=0.4234, over 4104846.18 frames. ], batch size: 62, lr: 1.99e-02, grad_scale: 32.0 2024-09-14 02:34:53,748 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.382e+02 2.697e+02 3.200e+02 4.789e+02, threshold=5.395e+02, percent-clipped=0.0 2024-09-14 02:34:58,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61831.833333333336, ans=0.1 2024-09-14 02:35:09,682 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.63 vs. limit=15.0 2024-09-14 02:35:13,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=61860.166666666664, ans=0.2 2024-09-14 02:35:14,088 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.36 vs. limit=22.5 2024-09-14 02:35:23,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=22.5 2024-09-14 02:35:51,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=61945.166666666664, ans=0.0 2024-09-14 02:36:06,000 INFO [train.py:1198] (0/2) Epoch 4, batch 2700, loss[loss=0.3594, ctc_loss=0.2662, cr_loss=0.4658, over 20649.00 frames. ], tot_loss[loss=0.3272, ctc_loss=0.2428, cr_loss=0.4219, over 4108052.31 frames. 
], batch size: 66, lr: 1.98e-02, grad_scale: 32.0 2024-09-14 02:36:13,733 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 02:36:34,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=62030.166666666664, ans=0.0 2024-09-14 02:36:39,452 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-14 02:36:46,896 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=22.5 2024-09-14 02:37:20,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=22.5 2024-09-14 02:37:20,832 INFO [train.py:1198] (0/2) Epoch 4, batch 2750, loss[loss=0.3288, ctc_loss=0.2412, cr_loss=0.4377, over 20987.00 frames. ], tot_loss[loss=0.327, ctc_loss=0.2429, cr_loss=0.4208, over 4087139.44 frames. ], batch size: 52, lr: 1.98e-02, grad_scale: 32.0 2024-09-14 02:37:23,808 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.307e+02 2.523e+02 3.152e+02 4.526e+02, threshold=5.045e+02, percent-clipped=0.0 2024-09-14 02:38:06,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=62171.833333333336, ans=0.2 2024-09-14 02:38:42,520 INFO [train.py:1198] (0/2) Epoch 4, batch 2800, loss[loss=0.2778, ctc_loss=0.2066, cr_loss=0.3559, over 20957.00 frames. ], tot_loss[loss=0.328, ctc_loss=0.2437, cr_loss=0.4218, over 4088091.86 frames. 
], batch size: 51, lr: 1.98e-02, grad_scale: 32.0 2024-09-14 02:39:09,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=62285.166666666664, ans=0.125 2024-09-14 02:39:48,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=62370.166666666664, ans=0.0 2024-09-14 02:39:51,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=62370.166666666664, ans=0.125 2024-09-14 02:39:57,671 INFO [train.py:1198] (0/2) Epoch 4, batch 2850, loss[loss=0.256, ctc_loss=0.1862, cr_loss=0.349, over 20979.00 frames. ], tot_loss[loss=0.3284, ctc_loss=0.2439, cr_loss=0.4223, over 4096782.39 frames. ], batch size: 49, lr: 1.98e-02, grad_scale: 32.0 2024-09-14 02:40:00,594 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.333e+02 2.637e+02 3.139e+02 4.944e+02, threshold=5.275e+02, percent-clipped=0.0 2024-09-14 02:40:32,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=62455.166666666664, ans=0.125 2024-09-14 02:40:52,822 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0 2024-09-14 02:41:13,076 INFO [train.py:1198] (0/2) Epoch 4, batch 2900, loss[loss=0.4202, ctc_loss=0.3325, cr_loss=0.4386, over 14282.00 frames. ], tot_loss[loss=0.3288, ctc_loss=0.2444, cr_loss=0.4223, over 4075204.24 frames. ], batch size: 149, lr: 1.98e-02, grad_scale: 32.0 2024-09-14 02:41:19,837 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. 
limit=15.0 2024-09-14 02:41:20,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=62540.166666666664, ans=0.125 2024-09-14 02:41:25,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=62540.166666666664, ans=0.95 2024-09-14 02:41:38,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=62568.5, ans=0.0 2024-09-14 02:42:23,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=62653.5, ans=0.125 2024-09-14 02:42:27,695 INFO [train.py:1198] (0/2) Epoch 4, batch 2950, loss[loss=0.3393, ctc_loss=0.2516, cr_loss=0.4383, over 20959.00 frames. ], tot_loss[loss=0.3285, ctc_loss=0.2439, cr_loss=0.4229, over 4085606.40 frames. ], batch size: 64, lr: 1.97e-02, grad_scale: 32.0 2024-09-14 02:42:30,686 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.239e+02 2.461e+02 2.747e+02 6.552e+02, threshold=4.921e+02, percent-clipped=1.0 2024-09-14 02:43:01,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=62738.5, ans=0.05 2024-09-14 02:43:45,887 INFO [train.py:1198] (0/2) Epoch 4, batch 3000, loss[loss=0.2997, ctc_loss=0.2238, cr_loss=0.3793, over 21058.00 frames. ], tot_loss[loss=0.3285, ctc_loss=0.2439, cr_loss=0.4228, over 4086870.06 frames. ], batch size: 53, lr: 1.97e-02, grad_scale: 16.0 2024-09-14 02:43:45,888 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 02:44:05,789 INFO [train.py:1230] (0/2) Epoch 4, validation: loss=0.08117, ctc_loss=0.08117, cr_loss=9.426e-15, over 944034.00 frames. 
2024-09-14 02:44:05,789 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 02:44:16,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=62823.5, ans=0.2 2024-09-14 02:44:37,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=62880.166666666664, ans=0.125 2024-09-14 02:45:00,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=62908.5, ans=0.0 2024-09-14 02:45:03,268 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 02:45:06,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=62936.833333333336, ans=0.125 2024-09-14 02:45:09,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=62936.833333333336, ans=0.04949747468305833 2024-09-14 02:45:11,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=22.5 2024-09-14 02:45:21,065 INFO [train.py:1198] (0/2) Epoch 4, batch 3050, loss[loss=0.2941, ctc_loss=0.2214, cr_loss=0.3637, over 21055.00 frames. ], tot_loss[loss=0.3285, ctc_loss=0.244, cr_loss=0.423, over 4087091.89 frames. 
], batch size: 53, lr: 1.97e-02, grad_scale: 16.0 2024-09-14 02:45:25,558 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.274e+02 2.525e+02 2.965e+02 5.292e+02, threshold=5.050e+02, percent-clipped=2.0 2024-09-14 02:46:14,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=63050.166666666664, ans=0.125 2024-09-14 02:46:23,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=63078.5, ans=0.125 2024-09-14 02:46:36,836 INFO [train.py:1198] (0/2) Epoch 4, batch 3100, loss[loss=0.3302, ctc_loss=0.2481, cr_loss=0.4108, over 20728.00 frames. ], tot_loss[loss=0.3277, ctc_loss=0.2433, cr_loss=0.422, over 4092573.11 frames. ], batch size: 71, lr: 1.97e-02, grad_scale: 16.0 2024-09-14 02:47:02,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=63135.166666666664, ans=0.125 2024-09-14 02:47:17,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=63163.5, ans=0.125 2024-09-14 02:47:29,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=63191.833333333336, ans=0.0 2024-09-14 02:47:37,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=63220.166666666664, ans=0.125 2024-09-14 02:47:51,401 INFO [train.py:1198] (0/2) Epoch 4, batch 3150, loss[loss=0.3044, ctc_loss=0.2241, cr_loss=0.4011, over 21065.00 frames. ], tot_loss[loss=0.3291, ctc_loss=0.2444, cr_loss=0.4231, over 4081947.86 frames. 
], batch size: 56, lr: 1.97e-02, grad_scale: 16.0 2024-09-14 02:47:55,990 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.359e+02 2.728e+02 3.358e+02 9.340e+02, threshold=5.456e+02, percent-clipped=4.0 2024-09-14 02:48:03,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=63248.5, ans=0.0 2024-09-14 02:49:06,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63361.833333333336, ans=0.1 2024-09-14 02:49:10,237 INFO [train.py:1198] (0/2) Epoch 4, batch 3200, loss[loss=0.3267, ctc_loss=0.2412, cr_loss=0.4273, over 20634.00 frames. ], tot_loss[loss=0.3278, ctc_loss=0.2433, cr_loss=0.4222, over 4085375.65 frames. ], batch size: 71, lr: 1.96e-02, grad_scale: 32.0 2024-09-14 02:49:36,968 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=22.5 2024-09-14 02:49:51,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=63446.833333333336, ans=0.2 2024-09-14 02:49:55,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.16 vs. limit=15.0 2024-09-14 02:49:59,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=63475.166666666664, ans=0.025 2024-09-14 02:50:00,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=63475.166666666664, ans=0.0 2024-09-14 02:50:23,798 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.53 vs. 
limit=15.0 2024-09-14 02:50:28,863 INFO [train.py:1198] (0/2) Epoch 4, batch 3250, loss[loss=0.3567, ctc_loss=0.2637, cr_loss=0.4647, over 20691.00 frames. ], tot_loss[loss=0.3289, ctc_loss=0.2442, cr_loss=0.4237, over 4087100.40 frames. ], batch size: 71, lr: 1.96e-02, grad_scale: 32.0 2024-09-14 02:50:29,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=63531.833333333336, ans=0.125 2024-09-14 02:50:33,527 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.173e+02 2.516e+02 3.106e+02 4.227e+02, threshold=5.031e+02, percent-clipped=0.0 2024-09-14 02:50:54,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=63560.166666666664, ans=0.04949747468305833 2024-09-14 02:50:57,955 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2024-09-14 02:51:20,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=63616.833333333336, ans=0.0 2024-09-14 02:51:33,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=63645.166666666664, ans=0.0 2024-09-14 02:51:43,725 INFO [train.py:1198] (0/2) Epoch 4, batch 3300, loss[loss=0.2831, ctc_loss=0.2036, cr_loss=0.3976, over 21011.00 frames. ], tot_loss[loss=0.3276, ctc_loss=0.2432, cr_loss=0.4223, over 4091646.49 frames. ], batch size: 51, lr: 1.96e-02, grad_scale: 32.0 2024-09-14 02:51:49,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=63673.5, ans=0.125 2024-09-14 02:52:58,361 INFO [train.py:1198] (0/2) Epoch 4, batch 3350, loss[loss=0.3598, ctc_loss=0.2709, cr_loss=0.4448, over 20742.00 frames. 
], tot_loss[loss=0.3285, ctc_loss=0.2439, cr_loss=0.4231, over 4089338.18 frames. ], batch size: 71, lr: 1.96e-02, grad_scale: 32.0 2024-09-14 02:53:02,891 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.280e+02 2.587e+02 3.253e+02 5.558e+02, threshold=5.175e+02, percent-clipped=1.0 2024-09-14 02:53:09,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=63815.166666666664, ans=0.125 2024-09-14 02:53:47,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=63900.166666666664, ans=0.125 2024-09-14 02:54:12,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=63928.5, ans=0.125 2024-09-14 02:54:14,596 INFO [train.py:1198] (0/2) Epoch 4, batch 3400, loss[loss=0.3389, ctc_loss=0.2554, cr_loss=0.4175, over 21032.00 frames. ], tot_loss[loss=0.3284, ctc_loss=0.2438, cr_loss=0.4229, over 4083136.03 frames. ], batch size: 62, lr: 1.96e-02, grad_scale: 32.0 2024-09-14 02:54:22,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=63956.833333333336, ans=0.0 2024-09-14 02:55:00,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=64013.5, ans=0.2 2024-09-14 02:55:09,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=64041.833333333336, ans=0.125 2024-09-14 02:55:10,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=64041.833333333336, ans=0.1 2024-09-14 02:55:36,989 INFO [train.py:1198] (0/2) Epoch 4, batch 3450, loss[loss=0.2861, ctc_loss=0.2095, cr_loss=0.3827, over 19848.00 frames. 
], tot_loss[loss=0.3271, ctc_loss=0.2429, cr_loss=0.4214, over 4078405.68 frames. ], batch size: 44, lr: 1.96e-02, grad_scale: 32.0 2024-09-14 02:55:41,572 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.364e+02 2.636e+02 3.041e+02 4.515e+02, threshold=5.273e+02, percent-clipped=1.0 2024-09-14 02:56:18,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=64155.166666666664, ans=0.125 2024-09-14 02:56:39,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=64211.833333333336, ans=0.1 2024-09-14 02:56:53,117 INFO [train.py:1198] (0/2) Epoch 4, batch 3500, loss[loss=0.3396, ctc_loss=0.2505, cr_loss=0.4454, over 20979.00 frames. ], tot_loss[loss=0.3281, ctc_loss=0.2437, cr_loss=0.4223, over 4070042.98 frames. ], batch size: 55, lr: 1.95e-02, grad_scale: 32.0 2024-09-14 02:56:59,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=64240.166666666664, ans=0.015 2024-09-14 02:57:14,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=64268.5, ans=0.0 2024-09-14 02:57:30,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=64296.833333333336, ans=0.04949747468305833 2024-09-14 02:57:50,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=64325.166666666664, ans=0.025 2024-09-14 02:58:08,590 INFO [train.py:1198] (0/2) Epoch 4, batch 3550, loss[loss=0.3334, ctc_loss=0.2532, cr_loss=0.4009, over 20266.00 frames. ], tot_loss[loss=0.3271, ctc_loss=0.2428, cr_loss=0.4216, over 4073072.77 frames. 
], batch size: 74, lr: 1.95e-02, grad_scale: 32.0 2024-09-14 02:58:11,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=64381.833333333336, ans=0.2 2024-09-14 02:58:12,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=64381.833333333336, ans=0.125 2024-09-14 02:58:13,128 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.378e+02 2.692e+02 3.412e+02 6.201e+02, threshold=5.385e+02, percent-clipped=4.0 2024-09-14 02:58:33,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.07 vs. limit=22.5 2024-09-14 02:58:38,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=64438.5, ans=0.125 2024-09-14 02:59:22,631 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 02:59:23,712 INFO [train.py:1198] (0/2) Epoch 4, batch 3600, loss[loss=0.3464, ctc_loss=0.2561, cr_loss=0.4512, over 20969.00 frames. ], tot_loss[loss=0.3246, ctc_loss=0.2407, cr_loss=0.4195, over 4093585.31 frames. ], batch size: 67, lr: 1.95e-02, grad_scale: 32.0 2024-09-14 02:59:27,428 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. 
limit=15.0 2024-09-14 02:59:28,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=64523.5, ans=0.2 2024-09-14 02:59:37,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=64551.833333333336, ans=0.125 2024-09-14 02:59:46,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=64551.833333333336, ans=0.0 2024-09-14 02:59:57,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=64580.166666666664, ans=0.05 2024-09-14 03:00:01,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=64580.166666666664, ans=0.09899494936611666 2024-09-14 03:00:42,223 INFO [train.py:1198] (0/2) Epoch 4, batch 3650, loss[loss=0.327, ctc_loss=0.2423, cr_loss=0.4232, over 21075.00 frames. ], tot_loss[loss=0.325, ctc_loss=0.241, cr_loss=0.4203, over 4096821.55 frames. 
], batch size: 59, lr: 1.95e-02, grad_scale: 16.0 2024-09-14 03:00:45,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=64665.166666666664, ans=0.125 2024-09-14 03:00:47,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=64665.166666666664, ans=0.125 2024-09-14 03:00:48,412 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.333e+02 2.690e+02 3.120e+02 5.506e+02, threshold=5.380e+02, percent-clipped=1.0 2024-09-14 03:00:48,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=64665.166666666664, ans=0.125 2024-09-14 03:01:03,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=64693.5, ans=0.1 2024-09-14 03:01:10,180 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0 2024-09-14 03:01:17,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=64721.833333333336, ans=0.0 2024-09-14 03:01:33,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=64750.166666666664, ans=0.0 2024-09-14 03:01:38,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=64750.166666666664, ans=0.125 2024-09-14 03:02:00,524 INFO [train.py:1198] (0/2) Epoch 4, batch 3700, loss[loss=0.2963, ctc_loss=0.2189, cr_loss=0.3868, over 20781.00 frames. ], tot_loss[loss=0.3257, ctc_loss=0.2417, cr_loss=0.4202, over 4079200.04 frames. 
], batch size: 56, lr: 1.95e-02, grad_scale: 16.0 2024-09-14 03:02:13,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=64806.833333333336, ans=0.2 2024-09-14 03:02:23,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=64835.166666666664, ans=0.125 2024-09-14 03:02:27,069 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0 2024-09-14 03:02:29,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=64863.5, ans=0.0 2024-09-14 03:02:44,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=64891.833333333336, ans=0.0 2024-09-14 03:02:53,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=64891.833333333336, ans=0.0 2024-09-14 03:03:01,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=64920.166666666664, ans=0.2 2024-09-14 03:03:16,162 INFO [train.py:1198] (0/2) Epoch 4, batch 3750, loss[loss=0.2717, ctc_loss=0.2002, cr_loss=0.3574, over 20951.00 frames. ], tot_loss[loss=0.324, ctc_loss=0.2402, cr_loss=0.4189, over 4079904.09 frames. 
], batch size: 50, lr: 1.94e-02, grad_scale: 16.0 2024-09-14 03:03:22,046 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.387e+02 2.649e+02 3.255e+02 5.060e+02, threshold=5.297e+02, percent-clipped=0.0 2024-09-14 03:03:24,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=64948.5, ans=0.125 2024-09-14 03:03:33,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=64976.833333333336, ans=0.0 2024-09-14 03:03:52,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=65005.166666666664, ans=0.025 2024-09-14 03:04:25,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=65061.833333333336, ans=0.2 2024-09-14 03:04:32,221 INFO [train.py:1198] (0/2) Epoch 4, batch 3800, loss[loss=0.3471, ctc_loss=0.2603, cr_loss=0.434, over 20895.00 frames. ], tot_loss[loss=0.3252, ctc_loss=0.241, cr_loss=0.4207, over 4090422.25 frames. ], batch size: 65, lr: 1.94e-02, grad_scale: 16.0 2024-09-14 03:04:35,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=65090.166666666664, ans=0.125 2024-09-14 03:04:47,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=65118.5, ans=0.0 2024-09-14 03:05:13,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=65146.833333333336, ans=0.0 2024-09-14 03:05:47,565 INFO [train.py:1198] (0/2) Epoch 4, batch 3850, loss[loss=0.3226, ctc_loss=0.2418, cr_loss=0.4037, over 21073.00 frames. ], tot_loss[loss=0.3255, ctc_loss=0.2412, cr_loss=0.4216, over 4094194.29 frames. 
], batch size: 59, lr: 1.94e-02, grad_scale: 16.0 2024-09-14 03:05:56,566 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.256e+02 2.420e+02 2.840e+02 5.630e+02, threshold=4.839e+02, percent-clipped=1.0 2024-09-14 03:05:56,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=65231.833333333336, ans=0.125 2024-09-14 03:06:20,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=65288.5, ans=22.5 2024-09-14 03:06:46,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=65316.833333333336, ans=0.0 2024-09-14 03:07:03,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=65345.166666666664, ans=0.125 2024-09-14 03:07:09,417 INFO [train.py:1198] (0/2) Epoch 4, batch 3900, loss[loss=0.3064, ctc_loss=0.221, cr_loss=0.4269, over 21013.00 frames. ], tot_loss[loss=0.3238, ctc_loss=0.2398, cr_loss=0.4203, over 4096131.53 frames. ], batch size: 61, lr: 1.94e-02, grad_scale: 16.0 2024-09-14 03:07:12,801 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-09-14 03:07:52,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65458.5, ans=0.1 2024-09-14 03:08:15,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=65486.833333333336, ans=0.0 2024-09-14 03:08:24,137 INFO [train.py:1198] (0/2) Epoch 4, batch 3950, loss[loss=0.3464, ctc_loss=0.2576, cr_loss=0.4443, over 20780.00 frames. ], tot_loss[loss=0.3238, ctc_loss=0.2397, cr_loss=0.4204, over 4103237.27 frames. 
], batch size: 53, lr: 1.94e-02, grad_scale: 16.0 2024-09-14 03:08:30,036 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.393e+02 2.693e+02 3.193e+02 4.940e+02, threshold=5.386e+02, percent-clipped=2.0 2024-09-14 03:08:40,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=65543.5, ans=0.125 2024-09-14 03:09:24,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=65628.5, ans=0.1 2024-09-14 03:09:27,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=65628.5, ans=15.0 2024-09-14 03:09:39,030 INFO [train.py:1198] (0/2) Epoch 4, batch 4000, loss[loss=0.3084, ctc_loss=0.2236, cr_loss=0.4242, over 21011.00 frames. ], tot_loss[loss=0.3231, ctc_loss=0.2391, cr_loss=0.4197, over 4116584.44 frames. ], batch size: 63, lr: 1.93e-02, grad_scale: 32.0 2024-09-14 03:09:57,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=65685.16666666667, ans=0.125 2024-09-14 03:09:57,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=65685.16666666667, ans=0.125 2024-09-14 03:10:36,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=65741.83333333333, ans=0.05 2024-09-14 03:10:53,985 INFO [train.py:1198] (0/2) Epoch 4, batch 4050, loss[loss=0.2859, ctc_loss=0.2103, cr_loss=0.3782, over 20952.00 frames. ], tot_loss[loss=0.3232, ctc_loss=0.2393, cr_loss=0.4197, over 4116293.93 frames. 
], batch size: 48, lr: 1.93e-02, grad_scale: 32.0 2024-09-14 03:10:58,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65798.5, ans=0.1 2024-09-14 03:10:59,955 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.255e+02 2.573e+02 3.023e+02 5.009e+02, threshold=5.147e+02, percent-clipped=0.0 2024-09-14 03:12:11,781 INFO [train.py:1198] (0/2) Epoch 4, batch 4100, loss[loss=0.3671, ctc_loss=0.2747, cr_loss=0.4621, over 21034.00 frames. ], tot_loss[loss=0.326, ctc_loss=0.2417, cr_loss=0.4215, over 4091648.00 frames. ], batch size: 62, lr: 1.93e-02, grad_scale: 32.0 2024-09-14 03:12:18,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=65940.16666666667, ans=0.125 2024-09-14 03:12:21,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=65940.16666666667, ans=0.125 2024-09-14 03:12:31,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=65968.5, ans=0.0 2024-09-14 03:12:46,909 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:13:06,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66025.16666666667, ans=0.1 2024-09-14 03:13:29,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=66081.83333333333, ans=0.025 2024-09-14 03:13:30,342 INFO [train.py:1198] (0/2) Epoch 4, batch 4150, loss[loss=0.33, ctc_loss=0.2456, cr_loss=0.4219, over 20828.00 frames. ], tot_loss[loss=0.3254, ctc_loss=0.241, cr_loss=0.4219, over 4093736.16 frames. 
], batch size: 59, lr: 1.93e-02, grad_scale: 32.0 2024-09-14 03:13:36,283 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.323e+02 2.608e+02 3.200e+02 5.078e+02, threshold=5.216e+02, percent-clipped=0.0 2024-09-14 03:13:36,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66081.83333333333, ans=0.1 2024-09-14 03:14:09,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=66138.5, ans=0.125 2024-09-14 03:14:12,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=66138.5, ans=0.125 2024-09-14 03:14:29,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=66195.16666666667, ans=0.125 2024-09-14 03:14:45,289 INFO [train.py:1198] (0/2) Epoch 4, batch 4200, loss[loss=0.332, ctc_loss=0.2436, cr_loss=0.4417, over 21081.00 frames. ], tot_loss[loss=0.3252, ctc_loss=0.2408, cr_loss=0.4215, over 4100374.04 frames. ], batch size: 56, lr: 1.93e-02, grad_scale: 32.0 2024-09-14 03:15:01,027 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0 2024-09-14 03:15:04,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.46 vs. limit=22.5 2024-09-14 03:15:25,069 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. 
limit=15.0 2024-09-14 03:15:26,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=66280.16666666667, ans=0.0 2024-09-14 03:15:53,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=66336.83333333333, ans=0.125 2024-09-14 03:16:00,376 INFO [train.py:1198] (0/2) Epoch 4, batch 4250, loss[loss=0.3115, ctc_loss=0.2271, cr_loss=0.4219, over 20684.00 frames. ], tot_loss[loss=0.3264, ctc_loss=0.2419, cr_loss=0.4224, over 4090161.85 frames. ], batch size: 71, lr: 1.92e-02, grad_scale: 32.0 2024-09-14 03:16:06,444 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.390e+02 2.675e+02 3.034e+02 4.916e+02, threshold=5.351e+02, percent-clipped=0.0 2024-09-14 03:16:11,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=66365.16666666667, ans=0.0 2024-09-14 03:16:18,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=66393.5, ans=0.0 2024-09-14 03:16:26,649 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2024-09-14 03:17:01,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=66478.5, ans=0.2 2024-09-14 03:17:16,188 INFO [train.py:1198] (0/2) Epoch 4, batch 4300, loss[loss=0.3018, ctc_loss=0.2265, cr_loss=0.3762, over 21054.00 frames. ], tot_loss[loss=0.3243, ctc_loss=0.2402, cr_loss=0.4204, over 4102030.44 frames. 
], batch size: 53, lr: 1.92e-02, grad_scale: 16.0 2024-09-14 03:17:55,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66563.5, ans=0.1 2024-09-14 03:18:03,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=66591.83333333333, ans=0.0 2024-09-14 03:18:21,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=66620.16666666667, ans=0.2 2024-09-14 03:18:37,296 INFO [train.py:1198] (0/2) Epoch 4, batch 4350, loss[loss=0.3678, ctc_loss=0.273, cr_loss=0.4741, over 20966.00 frames. ], tot_loss[loss=0.324, ctc_loss=0.2398, cr_loss=0.4208, over 4107154.84 frames. ], batch size: 64, lr: 1.92e-02, grad_scale: 16.0 2024-09-14 03:18:43,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66648.5, ans=0.1 2024-09-14 03:18:44,908 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.281e+02 2.581e+02 3.059e+02 5.942e+02, threshold=5.161e+02, percent-clipped=2.0 2024-09-14 03:19:12,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2024-09-14 03:19:15,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=66705.16666666667, ans=0.125 2024-09-14 03:19:16,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.27 vs. 
limit=5.0 2024-09-14 03:19:22,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=66733.5, ans=0.125 2024-09-14 03:19:24,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=66733.5, ans=0.0 2024-09-14 03:19:24,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.10 vs. limit=22.5 2024-09-14 03:19:52,475 INFO [train.py:1198] (0/2) Epoch 4, batch 4400, loss[loss=0.375, ctc_loss=0.2751, cr_loss=0.4995, over 19452.00 frames. ], tot_loss[loss=0.3264, ctc_loss=0.2419, cr_loss=0.4227, over 4091197.58 frames. ], batch size: 90, lr: 1.92e-02, grad_scale: 32.0 2024-09-14 03:20:09,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=22.5 2024-09-14 03:20:12,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=66818.5, ans=0.125 2024-09-14 03:20:59,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.67 vs. limit=15.0 2024-09-14 03:21:08,163 INFO [train.py:1198] (0/2) Epoch 4, batch 4450, loss[loss=0.3611, ctc_loss=0.2755, cr_loss=0.4279, over 20655.00 frames. ], tot_loss[loss=0.3267, ctc_loss=0.2419, cr_loss=0.4236, over 4088107.80 frames. 
], batch size: 66, lr: 1.92e-02, grad_scale: 32.0 2024-09-14 03:21:15,633 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.305e+02 2.603e+02 3.007e+02 4.983e+02, threshold=5.207e+02, percent-clipped=0.0 2024-09-14 03:21:23,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=66960.16666666667, ans=0.0 2024-09-14 03:21:26,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=66960.16666666667, ans=0.125 2024-09-14 03:22:01,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=67016.83333333333, ans=0.125 2024-09-14 03:22:23,553 INFO [train.py:1198] (0/2) Epoch 4, batch 4500, loss[loss=0.309, ctc_loss=0.2283, cr_loss=0.4031, over 20954.00 frames. ], tot_loss[loss=0.3253, ctc_loss=0.2409, cr_loss=0.4218, over 4094184.70 frames. ], batch size: 52, lr: 1.92e-02, grad_scale: 32.0 2024-09-14 03:22:37,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=67101.83333333333, ans=0.0 2024-09-14 03:22:58,981 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.48 vs. limit=6.0 2024-09-14 03:23:05,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67130.16666666667, ans=0.1 2024-09-14 03:23:28,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=67186.83333333333, ans=0.025 2024-09-14 03:23:41,615 INFO [train.py:1198] (0/2) Epoch 4, batch 4550, loss[loss=0.3116, ctc_loss=0.2249, cr_loss=0.4336, over 20901.00 frames. ], tot_loss[loss=0.3261, ctc_loss=0.2416, cr_loss=0.4224, over 4093816.00 frames. 
], batch size: 54, lr: 1.91e-02, grad_scale: 32.0 2024-09-14 03:23:44,853 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:23:49,208 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.333e+02 2.572e+02 3.102e+02 4.754e+02, threshold=5.145e+02, percent-clipped=0.0 2024-09-14 03:23:50,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=15.0 2024-09-14 03:24:00,547 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=12.0 2024-09-14 03:24:04,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2024-09-14 03:24:45,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.88 vs. limit=22.5 2024-09-14 03:24:59,856 INFO [train.py:1198] (0/2) Epoch 4, batch 4600, loss[loss=0.3578, ctc_loss=0.2736, cr_loss=0.4209, over 20954.00 frames. ], tot_loss[loss=0.327, ctc_loss=0.2424, cr_loss=0.4229, over 4089480.42 frames. 
], batch size: 64, lr: 1.91e-02, grad_scale: 32.0 2024-09-14 03:25:04,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=67356.83333333333, ans=0.0 2024-09-14 03:25:10,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=67356.83333333333, ans=0.0 2024-09-14 03:25:15,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=67385.16666666667, ans=0.0 2024-09-14 03:25:17,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=67385.16666666667, ans=0.125 2024-09-14 03:25:20,365 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0 2024-09-14 03:26:02,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=67470.16666666667, ans=0.125 2024-09-14 03:26:13,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0 2024-09-14 03:26:16,005 INFO [train.py:1198] (0/2) Epoch 4, batch 4650, loss[loss=0.3059, ctc_loss=0.221, cr_loss=0.4247, over 20987.00 frames. ], tot_loss[loss=0.3272, ctc_loss=0.2424, cr_loss=0.4242, over 4094416.14 frames. 
], batch size: 51, lr: 1.91e-02, grad_scale: 32.0 2024-09-14 03:26:23,629 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.466e+02 2.933e+02 3.344e+02 5.732e+02, threshold=5.866e+02, percent-clipped=1.0 2024-09-14 03:26:25,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=67498.5, ans=0.125 2024-09-14 03:26:49,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=67555.16666666667, ans=0.0 2024-09-14 03:27:30,428 INFO [train.py:1198] (0/2) Epoch 4, batch 4700, loss[loss=0.2971, ctc_loss=0.222, cr_loss=0.3751, over 21049.00 frames. ], tot_loss[loss=0.3284, ctc_loss=0.2434, cr_loss=0.4252, over 4088137.46 frames. ], batch size: 56, lr: 1.91e-02, grad_scale: 32.0 2024-09-14 03:27:39,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=67640.16666666667, ans=0.035 2024-09-14 03:27:57,677 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:28:37,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0 2024-09-14 03:28:47,853 INFO [train.py:1198] (0/2) Epoch 4, batch 4750, loss[loss=0.3461, ctc_loss=0.2565, cr_loss=0.4479, over 20662.00 frames. ], tot_loss[loss=0.3267, ctc_loss=0.2418, cr_loss=0.4245, over 4102058.91 frames. 
], batch size: 68, lr: 1.91e-02, grad_scale: 32.0 2024-09-14 03:28:48,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=67781.83333333333, ans=0.0 2024-09-14 03:28:55,498 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.253e+02 2.459e+02 2.716e+02 5.028e+02, threshold=4.917e+02, percent-clipped=0.0 2024-09-14 03:29:33,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=67838.5, ans=0.125 2024-09-14 03:29:33,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=67838.5, ans=0.125 2024-09-14 03:29:45,769 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:30:06,674 INFO [train.py:1198] (0/2) Epoch 4, batch 4800, loss[loss=0.2864, ctc_loss=0.2118, cr_loss=0.3729, over 20993.00 frames. ], tot_loss[loss=0.3267, ctc_loss=0.2419, cr_loss=0.4243, over 4098549.71 frames. ], batch size: 48, lr: 1.90e-02, grad_scale: 32.0 2024-09-14 03:30:25,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=67951.83333333333, ans=0.125 2024-09-14 03:30:36,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. 
limit=15.0 2024-09-14 03:30:38,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=67980.16666666667, ans=0.125 2024-09-14 03:30:45,654 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-24000.pt 2024-09-14 03:30:47,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=67980.16666666667, ans=12.0 2024-09-14 03:30:54,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=68008.5, ans=0.125 2024-09-14 03:31:12,801 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=22.5 2024-09-14 03:31:19,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=68036.83333333333, ans=0.0 2024-09-14 03:31:22,581 INFO [train.py:1198] (0/2) Epoch 4, batch 4850, loss[loss=0.2781, ctc_loss=0.2016, cr_loss=0.3823, over 20986.00 frames. ], tot_loss[loss=0.3285, ctc_loss=0.2436, cr_loss=0.4245, over 4062223.13 frames. 
], batch size: 51, lr: 1.90e-02, grad_scale: 32.0 2024-09-14 03:31:30,317 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.309e+02 2.649e+02 3.134e+02 7.070e+02, threshold=5.298e+02, percent-clipped=3.0 2024-09-14 03:31:38,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=68093.5, ans=0.2 2024-09-14 03:31:59,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=68121.83333333333, ans=0.125 2024-09-14 03:31:59,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.81 vs. limit=15.0 2024-09-14 03:32:02,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=68121.83333333333, ans=0.0 2024-09-14 03:32:09,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=68150.16666666667, ans=0.015 2024-09-14 03:32:09,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=68150.16666666667, ans=0.125 2024-09-14 03:32:27,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=68178.5, ans=0.125 2024-09-14 03:32:37,459 INFO [train.py:1198] (0/2) Epoch 4, batch 4900, loss[loss=0.3397, ctc_loss=0.2513, cr_loss=0.4422, over 21020.00 frames. ], tot_loss[loss=0.3282, ctc_loss=0.2433, cr_loss=0.4245, over 4070356.15 frames. 
], batch size: 61, lr: 1.90e-02, grad_scale: 32.0 2024-09-14 03:32:43,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=68206.83333333333, ans=0.0 2024-09-14 03:32:45,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=68206.83333333333, ans=0.125 2024-09-14 03:32:58,996 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:33:06,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=68263.5, ans=0.2 2024-09-14 03:33:28,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=68291.83333333333, ans=0.125 2024-09-14 03:33:40,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=68320.16666666667, ans=0.1 2024-09-14 03:33:51,970 INFO [train.py:1198] (0/2) Epoch 4, batch 4950, loss[loss=0.3229, ctc_loss=0.2373, cr_loss=0.428, over 20969.00 frames. ], tot_loss[loss=0.326, ctc_loss=0.2414, cr_loss=0.4231, over 4084709.66 frames. 
], batch size: 58, lr: 1.90e-02, grad_scale: 32.0 2024-09-14 03:33:59,336 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.275e+02 2.605e+02 3.048e+02 5.653e+02, threshold=5.210e+02, percent-clipped=2.0 2024-09-14 03:34:13,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68376.83333333333, ans=0.1 2024-09-14 03:34:35,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=68433.5, ans=15.0 2024-09-14 03:34:57,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=68461.83333333333, ans=0.025 2024-09-14 03:35:00,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=68461.83333333333, ans=0.125 2024-09-14 03:35:06,069 INFO [train.py:1198] (0/2) Epoch 4, batch 5000, loss[loss=0.3026, ctc_loss=0.2173, cr_loss=0.4263, over 20974.00 frames. ], tot_loss[loss=0.3255, ctc_loss=0.2411, cr_loss=0.4221, over 4093084.18 frames. ], batch size: 51, lr: 1.90e-02, grad_scale: 32.0 2024-09-14 03:35:10,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=68490.16666666667, ans=0.025 2024-09-14 03:35:15,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=68490.16666666667, ans=0.0 2024-09-14 03:35:22,107 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.07 vs. 
limit=15.0 2024-09-14 03:35:24,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=68518.5, ans=0.2 2024-09-14 03:35:30,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=68518.5, ans=0.0 2024-09-14 03:36:00,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=68575.16666666667, ans=0.125 2024-09-14 03:36:03,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=68575.16666666667, ans=0.125 2024-09-14 03:36:17,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=68603.5, ans=0.0 2024-09-14 03:36:23,736 INFO [train.py:1198] (0/2) Epoch 4, batch 5050, loss[loss=0.4223, ctc_loss=0.3305, cr_loss=0.459, over 15073.00 frames. ], tot_loss[loss=0.3246, ctc_loss=0.2405, cr_loss=0.4205, over 4081333.78 frames. ], batch size: 150, lr: 1.90e-02, grad_scale: 32.0 2024-09-14 03:36:27,493 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0 2024-09-14 03:36:31,184 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.248e+02 2.535e+02 2.857e+02 5.181e+02, threshold=5.071e+02, percent-clipped=0.0 2024-09-14 03:36:46,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.89 vs. 
limit=15.0 2024-09-14 03:36:56,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=68688.5, ans=0.0 2024-09-14 03:37:00,880 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:37:32,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=68745.16666666667, ans=0.125 2024-09-14 03:37:38,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=68773.5, ans=0.1 2024-09-14 03:37:40,027 INFO [train.py:1198] (0/2) Epoch 4, batch 5100, loss[loss=0.3191, ctc_loss=0.2387, cr_loss=0.402, over 21066.00 frames. ], tot_loss[loss=0.3261, ctc_loss=0.2418, cr_loss=0.4216, over 4072270.29 frames. ], batch size: 56, lr: 1.89e-02, grad_scale: 32.0 2024-09-14 03:37:43,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=68773.5, ans=0.025 2024-09-14 03:37:44,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68773.5, ans=0.1 2024-09-14 03:37:50,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=68773.5, ans=0.04949747468305833 2024-09-14 03:38:39,116 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=22.5 2024-09-14 03:38:54,932 INFO [train.py:1198] (0/2) Epoch 4, batch 5150, loss[loss=0.3165, ctc_loss=0.235, cr_loss=0.4078, over 21024.00 frames. ], tot_loss[loss=0.3241, ctc_loss=0.24, cr_loss=0.4205, over 4082584.66 frames. 
], batch size: 63, lr: 1.89e-02, grad_scale: 32.0 2024-09-14 03:38:58,151 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:39:02,217 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.227e+02 2.530e+02 2.864e+02 5.813e+02, threshold=5.059e+02, percent-clipped=5.0 2024-09-14 03:39:09,373 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.58 vs. limit=5.0 2024-09-14 03:39:31,048 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 03:39:36,383 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.18 vs. limit=8.0 2024-09-14 03:39:46,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2024-09-14 03:40:09,578 INFO [train.py:1198] (0/2) Epoch 4, batch 5200, loss[loss=0.3527, ctc_loss=0.2641, cr_loss=0.4431, over 19457.00 frames. ], tot_loss[loss=0.3255, ctc_loss=0.2413, cr_loss=0.4213, over 4068710.88 frames. ], batch size: 90, lr: 1.89e-02, grad_scale: 32.0 2024-09-14 03:40:48,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=69113.5, ans=0.2 2024-09-14 03:40:48,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=69113.5, ans=0.07 2024-09-14 03:40:56,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.61 vs. 
limit=12.0 2024-09-14 03:40:58,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=69141.83333333333, ans=0.025 2024-09-14 03:41:23,612 INFO [train.py:1198] (0/2) Epoch 4, batch 5250, loss[loss=0.34, ctc_loss=0.2512, cr_loss=0.4437, over 20853.00 frames. ], tot_loss[loss=0.3262, ctc_loss=0.2418, cr_loss=0.4218, over 4060308.72 frames. ], batch size: 57, lr: 1.89e-02, grad_scale: 32.0 2024-09-14 03:41:29,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=69198.5, ans=0.1 2024-09-14 03:41:30,985 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.230e+02 2.655e+02 3.088e+02 4.121e+02, threshold=5.310e+02, percent-clipped=0.0 2024-09-14 03:42:30,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=69311.83333333333, ans=0.125 2024-09-14 03:42:31,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2024-09-14 03:42:37,937 INFO [train.py:1198] (0/2) Epoch 4, batch 5300, loss[loss=0.3071, ctc_loss=0.2207, cr_loss=0.4321, over 20970.00 frames. ], tot_loss[loss=0.3267, ctc_loss=0.2423, cr_loss=0.422, over 4059265.15 frames. ], batch size: 50, lr: 1.89e-02, grad_scale: 32.0 2024-09-14 03:43:22,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=69425.16666666667, ans=0.2 2024-09-14 03:43:27,801 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.42 vs. limit=22.5 2024-09-14 03:43:52,049 INFO [train.py:1198] (0/2) Epoch 4, batch 5350, loss[loss=0.3329, ctc_loss=0.2489, cr_loss=0.4203, over 20155.00 frames. 
], tot_loss[loss=0.3251, ctc_loss=0.241, cr_loss=0.421, over 4067851.22 frames. ], batch size: 80, lr: 1.88e-02, grad_scale: 32.0 2024-09-14 03:43:53,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=69481.83333333333, ans=0.125 2024-09-14 03:43:59,478 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.320e+02 2.675e+02 3.518e+02 6.585e+02, threshold=5.349e+02, percent-clipped=4.0 2024-09-14 03:44:50,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=69595.16666666667, ans=0.1 2024-09-14 03:45:06,741 INFO [train.py:1198] (0/2) Epoch 4, batch 5400, loss[loss=0.2906, ctc_loss=0.2134, cr_loss=0.3859, over 20975.00 frames. ], tot_loss[loss=0.324, ctc_loss=0.2399, cr_loss=0.4205, over 4081326.34 frames. ], batch size: 55, lr: 1.88e-02, grad_scale: 32.0 2024-09-14 03:46:23,202 INFO [train.py:1198] (0/2) Epoch 4, batch 5450, loss[loss=0.306, ctc_loss=0.2235, cr_loss=0.4126, over 20978.00 frames. ], tot_loss[loss=0.3246, ctc_loss=0.2404, cr_loss=0.4214, over 4075145.23 frames. ], batch size: 51, lr: 1.88e-02, grad_scale: 32.0 2024-09-14 03:46:30,427 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.222e+02 2.469e+02 3.026e+02 4.253e+02, threshold=4.938e+02, percent-clipped=0.0 2024-09-14 03:46:36,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2024-09-14 03:46:40,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=26.11 vs. 
limit=22.5 2024-09-14 03:47:19,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=69850.16666666667, ans=0.0 2024-09-14 03:47:31,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=69878.5, ans=0.125 2024-09-14 03:47:39,831 INFO [train.py:1198] (0/2) Epoch 4, batch 5500, loss[loss=0.3169, ctc_loss=0.234, cr_loss=0.4146, over 20965.00 frames. ], tot_loss[loss=0.322, ctc_loss=0.2381, cr_loss=0.4194, over 4086914.45 frames. ], batch size: 64, lr: 1.88e-02, grad_scale: 32.0 2024-09-14 03:47:49,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-09-14 03:48:09,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=69963.5, ans=0.125 2024-09-14 03:48:20,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.01 vs. limit=22.5 2024-09-14 03:48:27,656 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.72 vs. 
limit=15.0 2024-09-14 03:48:30,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=69991.83333333333, ans=0.1 2024-09-14 03:48:36,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=69991.83333333333, ans=0.125 2024-09-14 03:48:42,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=70020.16666666667, ans=0.0 2024-09-14 03:48:44,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-09-14 03:48:45,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=70020.16666666667, ans=0.05 2024-09-14 03:48:53,801 INFO [train.py:1198] (0/2) Epoch 4, batch 5550, loss[loss=0.3184, ctc_loss=0.2333, cr_loss=0.4259, over 21039.00 frames. ], tot_loss[loss=0.3219, ctc_loss=0.2379, cr_loss=0.42, over 4086455.99 frames. ], batch size: 62, lr: 1.88e-02, grad_scale: 32.0 2024-09-14 03:49:01,161 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.248e+02 2.611e+02 2.984e+02 5.188e+02, threshold=5.222e+02, percent-clipped=1.0 2024-09-14 03:49:41,643 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-09-14 03:50:07,826 INFO [train.py:1198] (0/2) Epoch 4, batch 5600, loss[loss=0.3411, ctc_loss=0.2504, cr_loss=0.4537, over 20708.00 frames. ], tot_loss[loss=0.3224, ctc_loss=0.2385, cr_loss=0.4198, over 4082916.67 frames. 
], batch size: 71, lr: 1.88e-02, grad_scale: 32.0 2024-09-14 03:50:28,457 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.95 vs. limit=10.0 2024-09-14 03:50:29,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=70218.5, ans=0.125 2024-09-14 03:50:50,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=70275.16666666667, ans=0.1 2024-09-14 03:51:21,160 INFO [train.py:1198] (0/2) Epoch 4, batch 5650, loss[loss=0.3518, ctc_loss=0.2644, cr_loss=0.437, over 20938.00 frames. ], tot_loss[loss=0.3221, ctc_loss=0.238, cr_loss=0.4205, over 4093319.55 frames. ], batch size: 60, lr: 1.87e-02, grad_scale: 32.0 2024-09-14 03:51:28,569 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.235e+02 2.459e+02 3.074e+02 5.356e+02, threshold=4.919e+02, percent-clipped=2.0 2024-09-14 03:52:05,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=70416.83333333333, ans=0.2 2024-09-14 03:52:09,082 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0 2024-09-14 03:52:24,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=70445.16666666667, ans=0.125 2024-09-14 03:52:34,648 INFO [train.py:1198] (0/2) Epoch 4, batch 5700, loss[loss=0.4057, ctc_loss=0.3039, cr_loss=0.5085, over 18370.00 frames. ], tot_loss[loss=0.3239, ctc_loss=0.2394, cr_loss=0.4226, over 4077686.85 frames. 
], batch size: 108, lr: 1.87e-02, grad_scale: 32.0 2024-09-14 03:52:42,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=70473.5, ans=0.125 2024-09-14 03:53:13,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.79 vs. limit=15.0 2024-09-14 03:53:36,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=70586.83333333333, ans=0.125 2024-09-14 03:53:49,038 INFO [train.py:1198] (0/2) Epoch 4, batch 5750, loss[loss=0.2915, ctc_loss=0.2097, cr_loss=0.4089, over 20968.00 frames. ], tot_loss[loss=0.323, ctc_loss=0.2386, cr_loss=0.4221, over 4074516.13 frames. ], batch size: 50, lr: 1.87e-02, grad_scale: 32.0 2024-09-14 03:53:56,510 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.235e+02 2.475e+02 2.905e+02 4.217e+02, threshold=4.950e+02, percent-clipped=0.0 2024-09-14 03:54:04,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=70643.5, ans=0.125 2024-09-14 03:54:07,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=70643.5, ans=0.0 2024-09-14 03:55:05,354 INFO [train.py:1198] (0/2) Epoch 4, batch 5800, loss[loss=0.3736, ctc_loss=0.2897, cr_loss=0.4196, over 18276.00 frames. ], tot_loss[loss=0.3235, ctc_loss=0.2391, cr_loss=0.4222, over 4077378.30 frames. 
], batch size: 108, lr: 1.87e-02, grad_scale: 32.0 2024-09-14 03:55:11,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=70756.83333333333, ans=0.0 2024-09-14 03:55:22,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=70785.16666666667, ans=0.0 2024-09-14 03:56:01,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=70841.83333333333, ans=0.125 2024-09-14 03:56:20,725 INFO [train.py:1198] (0/2) Epoch 4, batch 5850, loss[loss=0.3026, ctc_loss=0.2196, cr_loss=0.4149, over 20972.00 frames. ], tot_loss[loss=0.3233, ctc_loss=0.2388, cr_loss=0.4223, over 4080196.15 frames. ], batch size: 55, lr: 1.87e-02, grad_scale: 32.0 2024-09-14 03:56:28,201 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.290e+02 2.515e+02 2.921e+02 4.898e+02, threshold=5.030e+02, percent-clipped=0.0 2024-09-14 03:56:30,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=70898.5, ans=0.0 2024-09-14 03:56:33,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=70898.5, ans=0.025 2024-09-14 03:56:40,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=70926.83333333333, ans=0.125 2024-09-14 03:56:47,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=70926.83333333333, ans=0.0 2024-09-14 03:57:13,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=70983.5, ans=0.125 2024-09-14 03:57:16,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=70983.5, ans=0.125 2024-09-14 
03:57:35,161 INFO [train.py:1198] (0/2) Epoch 4, batch 5900, loss[loss=0.3158, ctc_loss=0.2323, cr_loss=0.4178, over 20972.00 frames. ], tot_loss[loss=0.3224, ctc_loss=0.2381, cr_loss=0.4215, over 4081787.79 frames. ], batch size: 58, lr: 1.87e-02, grad_scale: 32.0 2024-09-14 03:57:43,172 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0 2024-09-14 03:57:51,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=71068.5, ans=0.2 2024-09-14 03:57:53,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=71068.5, ans=0.0 2024-09-14 03:58:07,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=71096.83333333333, ans=0.125 2024-09-14 03:58:26,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=71125.16666666667, ans=0.2 2024-09-14 03:58:44,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2024-09-14 03:58:49,229 INFO [train.py:1198] (0/2) Epoch 4, batch 5950, loss[loss=0.2832, ctc_loss=0.2081, cr_loss=0.3753, over 20976.00 frames. ], tot_loss[loss=0.3207, ctc_loss=0.2368, cr_loss=0.4193, over 4089214.14 frames. 
], batch size: 50, lr: 1.86e-02, grad_scale: 32.0 2024-09-14 03:58:56,540 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.272e+02 2.604e+02 3.252e+02 5.278e+02, threshold=5.208e+02, percent-clipped=1.0 2024-09-14 03:59:36,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=71266.83333333333, ans=0.0 2024-09-14 03:59:45,071 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=22.5 2024-09-14 03:59:45,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=71266.83333333333, ans=0.125 2024-09-14 04:00:03,191 INFO [train.py:1198] (0/2) Epoch 4, batch 6000, loss[loss=0.2867, ctc_loss=0.204, cr_loss=0.4134, over 20947.00 frames. ], tot_loss[loss=0.3211, ctc_loss=0.2372, cr_loss=0.4195, over 4072896.43 frames. ], batch size: 49, lr: 1.86e-02, grad_scale: 32.0 2024-09-14 04:00:03,192 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 04:00:18,789 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.0713, 5.7312, 5.4081, 5.4318], device='cuda:0') 2024-09-14 04:00:29,488 INFO [train.py:1230] (0/2) Epoch 4, validation: loss=0.07445, ctc_loss=0.07445, cr_loss=9.516e-15, over 944034.00 frames. 2024-09-14 04:00:29,488 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 04:00:31,498 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2024-09-14 04:00:37,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.87 vs. 
limit=12.0 2024-09-14 04:01:05,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=71380.16666666667, ans=0.2 2024-09-14 04:01:36,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.59 vs. limit=6.0 2024-09-14 04:01:44,545 INFO [train.py:1198] (0/2) Epoch 4, batch 6050, loss[loss=0.3353, ctc_loss=0.2449, cr_loss=0.4522, over 20959.00 frames. ], tot_loss[loss=0.3215, ctc_loss=0.2376, cr_loss=0.4197, over 4067332.83 frames. ], batch size: 64, lr: 1.86e-02, grad_scale: 32.0 2024-09-14 04:01:52,069 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.211e+02 2.724e+02 3.340e+02 6.745e+02, threshold=5.449e+02, percent-clipped=2.0 2024-09-14 04:02:10,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=71493.5, ans=0.125 2024-09-14 04:02:50,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=71578.5, ans=0.125 2024-09-14 04:03:00,131 INFO [train.py:1198] (0/2) Epoch 4, batch 6100, loss[loss=0.343, ctc_loss=0.2536, cr_loss=0.4473, over 20837.00 frames. ], tot_loss[loss=0.3218, ctc_loss=0.2379, cr_loss=0.4196, over 4068629.48 frames. ], batch size: 59, lr: 1.86e-02, grad_scale: 32.0 2024-09-14 04:03:02,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-09-14 04:03:18,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=71635.16666666667, ans=0.125 2024-09-14 04:03:47,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.21 vs. 
limit=15.0 2024-09-14 04:03:57,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=71691.83333333333, ans=0.125 2024-09-14 04:04:06,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=12.0 2024-09-14 04:04:09,095 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=23.42 vs. limit=22.5 2024-09-14 04:04:12,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=71720.16666666667, ans=0.125 2024-09-14 04:04:15,613 INFO [train.py:1198] (0/2) Epoch 4, batch 6150, loss[loss=0.2957, ctc_loss=0.2156, cr_loss=0.4006, over 21059.00 frames. ], tot_loss[loss=0.3228, ctc_loss=0.2387, cr_loss=0.4204, over 4058771.22 frames. ], batch size: 56, lr: 1.86e-02, grad_scale: 32.0 2024-09-14 04:04:19,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.65 vs. limit=15.0 2024-09-14 04:04:22,988 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.314e+02 2.579e+02 3.111e+02 7.091e+02, threshold=5.159e+02, percent-clipped=1.0 2024-09-14 04:05:06,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=71833.5, ans=0.0 2024-09-14 04:05:29,381 INFO [train.py:1198] (0/2) Epoch 4, batch 6200, loss[loss=0.3682, ctc_loss=0.2756, cr_loss=0.4632, over 20328.00 frames. ], tot_loss[loss=0.3262, ctc_loss=0.2417, cr_loss=0.4226, over 4028659.41 frames. ], batch size: 74, lr: 1.86e-02, grad_scale: 32.0 2024-09-14 04:05:31,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.08 vs. 
limit=15.0 2024-09-14 04:05:36,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=71890.16666666667, ans=0.1 2024-09-14 04:05:42,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=71918.5, ans=0.125 2024-09-14 04:05:51,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0 2024-09-14 04:06:13,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=71975.16666666667, ans=0.1 2024-09-14 04:06:42,361 INFO [train.py:1198] (0/2) Epoch 4, batch 6250, loss[loss=0.3725, ctc_loss=0.2836, cr_loss=0.4448, over 18443.00 frames. ], tot_loss[loss=0.3246, ctc_loss=0.2406, cr_loss=0.42, over 4008821.56 frames. ], batch size: 108, lr: 1.85e-02, grad_scale: 32.0 2024-09-14 04:06:49,806 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.237e+02 2.467e+02 3.015e+02 4.414e+02, threshold=4.934e+02, percent-clipped=0.0 2024-09-14 04:06:50,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=72031.83333333333, ans=0.2 2024-09-14 04:06:53,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=72031.83333333333, ans=0.125 2024-09-14 04:06:59,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=72060.16666666667, ans=0.0 2024-09-14 04:07:14,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=72088.5, ans=0.025 2024-09-14 04:07:34,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, 
batch_count=72116.83333333333, ans=0.125 2024-09-14 04:07:56,933 INFO [train.py:1198] (0/2) Epoch 4, batch 6300, loss[loss=0.2872, ctc_loss=0.2077, cr_loss=0.3973, over 19902.00 frames. ], tot_loss[loss=0.3259, ctc_loss=0.2418, cr_loss=0.4207, over 3979094.63 frames. ], batch size: 44, lr: 1.85e-02, grad_scale: 32.0 2024-09-14 04:07:57,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=72173.5, ans=0.125 2024-09-14 04:08:08,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=72173.5, ans=0.025 2024-09-14 04:08:38,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=72258.5, ans=0.0 2024-09-14 04:08:49,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0 2024-09-14 04:09:08,327 INFO [train.py:1198] (0/2) Epoch 4, batch 6350, loss[loss=0.411, ctc_loss=0.3172, cr_loss=0.469, over 14115.00 frames. ], tot_loss[loss=0.3298, ctc_loss=0.2455, cr_loss=0.4215, over 3884652.57 frames. ], batch size: 149, lr: 1.85e-02, grad_scale: 32.0 2024-09-14 04:09:09,067 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=22.5 2024-09-14 04:09:17,051 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.398e+02 2.903e+02 3.475e+02 4.595e+02, threshold=5.806e+02, percent-clipped=0.0 2024-09-14 04:09:25,305 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.80 vs. 
limit=6.0 2024-09-14 04:10:05,272 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-4.pt 2024-09-14 04:10:54,889 INFO [train.py:1198] (0/2) Epoch 5, batch 0, loss[loss=0.3579, ctc_loss=0.2672, cr_loss=0.4537, over 20668.00 frames. ], tot_loss[loss=0.3579, ctc_loss=0.2672, cr_loss=0.4537, over 20668.00 frames. ], batch size: 66, lr: 1.72e-02, grad_scale: 32.0 2024-09-14 04:10:54,891 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 04:11:13,197 INFO [train.py:1230] (0/2) Epoch 5, validation: loss=0.07931, ctc_loss=0.07931, cr_loss=9.897e-15, over 944034.00 frames. 2024-09-14 04:11:13,198 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 04:11:25,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=72428.5, ans=0.2 2024-09-14 04:11:43,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=72485.16666666667, ans=0.0 2024-09-14 04:12:29,246 INFO [train.py:1198] (0/2) Epoch 5, batch 50, loss[loss=0.3299, ctc_loss=0.2427, cr_loss=0.4362, over 21009.00 frames. ], tot_loss[loss=0.3225, ctc_loss=0.2376, cr_loss=0.4246, over 928402.90 frames. ], batch size: 63, lr: 1.72e-02, grad_scale: 32.0 2024-09-14 04:12:31,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=72570.16666666667, ans=0.0 2024-09-14 04:12:53,487 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.284e+02 2.523e+02 2.865e+02 3.597e+02, threshold=5.047e+02, percent-clipped=0.0 2024-09-14 04:13:33,908 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.73 vs. 
limit=15.0 2024-09-14 04:13:45,082 INFO [train.py:1198] (0/2) Epoch 5, batch 100, loss[loss=0.342, ctc_loss=0.2545, cr_loss=0.4374, over 21046.00 frames. ], tot_loss[loss=0.3229, ctc_loss=0.238, cr_loss=0.4247, over 1626797.77 frames. ], batch size: 56, lr: 1.72e-02, grad_scale: 32.0 2024-09-14 04:13:48,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=72711.83333333333, ans=0.125 2024-09-14 04:14:13,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=72768.5, ans=0.2 2024-09-14 04:14:21,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=72768.5, ans=0.1 2024-09-14 04:14:31,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=72796.83333333333, ans=0.04949747468305833 2024-09-14 04:14:39,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=72796.83333333333, ans=0.1 2024-09-14 04:14:40,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=72796.83333333333, ans=0.1 2024-09-14 04:15:00,152 INFO [train.py:1198] (0/2) Epoch 5, batch 150, loss[loss=0.3107, ctc_loss=0.2303, cr_loss=0.4023, over 20972.00 frames. ], tot_loss[loss=0.3217, ctc_loss=0.2371, cr_loss=0.4231, over 2179668.05 frames. 
], batch size: 49, lr: 1.72e-02, grad_scale: 32.0 2024-09-14 04:15:27,264 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.192e+02 2.483e+02 2.953e+02 5.023e+02, threshold=4.965e+02, percent-clipped=0.0 2024-09-14 04:15:29,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=72881.83333333333, ans=0.0 2024-09-14 04:15:30,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=72881.83333333333, ans=0.2 2024-09-14 04:15:50,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=72938.5, ans=0.125 2024-09-14 04:15:57,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=72938.5, ans=0.09899494936611666 2024-09-14 04:16:21,968 INFO [train.py:1198] (0/2) Epoch 5, batch 200, loss[loss=0.3381, ctc_loss=0.2583, cr_loss=0.3986, over 20016.00 frames. ], tot_loss[loss=0.3187, ctc_loss=0.2349, cr_loss=0.419, over 2599548.11 frames. ], batch size: 80, lr: 1.72e-02, grad_scale: 32.0 2024-09-14 04:16:46,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=73023.5, ans=0.125 2024-09-14 04:16:52,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=73051.83333333333, ans=0.125 2024-09-14 04:17:29,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=73108.5, ans=0.125 2024-09-14 04:17:37,730 INFO [train.py:1198] (0/2) Epoch 5, batch 250, loss[loss=0.2885, ctc_loss=0.2113, cr_loss=0.3858, over 20949.00 frames. ], tot_loss[loss=0.3169, ctc_loss=0.2332, cr_loss=0.4186, over 2939891.26 frames. 
], batch size: 60, lr: 1.71e-02, grad_scale: 32.0 2024-09-14 04:17:50,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2024-09-14 04:18:01,527 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.225e+02 2.506e+02 3.171e+02 4.742e+02, threshold=5.012e+02, percent-clipped=0.0 2024-09-14 04:18:24,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=73221.83333333333, ans=0.025 2024-09-14 04:18:38,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0 2024-09-14 04:18:39,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73250.16666666667, ans=0.1 2024-09-14 04:18:43,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=73250.16666666667, ans=0.125 2024-09-14 04:18:52,774 INFO [train.py:1198] (0/2) Epoch 5, batch 300, loss[loss=0.2855, ctc_loss=0.2059, cr_loss=0.3984, over 20993.00 frames. ], tot_loss[loss=0.3171, ctc_loss=0.2333, cr_loss=0.4189, over 3198988.80 frames. ], batch size: 48, lr: 1.71e-02, grad_scale: 32.0 2024-09-14 04:19:23,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=73335.16666666667, ans=0.125 2024-09-14 04:20:08,538 INFO [train.py:1198] (0/2) Epoch 5, batch 350, loss[loss=0.318, ctc_loss=0.2316, cr_loss=0.4321, over 20642.00 frames. ], tot_loss[loss=0.3176, ctc_loss=0.2336, cr_loss=0.4202, over 3400571.04 frames. 
], batch size: 71, lr: 1.71e-02, grad_scale: 32.0 2024-09-14 04:20:13,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=73420.16666666667, ans=0.0 2024-09-14 04:20:32,599 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.196e+02 2.581e+02 3.294e+02 5.338e+02, threshold=5.162e+02, percent-clipped=1.0 2024-09-14 04:20:35,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=73448.5, ans=0.0 2024-09-14 04:20:39,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=73476.83333333333, ans=0.025 2024-09-14 04:21:25,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2024-09-14 04:21:27,644 INFO [train.py:1198] (0/2) Epoch 5, batch 400, loss[loss=0.2958, ctc_loss=0.2205, cr_loss=0.3766, over 20784.00 frames. ], tot_loss[loss=0.3162, ctc_loss=0.2321, cr_loss=0.4202, over 3570257.17 frames. ], batch size: 53, lr: 1.71e-02, grad_scale: 32.0 2024-09-14 04:21:34,151 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 04:22:11,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=73618.5, ans=0.125 2024-09-14 04:22:34,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=73675.16666666667, ans=0.125 2024-09-14 04:22:46,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73703.5, ans=0.1 2024-09-14 04:22:47,215 INFO [train.py:1198] (0/2) Epoch 5, batch 450, loss[loss=0.31, ctc_loss=0.2241, cr_loss=0.4296, over 20928.00 frames. 
], tot_loss[loss=0.3169, ctc_loss=0.2329, cr_loss=0.4202, over 3684184.25 frames. ], batch size: 60, lr: 1.71e-02, grad_scale: 32.0 2024-09-14 04:22:57,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=73703.5, ans=0.125 2024-09-14 04:22:58,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=15.0 2024-09-14 04:23:06,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=73731.83333333333, ans=0.0 2024-09-14 04:23:11,099 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.264e+02 2.692e+02 3.227e+02 5.067e+02, threshold=5.384e+02, percent-clipped=0.0 2024-09-14 04:23:12,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=73731.83333333333, ans=0.125 2024-09-14 04:23:42,943 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 04:23:44,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=73788.5, ans=0.0 2024-09-14 04:24:02,018 INFO [train.py:1198] (0/2) Epoch 5, batch 500, loss[loss=0.3205, ctc_loss=0.2385, cr_loss=0.4103, over 20797.00 frames. ], tot_loss[loss=0.3163, ctc_loss=0.2324, cr_loss=0.4199, over 3781101.31 frames. 
], batch size: 53, lr: 1.71e-02, grad_scale: 32.0 2024-09-14 04:24:26,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=73873.5, ans=0.125 2024-09-14 04:24:38,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=73901.83333333333, ans=0.0 2024-09-14 04:25:13,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=73958.5, ans=0.125 2024-09-14 04:25:17,153 INFO [train.py:1198] (0/2) Epoch 5, batch 550, loss[loss=0.3531, ctc_loss=0.266, cr_loss=0.4357, over 18255.00 frames. ], tot_loss[loss=0.3177, ctc_loss=0.2336, cr_loss=0.4207, over 3844960.15 frames. ], batch size: 108, lr: 1.71e-02, grad_scale: 32.0 2024-09-14 04:25:22,034 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 04:25:41,081 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.265e+02 2.549e+02 3.068e+02 5.622e+02, threshold=5.099e+02, percent-clipped=1.0 2024-09-14 04:25:44,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=74015.16666666667, ans=0.125 2024-09-14 04:26:32,306 INFO [train.py:1198] (0/2) Epoch 5, batch 600, loss[loss=0.392, ctc_loss=0.2924, cr_loss=0.498, over 18275.00 frames. ], tot_loss[loss=0.3171, ctc_loss=0.2331, cr_loss=0.4204, over 3900189.98 frames. ], batch size: 108, lr: 1.70e-02, grad_scale: 32.0 2024-09-14 04:26:42,295 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0 2024-09-14 04:26:54,604 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.15 vs. 
limit=10.0 2024-09-14 04:27:05,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=74185.16666666667, ans=0.0 2024-09-14 04:27:23,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=74213.5, ans=0.2 2024-09-14 04:27:32,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=74213.5, ans=0.125 2024-09-14 04:27:54,118 INFO [train.py:1198] (0/2) Epoch 5, batch 650, loss[loss=0.2829, ctc_loss=0.2045, cr_loss=0.3917, over 20884.00 frames. ], tot_loss[loss=0.3165, ctc_loss=0.2327, cr_loss=0.4192, over 3939508.48 frames. ], batch size: 57, lr: 1.70e-02, grad_scale: 32.0 2024-09-14 04:28:18,508 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.273e+02 2.558e+02 3.138e+02 5.126e+02, threshold=5.115e+02, percent-clipped=1.0 2024-09-14 04:28:52,119 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 04:28:56,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=74383.5, ans=0.125 2024-09-14 04:29:07,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=74383.5, ans=0.125 2024-09-14 04:29:09,706 INFO [train.py:1198] (0/2) Epoch 5, batch 700, loss[loss=0.322, ctc_loss=0.2344, cr_loss=0.4379, over 20984.00 frames. ], tot_loss[loss=0.3172, ctc_loss=0.2332, cr_loss=0.42, over 3975853.65 frames. 
], batch size: 58, lr: 1.70e-02, grad_scale: 32.0 2024-09-14 04:29:13,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=74411.83333333333, ans=0.125 2024-09-14 04:30:08,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=74525.16666666667, ans=0.125 2024-09-14 04:30:25,104 INFO [train.py:1198] (0/2) Epoch 5, batch 750, loss[loss=0.3058, ctc_loss=0.2206, cr_loss=0.4264, over 20877.00 frames. ], tot_loss[loss=0.3161, ctc_loss=0.2322, cr_loss=0.4194, over 4015322.21 frames. ], batch size: 57, lr: 1.70e-02, grad_scale: 32.0 2024-09-14 04:30:49,517 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.270e+02 2.626e+02 3.050e+02 4.773e+02, threshold=5.253e+02, percent-clipped=0.0 2024-09-14 04:30:56,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=74610.16666666667, ans=10.0 2024-09-14 04:31:21,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=74638.5, ans=0.2 2024-09-14 04:31:41,074 INFO [train.py:1198] (0/2) Epoch 5, batch 800, loss[loss=0.3336, ctc_loss=0.2463, cr_loss=0.4367, over 20638.00 frames. ], tot_loss[loss=0.3156, ctc_loss=0.2319, cr_loss=0.4188, over 4028912.38 frames. ], batch size: 71, lr: 1.70e-02, grad_scale: 32.0 2024-09-14 04:32:23,891 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0 2024-09-14 04:32:59,684 INFO [train.py:1198] (0/2) Epoch 5, batch 850, loss[loss=0.3122, ctc_loss=0.2316, cr_loss=0.4029, over 21014.00 frames. ], tot_loss[loss=0.3148, ctc_loss=0.2312, cr_loss=0.4178, over 4046967.17 frames. 
], batch size: 63, lr: 1.70e-02, grad_scale: 32.0 2024-09-14 04:33:18,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=74865.16666666667, ans=0.1 2024-09-14 04:33:24,077 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.232e+02 2.575e+02 3.092e+02 5.064e+02, threshold=5.150e+02, percent-clipped=0.0 2024-09-14 04:33:52,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-09-14 04:34:11,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=74950.16666666667, ans=0.025 2024-09-14 04:34:18,947 INFO [train.py:1198] (0/2) Epoch 5, batch 900, loss[loss=0.3089, ctc_loss=0.2296, cr_loss=0.3965, over 20933.00 frames. ], tot_loss[loss=0.3142, ctc_loss=0.2307, cr_loss=0.4171, over 4063137.34 frames. ], batch size: 60, lr: 1.69e-02, grad_scale: 32.0 2024-09-14 04:34:56,088 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0 2024-09-14 04:35:01,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=75035.16666666667, ans=0.1 2024-09-14 04:35:02,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=22.5 2024-09-14 04:35:32,629 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2024-09-14 04:35:34,870 INFO [train.py:1198] (0/2) Epoch 5, batch 950, loss[loss=0.3267, ctc_loss=0.2401, cr_loss=0.4326, over 21052.00 frames. 
], tot_loss[loss=0.3144, ctc_loss=0.2308, cr_loss=0.4179, over 4075096.94 frames. ], batch size: 62, lr: 1.69e-02, grad_scale: 32.0 2024-09-14 04:35:59,004 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.259e+02 2.505e+02 2.950e+02 5.700e+02, threshold=5.011e+02, percent-clipped=1.0 2024-09-14 04:36:28,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=75205.16666666667, ans=0.0 2024-09-14 04:36:28,141 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 04:36:41,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=75233.5, ans=0.125 2024-09-14 04:36:50,484 INFO [train.py:1198] (0/2) Epoch 5, batch 1000, loss[loss=0.3245, ctc_loss=0.2414, cr_loss=0.4154, over 20675.00 frames. ], tot_loss[loss=0.314, ctc_loss=0.2305, cr_loss=0.4174, over 4089840.77 frames. ], batch size: 71, lr: 1.69e-02, grad_scale: 32.0 2024-09-14 04:36:50,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=75261.83333333333, ans=0.025 2024-09-14 04:37:14,994 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2024-09-14 04:37:25,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=75318.5, ans=0.125 2024-09-14 04:37:28,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=75318.5, ans=0.04949747468305833 2024-09-14 04:37:45,933 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.08 vs. 
limit=15.0 2024-09-14 04:37:55,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=75375.16666666667, ans=0.125 2024-09-14 04:38:06,095 INFO [train.py:1198] (0/2) Epoch 5, batch 1050, loss[loss=0.2853, ctc_loss=0.2071, cr_loss=0.3913, over 20963.00 frames. ], tot_loss[loss=0.312, ctc_loss=0.2288, cr_loss=0.4158, over 4108271.92 frames. ], batch size: 51, lr: 1.69e-02, grad_scale: 32.0 2024-09-14 04:38:24,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=75431.83333333333, ans=0.125 2024-09-14 04:38:33,307 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.273e+02 2.577e+02 3.329e+02 5.088e+02, threshold=5.154e+02, percent-clipped=1.0 2024-09-14 04:39:21,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=75516.83333333333, ans=0.125 2024-09-14 04:39:28,345 INFO [train.py:1198] (0/2) Epoch 5, batch 1100, loss[loss=0.3193, ctc_loss=0.2342, cr_loss=0.4251, over 21034.00 frames. ], tot_loss[loss=0.3111, ctc_loss=0.2281, cr_loss=0.4149, over 4115534.63 frames. ], batch size: 63, lr: 1.69e-02, grad_scale: 32.0 2024-09-14 04:39:30,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=75545.16666666667, ans=0.125 2024-09-14 04:39:44,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.88 vs. limit=10.0 2024-09-14 04:39:51,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=75573.5, ans=0.09899494936611666 2024-09-14 04:40:05,776 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. 
limit=6.0 2024-09-14 04:40:21,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=75630.16666666667, ans=0.1 2024-09-14 04:40:43,116 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.09 vs. limit=6.0 2024-09-14 04:40:43,849 INFO [train.py:1198] (0/2) Epoch 5, batch 1150, loss[loss=0.2537, ctc_loss=0.178, cr_loss=0.3785, over 20977.00 frames. ], tot_loss[loss=0.3113, ctc_loss=0.2281, cr_loss=0.416, over 4119372.98 frames. ], batch size: 49, lr: 1.69e-02, grad_scale: 16.0 2024-09-14 04:40:57,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=75715.16666666667, ans=0.0 2024-09-14 04:41:09,142 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.141e+02 2.363e+02 2.618e+02 3.853e+02, threshold=4.727e+02, percent-clipped=0.0 2024-09-14 04:41:11,384 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.28 vs. 
limit=15.0 2024-09-14 04:41:33,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=75771.83333333333, ans=0.1 2024-09-14 04:41:35,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=75771.83333333333, ans=0.0 2024-09-14 04:41:44,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=75800.16666666667, ans=0.125 2024-09-14 04:41:50,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=75800.16666666667, ans=0.035 2024-09-14 04:41:59,494 INFO [train.py:1198] (0/2) Epoch 5, batch 1200, loss[loss=0.339, ctc_loss=0.2534, cr_loss=0.4283, over 20992.00 frames. ], tot_loss[loss=0.3123, ctc_loss=0.2288, cr_loss=0.4172, over 4127502.77 frames. ], batch size: 61, lr: 1.69e-02, grad_scale: 32.0 2024-09-14 04:41:59,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=75828.5, ans=0.0 2024-09-14 04:42:21,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.59 vs. limit=15.0 2024-09-14 04:42:33,506 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.12 vs. limit=15.0 2024-09-14 04:43:10,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=75941.83333333333, ans=0.125 2024-09-14 04:43:15,887 INFO [train.py:1198] (0/2) Epoch 5, batch 1250, loss[loss=0.3191, ctc_loss=0.2379, cr_loss=0.4062, over 20958.00 frames. ], tot_loss[loss=0.3143, ctc_loss=0.2305, cr_loss=0.4189, over 4113717.46 frames. 
], batch size: 64, lr: 1.68e-02, grad_scale: 32.0 2024-09-14 04:43:25,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=75970.16666666667, ans=0.0 2024-09-14 04:43:39,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=75998.5, ans=0.0 2024-09-14 04:43:41,665 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.260e+02 2.462e+02 3.135e+02 5.192e+02, threshold=4.924e+02, percent-clipped=1.0 2024-09-14 04:43:48,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=76026.83333333333, ans=0.125 2024-09-14 04:43:53,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=76026.83333333333, ans=0.125 2024-09-14 04:44:34,575 INFO [train.py:1198] (0/2) Epoch 5, batch 1300, loss[loss=0.3152, ctc_loss=0.2324, cr_loss=0.4138, over 20939.00 frames. ], tot_loss[loss=0.314, ctc_loss=0.2302, cr_loss=0.4192, over 4119849.71 frames. 
], batch size: 60, lr: 1.68e-02, grad_scale: 32.0 2024-09-14 04:44:43,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=76111.83333333333, ans=0.2 2024-09-14 04:45:11,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=76168.5, ans=0.125 2024-09-14 04:45:20,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76168.5, ans=0.1 2024-09-14 04:45:48,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=76225.16666666667, ans=0.125 2024-09-14 04:45:53,314 INFO [train.py:1198] (0/2) Epoch 5, batch 1350, loss[loss=0.2853, ctc_loss=0.2081, cr_loss=0.3863, over 21031.00 frames. ], tot_loss[loss=0.3132, ctc_loss=0.2295, cr_loss=0.4188, over 4125189.14 frames. ], batch size: 52, lr: 1.68e-02, grad_scale: 32.0 2024-09-14 04:46:02,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=76253.5, ans=0.07 2024-09-14 04:46:05,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=76253.5, ans=0.1 2024-09-14 04:46:13,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=76281.83333333333, ans=0.2 2024-09-14 04:46:19,187 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.260e+02 2.463e+02 3.053e+02 4.334e+02, threshold=4.926e+02, percent-clipped=0.0 2024-09-14 04:46:59,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0 2024-09-14 04:47:09,347 INFO [train.py:1198] (0/2) Epoch 5, batch 1400, loss[loss=0.2966, ctc_loss=0.2181, cr_loss=0.3927, over 20924.00 frames. 
], tot_loss[loss=0.3129, ctc_loss=0.2292, cr_loss=0.4186, over 4132784.53 frames. ], batch size: 50, lr: 1.68e-02, grad_scale: 32.0 2024-09-14 04:47:11,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=76395.16666666667, ans=0.125 2024-09-14 04:48:24,980 INFO [train.py:1198] (0/2) Epoch 5, batch 1450, loss[loss=0.3335, ctc_loss=0.2464, cr_loss=0.4357, over 20657.00 frames. ], tot_loss[loss=0.3154, ctc_loss=0.2313, cr_loss=0.4205, over 4119171.89 frames. ], batch size: 68, lr: 1.68e-02, grad_scale: 32.0 2024-09-14 04:48:28,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=76536.83333333333, ans=0.0 2024-09-14 04:48:37,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. limit=6.0 2024-09-14 04:48:50,796 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.187e+02 2.432e+02 2.673e+02 4.140e+02, threshold=4.864e+02, percent-clipped=0.0 2024-09-14 04:49:03,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=76593.5, ans=0.2 2024-09-14 04:49:37,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=76650.16666666667, ans=0.0 2024-09-14 04:49:40,522 INFO [train.py:1198] (0/2) Epoch 5, batch 1500, loss[loss=0.3249, ctc_loss=0.2419, cr_loss=0.4155, over 20250.00 frames. ], tot_loss[loss=0.3139, ctc_loss=0.2302, cr_loss=0.4186, over 4101615.24 frames. 
], batch size: 74, lr: 1.68e-02, grad_scale: 32.0 2024-09-14 04:49:45,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76678.5, ans=0.1 2024-09-14 04:50:05,147 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0 2024-09-14 04:50:08,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0 2024-09-14 04:50:24,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=76735.16666666667, ans=0.0 2024-09-14 04:50:30,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=76763.5, ans=0.125 2024-09-14 04:50:54,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.39 vs. limit=22.5 2024-09-14 04:51:01,551 INFO [train.py:1198] (0/2) Epoch 5, batch 1550, loss[loss=0.3406, ctc_loss=0.2506, cr_loss=0.4498, over 20977.00 frames. ], tot_loss[loss=0.3137, ctc_loss=0.2301, cr_loss=0.4183, over 4109527.44 frames. ], batch size: 58, lr: 1.68e-02, grad_scale: 32.0 2024-09-14 04:51:26,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. 
limit=15.0 2024-09-14 04:51:27,440 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.199e+02 2.388e+02 2.846e+02 4.682e+02, threshold=4.776e+02, percent-clipped=0.0 2024-09-14 04:51:44,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=76876.83333333333, ans=0.95 2024-09-14 04:51:48,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=76905.16666666667, ans=0.035 2024-09-14 04:51:52,220 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=12.0 2024-09-14 04:52:17,482 INFO [train.py:1198] (0/2) Epoch 5, batch 1600, loss[loss=0.2968, ctc_loss=0.2171, cr_loss=0.3985, over 20893.00 frames. ], tot_loss[loss=0.3147, ctc_loss=0.2308, cr_loss=0.4196, over 4113179.70 frames. ], batch size: 54, lr: 1.67e-02, grad_scale: 32.0 2024-09-14 04:52:25,487 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 04:52:34,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=76990.16666666667, ans=0.125 2024-09-14 04:53:09,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=77046.83333333333, ans=0.1 2024-09-14 04:53:26,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=77075.16666666667, ans=0.0 2024-09-14 04:53:33,709 INFO [train.py:1198] (0/2) Epoch 5, batch 1650, loss[loss=0.3209, ctc_loss=0.2333, cr_loss=0.4376, over 20221.00 frames. ], tot_loss[loss=0.3144, ctc_loss=0.2306, cr_loss=0.4188, over 4097790.33 frames. 
], batch size: 74, lr: 1.67e-02, grad_scale: 32.0 2024-09-14 04:53:40,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.39 vs. limit=10.0 2024-09-14 04:53:59,053 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.205e+02 2.413e+02 2.920e+02 5.498e+02, threshold=4.827e+02, percent-clipped=2.0 2024-09-14 04:53:59,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.32 vs. limit=6.0 2024-09-14 04:54:38,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=77216.83333333333, ans=0.125 2024-09-14 04:54:44,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=77216.83333333333, ans=0.0 2024-09-14 04:54:48,690 INFO [train.py:1198] (0/2) Epoch 5, batch 1700, loss[loss=0.3243, ctc_loss=0.24, cr_loss=0.4213, over 20952.00 frames. ], tot_loss[loss=0.3144, ctc_loss=0.2305, cr_loss=0.4197, over 4110379.72 frames. ], batch size: 58, lr: 1.67e-02, grad_scale: 32.0 2024-09-14 04:54:53,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=77245.16666666667, ans=0.5 2024-09-14 04:55:10,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=77273.5, ans=0.1 2024-09-14 04:55:16,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=77273.5, ans=0.125 2024-09-14 04:55:19,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. 
limit=6.0 2024-09-14 04:55:38,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=77330.16666666667, ans=0.125 2024-09-14 04:55:52,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=77358.5, ans=0.125 2024-09-14 04:56:00,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=77358.5, ans=0.125 2024-09-14 04:56:06,783 INFO [train.py:1198] (0/2) Epoch 5, batch 1750, loss[loss=0.3258, ctc_loss=0.2392, cr_loss=0.4329, over 20934.00 frames. ], tot_loss[loss=0.3141, ctc_loss=0.2303, cr_loss=0.4187, over 4117098.64 frames. ], batch size: 60, lr: 1.67e-02, grad_scale: 32.0 2024-09-14 04:56:13,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=77386.83333333333, ans=0.2 2024-09-14 04:56:35,764 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.241e+02 2.627e+02 3.126e+02 7.012e+02, threshold=5.255e+02, percent-clipped=4.0 2024-09-14 04:56:42,550 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.79 vs. 
limit=22.5 2024-09-14 04:57:05,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=77471.83333333333, ans=0.05 2024-09-14 04:57:18,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=77500.16666666667, ans=0.125 2024-09-14 04:57:19,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=77500.16666666667, ans=0.0 2024-09-14 04:57:23,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=77528.5, ans=0.125 2024-09-14 04:57:24,959 INFO [train.py:1198] (0/2) Epoch 5, batch 1800, loss[loss=0.3403, ctc_loss=0.2526, cr_loss=0.4383, over 20871.00 frames. ], tot_loss[loss=0.3153, ctc_loss=0.2313, cr_loss=0.4199, over 4109532.40 frames. ], batch size: 57, lr: 1.67e-02, grad_scale: 32.0 2024-09-14 04:58:27,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=22.5 2024-09-14 04:58:32,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0 2024-09-14 04:58:40,620 INFO [train.py:1198] (0/2) Epoch 5, batch 1850, loss[loss=0.3072, ctc_loss=0.2282, cr_loss=0.3947, over 21031.00 frames. ], tot_loss[loss=0.3124, ctc_loss=0.2291, cr_loss=0.4168, over 4105109.78 frames. 
], batch size: 61, lr: 1.67e-02, grad_scale: 32.0 2024-09-14 04:59:06,058 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.206e+02 2.459e+02 2.688e+02 5.866e+02, threshold=4.917e+02, percent-clipped=1.0 2024-09-14 04:59:18,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=77726.83333333333, ans=0.0 2024-09-14 04:59:33,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-09-14 04:59:55,215 INFO [train.py:1198] (0/2) Epoch 5, batch 1900, loss[loss=0.2467, ctc_loss=0.1711, cr_loss=0.3776, over 19928.00 frames. ], tot_loss[loss=0.3132, ctc_loss=0.2297, cr_loss=0.4173, over 4091224.74 frames. ], batch size: 44, lr: 1.67e-02, grad_scale: 32.0 2024-09-14 05:00:24,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=77868.5, ans=0.125 2024-09-14 05:00:27,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=77868.5, ans=0.1 2024-09-14 05:00:45,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=77896.83333333333, ans=0.1 2024-09-14 05:01:09,920 INFO [train.py:1198] (0/2) Epoch 5, batch 1950, loss[loss=0.3055, ctc_loss=0.2263, cr_loss=0.3964, over 20885.00 frames. ], tot_loss[loss=0.3132, ctc_loss=0.2298, cr_loss=0.4169, over 4083312.61 frames. 
], batch size: 54, lr: 1.66e-02, grad_scale: 32.0 2024-09-14 05:01:11,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=77953.5, ans=0.025 2024-09-14 05:01:15,091 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2024-09-14 05:01:34,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=77981.83333333333, ans=0.1 2024-09-14 05:01:35,773 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.272e+02 2.658e+02 3.216e+02 5.160e+02, threshold=5.317e+02, percent-clipped=2.0 2024-09-14 05:01:43,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=78010.16666666667, ans=0.125 2024-09-14 05:01:55,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=78010.16666666667, ans=0.1 2024-09-14 05:02:31,860 INFO [train.py:1198] (0/2) Epoch 5, batch 2000, loss[loss=0.3294, ctc_loss=0.2446, cr_loss=0.424, over 21028.00 frames. ], tot_loss[loss=0.3127, ctc_loss=0.2294, cr_loss=0.4167, over 4087502.69 frames. 
], batch size: 63, lr: 1.66e-02, grad_scale: 32.0 2024-09-14 05:02:35,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=78095.16666666667, ans=0.0 2024-09-14 05:02:57,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=78123.5, ans=0.125 2024-09-14 05:03:01,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=78151.83333333333, ans=0.025 2024-09-14 05:03:07,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=78151.83333333333, ans=0.125 2024-09-14 05:03:26,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=78180.16666666667, ans=0.0 2024-09-14 05:03:41,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=78208.5, ans=0.2 2024-09-14 05:03:41,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=78208.5, ans=0.2 2024-09-14 05:03:47,021 INFO [train.py:1198] (0/2) Epoch 5, batch 2050, loss[loss=0.3297, ctc_loss=0.2391, cr_loss=0.4533, over 20294.00 frames. ], tot_loss[loss=0.312, ctc_loss=0.2285, cr_loss=0.4171, over 4093730.14 frames. 
], batch size: 74, lr: 1.66e-02, grad_scale: 32.0 2024-09-14 05:03:57,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=78236.83333333333, ans=0.125 2024-09-14 05:03:59,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=78236.83333333333, ans=0.0 2024-09-14 05:04:12,352 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.217e+02 2.439e+02 3.083e+02 6.067e+02, threshold=4.878e+02, percent-clipped=1.0 2024-09-14 05:04:12,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=78265.16666666667, ans=0.0 2024-09-14 05:04:24,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=78293.5, ans=0.0 2024-09-14 05:04:29,951 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.81 vs. limit=10.0 2024-09-14 05:04:35,695 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2024-09-14 05:04:41,769 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2024-09-14 05:04:42,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=78321.83333333333, ans=0.1 2024-09-14 05:05:02,150 INFO [train.py:1198] (0/2) Epoch 5, batch 2100, loss[loss=0.2975, ctc_loss=0.2169, cr_loss=0.4031, over 20963.00 frames. ], tot_loss[loss=0.3133, ctc_loss=0.2297, cr_loss=0.418, over 4092294.55 frames. 
], batch size: 52, lr: 1.66e-02, grad_scale: 32.0 2024-09-14 05:05:13,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=78378.5, ans=0.1 2024-09-14 05:05:19,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-09-14 05:06:03,721 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=22.5 2024-09-14 05:06:04,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=78491.83333333333, ans=0.1 2024-09-14 05:06:13,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=12.0 2024-09-14 05:06:17,906 INFO [train.py:1198] (0/2) Epoch 5, batch 2150, loss[loss=0.2963, ctc_loss=0.2135, cr_loss=0.4141, over 21066.00 frames. ], tot_loss[loss=0.3117, ctc_loss=0.2284, cr_loss=0.4167, over 4102806.87 frames. ], batch size: 53, lr: 1.66e-02, grad_scale: 32.0 2024-09-14 05:06:28,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=78520.16666666667, ans=0.1 2024-09-14 05:06:44,135 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.750e+02 2.182e+02 2.428e+02 2.938e+02 5.310e+02, threshold=4.856e+02, percent-clipped=1.0 2024-09-14 05:07:16,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=78605.16666666667, ans=0.1 2024-09-14 05:07:36,830 INFO [train.py:1198] (0/2) Epoch 5, batch 2200, loss[loss=0.2831, ctc_loss=0.199, cr_loss=0.4203, over 20899.00 frames. ], tot_loss[loss=0.3121, ctc_loss=0.2287, cr_loss=0.4174, over 4104894.34 frames. 
], batch size: 54, lr: 1.66e-02, grad_scale: 32.0 2024-09-14 05:07:38,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=78661.83333333333, ans=0.1 2024-09-14 05:07:49,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=78661.83333333333, ans=0.125 2024-09-14 05:08:09,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=78718.5, ans=0.0 2024-09-14 05:08:23,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=78746.83333333333, ans=0.125 2024-09-14 05:08:39,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=78775.16666666667, ans=0.1 2024-09-14 05:08:49,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=78775.16666666667, ans=0.0 2024-09-14 05:08:55,432 INFO [train.py:1198] (0/2) Epoch 5, batch 2250, loss[loss=0.3263, ctc_loss=0.2325, cr_loss=0.4687, over 21078.00 frames. ], tot_loss[loss=0.3137, ctc_loss=0.2299, cr_loss=0.4191, over 4094440.28 frames. ], batch size: 59, lr: 1.66e-02, grad_scale: 32.0 2024-09-14 05:09:06,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=78803.5, ans=0.0 2024-09-14 05:09:20,906 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.383e+02 2.840e+02 3.304e+02 5.979e+02, threshold=5.680e+02, percent-clipped=3.0 2024-09-14 05:09:33,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.45 vs. 
limit=15.0 2024-09-14 05:09:55,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=78916.83333333333, ans=0.95 2024-09-14 05:09:55,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=78916.83333333333, ans=0.2 2024-09-14 05:10:09,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=78945.16666666667, ans=0.0 2024-09-14 05:10:10,763 INFO [train.py:1198] (0/2) Epoch 5, batch 2300, loss[loss=0.2759, ctc_loss=0.196, cr_loss=0.3994, over 20960.00 frames. ], tot_loss[loss=0.3134, ctc_loss=0.2296, cr_loss=0.419, over 4100604.15 frames. ], batch size: 50, lr: 1.65e-02, grad_scale: 32.0 2024-09-14 05:10:26,301 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 05:10:58,247 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.48 vs. limit=10.0 2024-09-14 05:11:05,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=22.5 2024-09-14 05:11:14,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=79058.5, ans=0.1 2024-09-14 05:11:26,261 INFO [train.py:1198] (0/2) Epoch 5, batch 2350, loss[loss=0.3287, ctc_loss=0.2362, cr_loss=0.4626, over 20868.00 frames. ], tot_loss[loss=0.3136, ctc_loss=0.2298, cr_loss=0.419, over 4093556.31 frames. 
], batch size: 65, lr: 1.65e-02, grad_scale: 16.0 2024-09-14 05:11:53,407 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.309e+02 2.622e+02 3.073e+02 5.142e+02, threshold=5.243e+02, percent-clipped=0.0 2024-09-14 05:12:41,567 INFO [train.py:1198] (0/2) Epoch 5, batch 2400, loss[loss=0.3215, ctc_loss=0.2328, cr_loss=0.4436, over 20878.00 frames. ], tot_loss[loss=0.3135, ctc_loss=0.2298, cr_loss=0.4184, over 4078402.55 frames. ], batch size: 57, lr: 1.65e-02, grad_scale: 32.0 2024-09-14 05:12:48,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=79228.5, ans=0.2 2024-09-14 05:12:53,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=79228.5, ans=0.125 2024-09-14 05:13:03,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=22.5 2024-09-14 05:13:05,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=79256.83333333333, ans=0.125 2024-09-14 05:13:33,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=79313.5, ans=0.1 2024-09-14 05:13:35,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=79313.5, ans=0.125 2024-09-14 05:13:42,687 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-28000.pt 2024-09-14 05:13:45,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=15.0 2024-09-14 05:14:04,352 INFO [train.py:1198] (0/2) Epoch 5, batch 2450, loss[loss=0.2948, ctc_loss=0.2145, cr_loss=0.4011, over 20951.00 frames. 
], tot_loss[loss=0.3135, ctc_loss=0.2297, cr_loss=0.4189, over 4085313.76 frames. ], batch size: 48, lr: 1.65e-02, grad_scale: 32.0 2024-09-14 05:14:31,544 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.141e+02 2.417e+02 2.858e+02 5.130e+02, threshold=4.835e+02, percent-clipped=0.0 2024-09-14 05:14:38,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2024-09-14 05:14:43,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=79426.83333333333, ans=0.125 2024-09-14 05:14:45,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=79426.83333333333, ans=0.125 2024-09-14 05:14:50,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=79455.16666666667, ans=0.125 2024-09-14 05:14:51,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=79455.16666666667, ans=0.07 2024-09-14 05:14:57,306 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=12.0 2024-09-14 05:15:19,165 INFO [train.py:1198] (0/2) Epoch 5, batch 2500, loss[loss=0.3344, ctc_loss=0.251, cr_loss=0.4169, over 20840.00 frames. ], tot_loss[loss=0.3136, ctc_loss=0.2299, cr_loss=0.4188, over 4082351.26 frames. 
], batch size: 65, lr: 1.65e-02, grad_scale: 32.0 2024-09-14 05:15:37,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=79540.16666666667, ans=0.125 2024-09-14 05:16:19,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=79625.16666666667, ans=0.1 2024-09-14 05:16:22,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=79625.16666666667, ans=0.125 2024-09-14 05:16:33,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=79653.5, ans=0.1 2024-09-14 05:16:34,230 INFO [train.py:1198] (0/2) Epoch 5, batch 2550, loss[loss=0.3096, ctc_loss=0.2235, cr_loss=0.4303, over 21044.00 frames. ], tot_loss[loss=0.3124, ctc_loss=0.2288, cr_loss=0.4178, over 4082379.29 frames. ], batch size: 56, lr: 1.65e-02, grad_scale: 32.0 2024-09-14 05:17:00,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=79681.83333333333, ans=0.0 2024-09-14 05:17:01,157 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.242e+02 2.471e+02 2.801e+02 5.372e+02, threshold=4.942e+02, percent-clipped=0.0 2024-09-14 05:17:09,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=79710.16666666667, ans=0.125 2024-09-14 05:17:19,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=79738.5, ans=0.125 2024-09-14 05:17:49,553 INFO [train.py:1198] (0/2) Epoch 5, batch 2600, loss[loss=0.3388, ctc_loss=0.2529, cr_loss=0.4297, over 19397.00 frames. ], tot_loss[loss=0.3125, ctc_loss=0.2289, cr_loss=0.4181, over 4094508.52 frames. 
], batch size: 90, lr: 1.65e-02, grad_scale: 32.0 2024-09-14 05:18:17,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=79823.5, ans=0.125 2024-09-14 05:18:21,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2024-09-14 05:18:22,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=79851.83333333333, ans=0.125 2024-09-14 05:18:32,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=79851.83333333333, ans=0.1 2024-09-14 05:18:49,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=79908.5, ans=0.125 2024-09-14 05:19:08,385 INFO [train.py:1198] (0/2) Epoch 5, batch 2650, loss[loss=0.317, ctc_loss=0.2342, cr_loss=0.4145, over 21025.00 frames. ], tot_loss[loss=0.3134, ctc_loss=0.2297, cr_loss=0.4188, over 4088656.52 frames. ], batch size: 62, lr: 1.65e-02, grad_scale: 32.0 2024-09-14 05:19:10,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=79936.83333333333, ans=0.125 2024-09-14 05:19:11,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=79936.83333333333, ans=0.07 2024-09-14 05:19:28,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.42 vs. limit=15.0 2024-09-14 05:19:33,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.96 vs. 
limit=15.0 2024-09-14 05:19:38,608 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.167e+02 2.414e+02 2.922e+02 4.679e+02, threshold=4.827e+02, percent-clipped=1.0 2024-09-14 05:19:49,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=79993.5, ans=0.125 2024-09-14 05:20:27,045 INFO [train.py:1198] (0/2) Epoch 5, batch 2700, loss[loss=0.3108, ctc_loss=0.2213, cr_loss=0.4475, over 20878.00 frames. ], tot_loss[loss=0.3119, ctc_loss=0.2285, cr_loss=0.4171, over 4094045.66 frames. ], batch size: 57, lr: 1.64e-02, grad_scale: 32.0 2024-09-14 05:20:38,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=80078.5, ans=0.125 2024-09-14 05:20:44,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=80106.83333333333, ans=0.1 2024-09-14 05:20:53,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=80106.83333333333, ans=0.2 2024-09-14 05:20:57,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=80135.16666666667, ans=0.125 2024-09-14 05:20:58,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=80135.16666666667, ans=0.1 2024-09-14 05:21:05,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.80 vs. 
limit=15.0 2024-09-14 05:21:09,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=80135.16666666667, ans=0.125 2024-09-14 05:21:32,497 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 05:21:42,656 INFO [train.py:1198] (0/2) Epoch 5, batch 2750, loss[loss=0.3296, ctc_loss=0.2414, cr_loss=0.441, over 20783.00 frames. ], tot_loss[loss=0.3111, ctc_loss=0.2279, cr_loss=0.4158, over 4094718.02 frames. ], batch size: 56, lr: 1.64e-02, grad_scale: 32.0 2024-09-14 05:21:56,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=80248.5, ans=0.125 2024-09-14 05:22:09,744 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.195e+02 2.460e+02 2.775e+02 5.354e+02, threshold=4.921e+02, percent-clipped=1.0 2024-09-14 05:22:11,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=80276.83333333333, ans=0.0 2024-09-14 05:22:13,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=80276.83333333333, ans=0.125 2024-09-14 05:22:18,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=80276.83333333333, ans=0.125 2024-09-14 05:22:21,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=80276.83333333333, ans=0.125 2024-09-14 05:22:30,716 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 05:22:39,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=80305.16666666667, 
ans=0.125 2024-09-14 05:22:44,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.46 vs. limit=15.0 2024-09-14 05:22:48,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=80333.5, ans=0.0 2024-09-14 05:22:57,545 INFO [train.py:1198] (0/2) Epoch 5, batch 2800, loss[loss=0.3324, ctc_loss=0.2427, cr_loss=0.4484, over 19960.00 frames. ], tot_loss[loss=0.3117, ctc_loss=0.2284, cr_loss=0.4161, over 4084755.44 frames. ], batch size: 80, lr: 1.64e-02, grad_scale: 32.0 2024-09-14 05:23:05,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=80361.83333333333, ans=0.0 2024-09-14 05:23:10,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=80361.83333333333, ans=0.2 2024-09-14 05:23:19,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=80390.16666666667, ans=0.1 2024-09-14 05:24:13,287 INFO [train.py:1198] (0/2) Epoch 5, batch 2850, loss[loss=0.3211, ctc_loss=0.2323, cr_loss=0.4436, over 20833.00 frames. ], tot_loss[loss=0.3119, ctc_loss=0.2288, cr_loss=0.4155, over 4075761.97 frames. 
], batch size: 59, lr: 1.64e-02, grad_scale: 32.0 2024-09-14 05:24:25,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=80503.5, ans=0.125 2024-09-14 05:24:43,174 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.215e+02 2.474e+02 2.810e+02 4.187e+02, threshold=4.948e+02, percent-clipped=0.0 2024-09-14 05:24:46,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=80560.16666666667, ans=0.125 2024-09-14 05:25:34,327 INFO [train.py:1198] (0/2) Epoch 5, batch 2900, loss[loss=0.3253, ctc_loss=0.2402, cr_loss=0.4254, over 19352.00 frames. ], tot_loss[loss=0.3122, ctc_loss=0.229, cr_loss=0.4163, over 4075163.24 frames. ], batch size: 90, lr: 1.64e-02, grad_scale: 32.0 2024-09-14 05:25:34,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=80645.16666666667, ans=0.0 2024-09-14 05:25:59,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=80673.5, ans=0.125 2024-09-14 05:25:59,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. 
limit=15.0 2024-09-14 05:26:12,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=80701.83333333333, ans=0.2 2024-09-14 05:26:17,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=80701.83333333333, ans=0.125 2024-09-14 05:26:28,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=80730.16666666667, ans=0.125 2024-09-14 05:26:39,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.38 vs. limit=15.0 2024-09-14 05:26:50,359 INFO [train.py:1198] (0/2) Epoch 5, batch 2950, loss[loss=0.3217, ctc_loss=0.237, cr_loss=0.4237, over 20849.00 frames. ], tot_loss[loss=0.3128, ctc_loss=0.2295, cr_loss=0.4164, over 4067374.49 frames. ], batch size: 65, lr: 1.64e-02, grad_scale: 32.0 2024-09-14 05:26:50,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=80786.83333333333, ans=0.025 2024-09-14 05:27:17,109 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.189e+02 2.408e+02 2.676e+02 5.297e+02, threshold=4.816e+02, percent-clipped=2.0 2024-09-14 05:27:32,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=80843.5, ans=0.2 2024-09-14 05:27:54,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=80900.16666666667, ans=0.0 2024-09-14 05:28:04,768 INFO [train.py:1198] (0/2) Epoch 5, batch 3000, loss[loss=0.2721, ctc_loss=0.1938, cr_loss=0.3916, over 20986.00 frames. ], tot_loss[loss=0.3117, ctc_loss=0.2286, cr_loss=0.4157, over 4076671.99 frames. 
], batch size: 55, lr: 1.64e-02, grad_scale: 32.0 2024-09-14 05:28:04,769 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 05:28:25,119 INFO [train.py:1230] (0/2) Epoch 5, validation: loss=0.07203, ctc_loss=0.07203, cr_loss=9.208e-15, over 944034.00 frames. 2024-09-14 05:28:25,119 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 05:28:51,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=80956.83333333333, ans=0.125 2024-09-14 05:29:09,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.44 vs. limit=10.0 2024-09-14 05:29:22,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=81013.5, ans=10.0 2024-09-14 05:29:25,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81041.83333333333, ans=0.1 2024-09-14 05:29:34,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=81041.83333333333, ans=0.125 2024-09-14 05:29:40,509 INFO [train.py:1198] (0/2) Epoch 5, batch 3050, loss[loss=0.3052, ctc_loss=0.2252, cr_loss=0.4002, over 21058.00 frames. ], tot_loss[loss=0.3118, ctc_loss=0.2285, cr_loss=0.4161, over 4078103.12 frames. 
], batch size: 56, lr: 1.63e-02, grad_scale: 32.0 2024-09-14 05:30:10,518 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.173e+02 2.418e+02 2.867e+02 4.391e+02, threshold=4.837e+02, percent-clipped=0.0 2024-09-14 05:30:16,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=81126.83333333333, ans=0.1 2024-09-14 05:30:18,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=81126.83333333333, ans=0.05 2024-09-14 05:30:37,455 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=12.0 2024-09-14 05:30:46,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=81183.5, ans=0.1 2024-09-14 05:31:02,195 INFO [train.py:1198] (0/2) Epoch 5, batch 3100, loss[loss=0.3201, ctc_loss=0.2342, cr_loss=0.4293, over 20947.00 frames. ], tot_loss[loss=0.3099, ctc_loss=0.227, cr_loss=0.4143, over 4091428.61 frames. ], batch size: 58, lr: 1.63e-02, grad_scale: 32.0 2024-09-14 05:31:55,396 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2024-09-14 05:32:00,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=81325.16666666667, ans=0.125 2024-09-14 05:32:16,938 INFO [train.py:1198] (0/2) Epoch 5, batch 3150, loss[loss=0.3249, ctc_loss=0.2384, cr_loss=0.4323, over 20026.00 frames. ], tot_loss[loss=0.3104, ctc_loss=0.2273, cr_loss=0.4153, over 4098079.21 frames. 
], batch size: 80, lr: 1.63e-02, grad_scale: 32.0 2024-09-14 05:32:35,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0 2024-09-14 05:32:44,308 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.174e+02 2.414e+02 2.681e+02 3.799e+02, threshold=4.829e+02, percent-clipped=0.0 2024-09-14 05:33:08,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=81438.5, ans=0.125 2024-09-14 05:33:11,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=81438.5, ans=0.125 2024-09-14 05:33:23,158 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=12.0 2024-09-14 05:33:28,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=81466.83333333333, ans=0.5 2024-09-14 05:33:32,887 INFO [train.py:1198] (0/2) Epoch 5, batch 3200, loss[loss=0.2723, ctc_loss=0.1971, cr_loss=0.3762, over 21062.00 frames. ], tot_loss[loss=0.3092, ctc_loss=0.2263, cr_loss=0.4146, over 4107700.97 frames. 
], batch size: 53, lr: 1.63e-02, grad_scale: 32.0 2024-09-14 05:34:09,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=81551.83333333333, ans=0.125 2024-09-14 05:34:10,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=81551.83333333333, ans=0.0 2024-09-14 05:34:28,006 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 05:34:48,589 INFO [train.py:1198] (0/2) Epoch 5, batch 3250, loss[loss=0.3027, ctc_loss=0.221, cr_loss=0.4086, over 21012.00 frames. ], tot_loss[loss=0.3101, ctc_loss=0.227, cr_loss=0.4153, over 4090045.93 frames. ], batch size: 52, lr: 1.63e-02, grad_scale: 16.0 2024-09-14 05:35:12,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=81665.16666666667, ans=0.125 2024-09-14 05:35:17,074 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.197e+02 2.501e+02 2.945e+02 4.319e+02, threshold=5.002e+02, percent-clipped=0.0 2024-09-14 05:35:20,673 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=12.0 2024-09-14 05:35:51,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=81750.16666666667, ans=0.125 2024-09-14 05:36:02,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=81750.16666666667, ans=0.0 2024-09-14 05:36:09,446 INFO [train.py:1198] (0/2) Epoch 5, batch 3300, loss[loss=0.3269, ctc_loss=0.2369, cr_loss=0.4497, over 20926.00 frames. ], tot_loss[loss=0.3141, ctc_loss=0.2302, cr_loss=0.4192, over 4080556.93 frames. 
], batch size: 60, lr: 1.63e-02, grad_scale: 16.0 2024-09-14 05:36:14,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2024-09-14 05:36:29,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=81806.83333333333, ans=0.0 2024-09-14 05:37:05,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=81863.5, ans=0.125 2024-09-14 05:37:07,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=81863.5, ans=0.125 2024-09-14 05:37:10,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2024-09-14 05:37:25,089 INFO [train.py:1198] (0/2) Epoch 5, batch 3350, loss[loss=0.2531, ctc_loss=0.1826, cr_loss=0.3526, over 19841.00 frames. ], tot_loss[loss=0.3137, ctc_loss=0.2298, cr_loss=0.4195, over 4098822.75 frames. ], batch size: 44, lr: 1.63e-02, grad_scale: 16.0 2024-09-14 05:37:27,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=15.0 2024-09-14 05:37:38,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81948.5, ans=0.1 2024-09-14 05:37:51,458 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.44 vs. 
limit=15.0 2024-09-14 05:37:53,741 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.230e+02 2.442e+02 3.002e+02 5.612e+02, threshold=4.884e+02, percent-clipped=2.0 2024-09-14 05:38:17,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=82005.16666666667, ans=0.0 2024-09-14 05:38:31,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=82033.5, ans=0.125 2024-09-14 05:38:36,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=82033.5, ans=0.025 2024-09-14 05:38:40,822 INFO [train.py:1198] (0/2) Epoch 5, batch 3400, loss[loss=0.3517, ctc_loss=0.2614, cr_loss=0.4515, over 20083.00 frames. ], tot_loss[loss=0.314, ctc_loss=0.2302, cr_loss=0.4189, over 4085124.55 frames. ], batch size: 80, lr: 1.63e-02, grad_scale: 16.0 2024-09-14 05:38:44,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=82061.83333333333, ans=0.125 2024-09-14 05:39:56,157 INFO [train.py:1198] (0/2) Epoch 5, batch 3450, loss[loss=0.3104, ctc_loss=0.2216, cr_loss=0.4442, over 21050.00 frames. ], tot_loss[loss=0.3133, ctc_loss=0.2295, cr_loss=0.4187, over 4097285.00 frames. ], batch size: 62, lr: 1.62e-02, grad_scale: 16.0 2024-09-14 05:40:02,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.67 vs. limit=22.5 2024-09-14 05:40:02,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=82203.5, ans=0.0 2024-09-14 05:40:15,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.73 vs. 
limit=22.5 2024-09-14 05:40:25,176 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.180e+02 2.416e+02 2.788e+02 3.622e+02, threshold=4.831e+02, percent-clipped=0.0 2024-09-14 05:40:27,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=82260.16666666667, ans=0.125 2024-09-14 05:41:10,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=82345.16666666667, ans=0.05 2024-09-14 05:41:11,325 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.62 vs. limit=15.0 2024-09-14 05:41:12,093 INFO [train.py:1198] (0/2) Epoch 5, batch 3500, loss[loss=0.3049, ctc_loss=0.2198, cr_loss=0.4257, over 21055.00 frames. ], tot_loss[loss=0.3121, ctc_loss=0.2285, cr_loss=0.4183, over 4093166.62 frames. ], batch size: 53, lr: 1.62e-02, grad_scale: 16.0 2024-09-14 05:41:43,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=82401.83333333333, ans=0.125 2024-09-14 05:42:31,936 INFO [train.py:1198] (0/2) Epoch 5, batch 3550, loss[loss=0.2973, ctc_loss=0.2189, cr_loss=0.392, over 21011.00 frames. ], tot_loss[loss=0.3105, ctc_loss=0.2271, cr_loss=0.417, over 4109000.10 frames. ], batch size: 63, lr: 1.62e-02, grad_scale: 16.0 2024-09-14 05:42:42,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=82486.83333333333, ans=0.125 2024-09-14 05:42:53,364 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.18 vs. 
limit=15.0 2024-09-14 05:43:00,043 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.177e+02 2.468e+02 2.916e+02 4.942e+02, threshold=4.936e+02, percent-clipped=1.0 2024-09-14 05:43:27,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=82571.83333333333, ans=0.0 2024-09-14 05:43:46,758 INFO [train.py:1198] (0/2) Epoch 5, batch 3600, loss[loss=0.2944, ctc_loss=0.2182, cr_loss=0.3808, over 20880.00 frames. ], tot_loss[loss=0.3105, ctc_loss=0.2272, cr_loss=0.4162, over 4106790.22 frames. ], batch size: 57, lr: 1.62e-02, grad_scale: 32.0 2024-09-14 05:43:54,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82628.5, ans=0.1 2024-09-14 05:44:00,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=82656.83333333333, ans=0.125 2024-09-14 05:44:09,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=82656.83333333333, ans=0.125 2024-09-14 05:44:38,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=82713.5, ans=0.04949747468305833 2024-09-14 05:44:40,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=82713.5, ans=0.2 2024-09-14 05:44:46,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=82741.83333333333, ans=0.125 2024-09-14 05:45:01,575 INFO [train.py:1198] (0/2) Epoch 5, batch 3650, loss[loss=0.2787, ctc_loss=0.1994, cr_loss=0.3964, over 19963.00 frames. ], tot_loss[loss=0.3105, ctc_loss=0.2271, cr_loss=0.4169, over 4111048.89 frames. 
], batch size: 44, lr: 1.62e-02, grad_scale: 32.0 2024-09-14 05:45:19,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.69 vs. limit=15.0 2024-09-14 05:45:29,914 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.234e+02 2.506e+02 3.067e+02 8.156e+02, threshold=5.011e+02, percent-clipped=2.0 2024-09-14 05:45:33,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=82826.83333333333, ans=0.125 2024-09-14 05:46:16,931 INFO [train.py:1198] (0/2) Epoch 5, batch 3700, loss[loss=0.2767, ctc_loss=0.2024, cr_loss=0.3713, over 20991.00 frames. ], tot_loss[loss=0.3098, ctc_loss=0.2267, cr_loss=0.4154, over 4104685.49 frames. ], batch size: 52, lr: 1.62e-02, grad_scale: 32.0 2024-09-14 05:46:30,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=82940.16666666667, ans=0.125 2024-09-14 05:47:30,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=83025.16666666667, ans=0.0 2024-09-14 05:47:35,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=83025.16666666667, ans=0.125 2024-09-14 05:47:38,172 INFO [train.py:1198] (0/2) Epoch 5, batch 3750, loss[loss=0.3075, ctc_loss=0.2258, cr_loss=0.4085, over 20994.00 frames. ], tot_loss[loss=0.3108, ctc_loss=0.2275, cr_loss=0.4167, over 4107463.18 frames. 
], batch size: 61, lr: 1.62e-02, grad_scale: 32.0 2024-09-14 05:47:39,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=83053.5, ans=0.2 2024-09-14 05:47:41,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=83053.5, ans=0.125 2024-09-14 05:47:44,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=83053.5, ans=0.125 2024-09-14 05:47:49,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=22.5 2024-09-14 05:48:06,910 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.195e+02 2.435e+02 2.822e+02 4.241e+02, threshold=4.870e+02, percent-clipped=0.0 2024-09-14 05:48:10,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=83110.16666666667, ans=0.125 2024-09-14 05:48:33,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=83138.5, ans=0.125 2024-09-14 05:48:39,569 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.47 vs. limit=10.0 2024-09-14 05:48:54,030 INFO [train.py:1198] (0/2) Epoch 5, batch 3800, loss[loss=0.3139, ctc_loss=0.2253, cr_loss=0.4428, over 21030.00 frames. ], tot_loss[loss=0.3108, ctc_loss=0.2274, cr_loss=0.4169, over 4101947.69 frames. ], batch size: 62, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:50:09,171 INFO [train.py:1198] (0/2) Epoch 5, batch 3850, loss[loss=0.3167, ctc_loss=0.2322, cr_loss=0.4225, over 20881.00 frames. ], tot_loss[loss=0.3109, ctc_loss=0.2274, cr_loss=0.4175, over 4103164.22 frames. 
], batch size: 57, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:50:17,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.48 vs. limit=22.5 2024-09-14 05:50:24,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=83365.16666666667, ans=0.0 2024-09-14 05:50:37,724 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.200e+02 2.463e+02 2.934e+02 5.500e+02, threshold=4.926e+02, percent-clipped=1.0 2024-09-14 05:50:38,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=83393.5, ans=0.125 2024-09-14 05:50:39,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=83393.5, ans=0.125 2024-09-14 05:50:50,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=83393.5, ans=0.09899494936611666 2024-09-14 05:51:02,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=83421.83333333333, ans=0.125 2024-09-14 05:51:23,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=83478.5, ans=0.2 2024-09-14 05:51:24,404 INFO [train.py:1198] (0/2) Epoch 5, batch 3900, loss[loss=0.2816, ctc_loss=0.2057, cr_loss=0.3798, over 20972.00 frames. ], tot_loss[loss=0.3105, ctc_loss=0.227, cr_loss=0.4177, over 4107375.51 frames. 
], batch size: 48, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:52:38,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=83620.16666666667, ans=0.125 2024-09-14 05:52:39,683 INFO [train.py:1198] (0/2) Epoch 5, batch 3950, loss[loss=0.3028, ctc_loss=0.2224, cr_loss=0.4021, over 20988.00 frames. ], tot_loss[loss=0.3103, ctc_loss=0.2268, cr_loss=0.4173, over 4104143.46 frames. ], batch size: 52, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:52:58,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=83648.5, ans=0.125 2024-09-14 05:53:14,451 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.188e+02 2.319e+02 2.628e+02 3.864e+02, threshold=4.637e+02, percent-clipped=0.0 2024-09-14 05:53:20,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=83676.83333333333, ans=0.2 2024-09-14 05:53:31,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=12.0 2024-09-14 05:53:40,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=83705.16666666667, ans=0.025 2024-09-14 05:53:58,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=83733.5, ans=0.2 2024-09-14 05:54:01,328 INFO [train.py:1198] (0/2) Epoch 5, batch 4000, loss[loss=0.3067, ctc_loss=0.2208, cr_loss=0.4297, over 20981.00 frames. ], tot_loss[loss=0.31, ctc_loss=0.2265, cr_loss=0.4174, over 4118596.35 frames. 
], batch size: 55, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:54:12,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0 2024-09-14 05:54:48,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=83846.83333333333, ans=0.05 2024-09-14 05:54:51,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=83846.83333333333, ans=0.125 2024-09-14 05:55:09,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=83875.16666666667, ans=0.125 2024-09-14 05:55:12,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=83875.16666666667, ans=0.125 2024-09-14 05:55:16,342 INFO [train.py:1198] (0/2) Epoch 5, batch 4050, loss[loss=0.3128, ctc_loss=0.2302, cr_loss=0.4128, over 20970.00 frames. ], tot_loss[loss=0.3109, ctc_loss=0.2273, cr_loss=0.4183, over 4104400.87 frames. 
], batch size: 58, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:55:27,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=83903.5, ans=0.125 2024-09-14 05:55:42,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=83931.83333333333, ans=0.125 2024-09-14 05:55:44,935 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.169e+02 2.337e+02 2.655e+02 4.904e+02, threshold=4.675e+02, percent-clipped=1.0 2024-09-14 05:56:22,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=84016.83333333333, ans=0.025 2024-09-14 05:56:28,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=84016.83333333333, ans=0.125 2024-09-14 05:56:31,694 INFO [train.py:1198] (0/2) Epoch 5, batch 4100, loss[loss=0.302, ctc_loss=0.2175, cr_loss=0.4225, over 20969.00 frames. ], tot_loss[loss=0.3106, ctc_loss=0.2269, cr_loss=0.4183, over 4110188.72 frames. 
], batch size: 50, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:56:33,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=84045.16666666667, ans=0.125 2024-09-14 05:57:03,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=84101.83333333333, ans=0.125 2024-09-14 05:57:24,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84130.16666666667, ans=0.1 2024-09-14 05:57:31,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=84158.5, ans=0.1 2024-09-14 05:57:46,727 INFO [train.py:1198] (0/2) Epoch 5, batch 4150, loss[loss=0.3612, ctc_loss=0.2709, cr_loss=0.4515, over 18146.00 frames. ], tot_loss[loss=0.3103, ctc_loss=0.227, cr_loss=0.4165, over 4106157.79 frames. ], batch size: 108, lr: 1.61e-02, grad_scale: 32.0 2024-09-14 05:57:48,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=84186.83333333333, ans=0.09899494936611666 2024-09-14 05:58:15,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.189e+02 2.356e+02 2.642e+02 4.278e+02, threshold=4.713e+02, percent-clipped=0.0 2024-09-14 05:59:07,764 INFO [train.py:1198] (0/2) Epoch 5, batch 4200, loss[loss=0.349, ctc_loss=0.2616, cr_loss=0.4367, over 20309.00 frames. ], tot_loss[loss=0.3103, ctc_loss=0.227, cr_loss=0.4165, over 4097963.87 frames. ], batch size: 74, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:00:23,582 INFO [train.py:1198] (0/2) Epoch 5, batch 4250, loss[loss=0.334, ctc_loss=0.2408, cr_loss=0.4662, over 20706.00 frames. ], tot_loss[loss=0.3098, ctc_loss=0.2266, cr_loss=0.4162, over 4107891.89 frames. 
], batch size: 71, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:00:52,096 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.222e+02 2.393e+02 2.847e+02 4.628e+02, threshold=4.785e+02, percent-clipped=0.0 2024-09-14 06:00:58,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=84526.83333333333, ans=0.125 2024-09-14 06:01:04,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84526.83333333333, ans=0.1 2024-09-14 06:01:07,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=22.5 2024-09-14 06:01:13,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=84555.16666666667, ans=0.125 2024-09-14 06:01:15,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2024-09-14 06:01:21,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=84555.16666666667, ans=0.0 2024-09-14 06:01:37,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=84611.83333333333, ans=0.125 2024-09-14 06:01:38,932 INFO [train.py:1198] (0/2) Epoch 5, batch 4300, loss[loss=0.3275, ctc_loss=0.2447, cr_loss=0.4143, over 20397.00 frames. ], tot_loss[loss=0.3092, ctc_loss=0.226, cr_loss=0.4161, over 4110976.94 frames. 
], batch size: 74, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:02:02,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=84640.16666666667, ans=0.0 2024-09-14 06:02:08,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=84668.5, ans=0.1 2024-09-14 06:02:26,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=84696.83333333333, ans=0.125 2024-09-14 06:02:54,673 INFO [train.py:1198] (0/2) Epoch 5, batch 4350, loss[loss=0.3479, ctc_loss=0.2516, cr_loss=0.4815, over 20637.00 frames. ], tot_loss[loss=0.3097, ctc_loss=0.2265, cr_loss=0.4163, over 4099643.72 frames. ], batch size: 66, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:03:12,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=84781.83333333333, ans=0.025 2024-09-14 06:03:24,551 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.307e+02 2.722e+02 3.123e+02 5.218e+02, threshold=5.444e+02, percent-clipped=3.0 2024-09-14 06:03:39,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=84838.5, ans=0.0 2024-09-14 06:03:53,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=84866.83333333333, ans=0.2 2024-09-14 06:04:09,517 INFO [train.py:1198] (0/2) Epoch 5, batch 4400, loss[loss=0.314, ctc_loss=0.2319, cr_loss=0.4105, over 20655.00 frames. ], tot_loss[loss=0.3079, ctc_loss=0.2249, cr_loss=0.4151, over 4104601.44 frames. 
], batch size: 66, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:05:20,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=85008.5, ans=0.125 2024-09-14 06:05:22,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-09-14 06:05:30,856 INFO [train.py:1198] (0/2) Epoch 5, batch 4450, loss[loss=0.3322, ctc_loss=0.245, cr_loss=0.4357, over 20360.00 frames. ], tot_loss[loss=0.3085, ctc_loss=0.2255, cr_loss=0.4154, over 4091440.76 frames. ], batch size: 74, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:06:01,013 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.162e+02 2.389e+02 2.769e+02 5.314e+02, threshold=4.777e+02, percent-clipped=0.0 2024-09-14 06:06:24,377 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5 2024-09-14 06:06:45,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=85178.5, ans=0.1 2024-09-14 06:06:46,370 INFO [train.py:1198] (0/2) Epoch 5, batch 4500, loss[loss=0.2996, ctc_loss=0.2136, cr_loss=0.4299, over 20937.00 frames. ], tot_loss[loss=0.3097, ctc_loss=0.2264, cr_loss=0.4166, over 4083346.82 frames. 
], batch size: 49, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:06:48,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=85178.5, ans=0.125 2024-09-14 06:07:01,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=85206.83333333333, ans=0.125 2024-09-14 06:07:09,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=85206.83333333333, ans=0.125 2024-09-14 06:07:36,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=85263.5, ans=0.0 2024-09-14 06:07:45,974 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0 2024-09-14 06:08:01,891 INFO [train.py:1198] (0/2) Epoch 5, batch 4550, loss[loss=0.3155, ctc_loss=0.2334, cr_loss=0.4103, over 21012.00 frames. ], tot_loss[loss=0.3099, ctc_loss=0.2264, cr_loss=0.4171, over 4085503.62 frames. 
], batch size: 61, lr: 1.60e-02, grad_scale: 32.0 2024-09-14 06:08:05,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=85320.16666666667, ans=0.125 2024-09-14 06:08:17,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=85348.5, ans=0.125 2024-09-14 06:08:21,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=85348.5, ans=0.0 2024-09-14 06:08:32,233 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.734e+02 2.227e+02 2.442e+02 3.193e+02 6.739e+02, threshold=4.885e+02, percent-clipped=2.0 2024-09-14 06:08:47,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=85405.16666666667, ans=0.125 2024-09-14 06:09:17,098 INFO [train.py:1198] (0/2) Epoch 5, batch 4600, loss[loss=0.3162, ctc_loss=0.2286, cr_loss=0.4382, over 20709.00 frames. ], tot_loss[loss=0.3116, ctc_loss=0.2276, cr_loss=0.4197, over 4095264.99 frames. ], batch size: 71, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:09:38,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=85490.16666666667, ans=0.125 2024-09-14 06:09:49,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=12.0 2024-09-14 06:09:52,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=85518.5, ans=0.125 2024-09-14 06:10:10,694 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.94 vs. 
limit=12.0 2024-09-14 06:10:29,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=85575.16666666667, ans=0.125 2024-09-14 06:10:38,388 INFO [train.py:1198] (0/2) Epoch 5, batch 4650, loss[loss=0.2393, ctc_loss=0.1739, cr_loss=0.327, over 20975.00 frames. ], tot_loss[loss=0.31, ctc_loss=0.2266, cr_loss=0.4173, over 4094868.03 frames. ], batch size: 49, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:10:40,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=85603.5, ans=0.0 2024-09-14 06:10:56,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=85631.83333333333, ans=0.95 2024-09-14 06:11:08,034 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.751e+02 2.172e+02 2.419e+02 2.724e+02 6.227e+02, threshold=4.838e+02, percent-clipped=1.0 2024-09-14 06:11:12,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=85660.16666666667, ans=0.025 2024-09-14 06:11:20,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=85660.16666666667, ans=0.125 2024-09-14 06:11:54,005 INFO [train.py:1198] (0/2) Epoch 5, batch 4700, loss[loss=0.2741, ctc_loss=0.1973, cr_loss=0.3839, over 20977.00 frames. ], tot_loss[loss=0.3121, ctc_loss=0.2281, cr_loss=0.42, over 4095330.54 frames. 
], batch size: 48, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:12:00,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=85745.16666666667, ans=0.125 2024-09-14 06:12:13,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=85773.5, ans=0.125 2024-09-14 06:12:31,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=85801.83333333333, ans=0.05 2024-09-14 06:12:51,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=85830.16666666667, ans=0.1 2024-09-14 06:12:54,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=85858.5, ans=0.125 2024-09-14 06:13:06,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=85858.5, ans=0.125 2024-09-14 06:13:08,877 INFO [train.py:1198] (0/2) Epoch 5, batch 4750, loss[loss=0.3427, ctc_loss=0.2529, cr_loss=0.4492, over 20966.00 frames. ], tot_loss[loss=0.3124, ctc_loss=0.2284, cr_loss=0.4197, over 4089193.05 frames. 
], batch size: 67, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:13:38,722 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.258e+02 2.644e+02 2.905e+02 5.447e+02, threshold=5.288e+02, percent-clipped=1.0 2024-09-14 06:13:43,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=85943.5, ans=0.05 2024-09-14 06:13:57,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=85971.83333333333, ans=0.2 2024-09-14 06:14:23,313 INFO [train.py:1198] (0/2) Epoch 5, batch 4800, loss[loss=0.2796, ctc_loss=0.2027, cr_loss=0.3845, over 20976.00 frames. ], tot_loss[loss=0.3101, ctc_loss=0.2266, cr_loss=0.4171, over 4094668.06 frames. ], batch size: 50, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:14:32,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=86028.5, ans=0.125 2024-09-14 06:14:53,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=86085.16666666667, ans=0.1 2024-09-14 06:14:58,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=86085.16666666667, ans=0.125 2024-09-14 06:15:24,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2024-09-14 06:15:39,070 INFO [train.py:1198] (0/2) Epoch 5, batch 4850, loss[loss=0.2743, ctc_loss=0.2021, cr_loss=0.3611, over 20957.00 frames. ], tot_loss[loss=0.3104, ctc_loss=0.2271, cr_loss=0.4165, over 4093777.72 frames. 
], batch size: 49, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:15:49,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=86170.16666666667, ans=0.0 2024-09-14 06:16:04,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=86198.5, ans=0.125 2024-09-14 06:16:12,133 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.150e+02 2.348e+02 2.725e+02 4.388e+02, threshold=4.695e+02, percent-clipped=0.0 2024-09-14 06:16:15,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=86226.83333333333, ans=0.125 2024-09-14 06:16:24,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=86226.83333333333, ans=0.1 2024-09-14 06:16:46,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=46.57 vs. limit=15.0 2024-09-14 06:16:57,470 INFO [train.py:1198] (0/2) Epoch 5, batch 4900, loss[loss=0.2791, ctc_loss=0.2017, cr_loss=0.3869, over 19921.00 frames. ], tot_loss[loss=0.3086, ctc_loss=0.2256, cr_loss=0.4149, over 4098733.41 frames. ], batch size: 44, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:17:40,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=86368.5, ans=0.2 2024-09-14 06:17:50,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=86396.83333333333, ans=0.0 2024-09-14 06:17:55,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.97 vs. 
limit=10.0 2024-09-14 06:18:01,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=86425.16666666667, ans=22.5 2024-09-14 06:18:12,479 INFO [train.py:1198] (0/2) Epoch 5, batch 4950, loss[loss=0.297, ctc_loss=0.2165, cr_loss=0.4026, over 21061.00 frames. ], tot_loss[loss=0.3084, ctc_loss=0.2255, cr_loss=0.4146, over 4100737.82 frames. ], batch size: 59, lr: 1.59e-02, grad_scale: 32.0 2024-09-14 06:18:39,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=86481.83333333333, ans=0.125 2024-09-14 06:18:42,441 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.235e+02 2.425e+02 2.804e+02 5.355e+02, threshold=4.850e+02, percent-clipped=1.0 2024-09-14 06:19:24,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=86566.83333333333, ans=0.125 2024-09-14 06:19:26,481 INFO [train.py:1198] (0/2) Epoch 5, batch 5000, loss[loss=0.3176, ctc_loss=0.2278, cr_loss=0.4485, over 20828.00 frames. ], tot_loss[loss=0.3079, ctc_loss=0.2251, cr_loss=0.414, over 4094744.16 frames. ], batch size: 59, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:20:39,546 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=12.0 2024-09-14 06:20:40,418 INFO [train.py:1198] (0/2) Epoch 5, batch 5050, loss[loss=0.3058, ctc_loss=0.2233, cr_loss=0.4124, over 20865.00 frames. ], tot_loss[loss=0.3082, ctc_loss=0.2253, cr_loss=0.4146, over 4095215.73 frames. 
], batch size: 57, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:21:04,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=86765.16666666667, ans=0.1 2024-09-14 06:21:10,543 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.727e+02 2.212e+02 2.472e+02 3.120e+02 5.773e+02, threshold=4.944e+02, percent-clipped=3.0 2024-09-14 06:21:23,450 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2024-09-14 06:21:27,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=12.0 2024-09-14 06:21:55,371 INFO [train.py:1198] (0/2) Epoch 5, batch 5100, loss[loss=0.3179, ctc_loss=0.2347, cr_loss=0.4161, over 20310.00 frames. ], tot_loss[loss=0.3074, ctc_loss=0.2248, cr_loss=0.4133, over 4102800.52 frames. ], batch size: 74, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:22:46,956 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0 2024-09-14 06:23:01,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=86991.83333333333, ans=0.0 2024-09-14 06:23:10,038 INFO [train.py:1198] (0/2) Epoch 5, batch 5150, loss[loss=0.322, ctc_loss=0.2386, cr_loss=0.417, over 20811.00 frames. ], tot_loss[loss=0.3071, ctc_loss=0.2244, cr_loss=0.4133, over 4101431.07 frames. 
], batch size: 59, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:23:11,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=87020.16666666667, ans=0.125 2024-09-14 06:23:11,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=87020.16666666667, ans=0.125 2024-09-14 06:23:14,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=87020.16666666667, ans=0.015 2024-09-14 06:23:39,512 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.173e+02 2.609e+02 3.097e+02 4.855e+02, threshold=5.218e+02, percent-clipped=0.0 2024-09-14 06:23:42,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=87076.83333333333, ans=0.0 2024-09-14 06:23:53,022 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 06:24:10,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87133.5, ans=0.1 2024-09-14 06:24:17,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=87133.5, ans=10.0 2024-09-14 06:24:25,801 INFO [train.py:1198] (0/2) Epoch 5, batch 5200, loss[loss=0.3259, ctc_loss=0.2395, cr_loss=0.4319, over 20974.00 frames. ], tot_loss[loss=0.3073, ctc_loss=0.2246, cr_loss=0.4134, over 4091191.64 frames. ], batch size: 58, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:24:38,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=87161.83333333333, ans=0.025 2024-09-14 06:25:02,791 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. 
limit=15.0 2024-09-14 06:25:13,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=87246.83333333333, ans=0.125 2024-09-14 06:25:22,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=12.0 2024-09-14 06:25:27,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=87275.16666666667, ans=0.125 2024-09-14 06:25:42,436 INFO [train.py:1198] (0/2) Epoch 5, batch 5250, loss[loss=0.3207, ctc_loss=0.2346, cr_loss=0.4306, over 20765.00 frames. ], tot_loss[loss=0.3082, ctc_loss=0.2254, cr_loss=0.4141, over 4077146.15 frames. ], batch size: 56, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:25:50,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=15.0 2024-09-14 06:26:11,841 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.240e+02 2.492e+02 2.833e+02 8.780e+02, threshold=4.984e+02, percent-clipped=1.0 2024-09-14 06:26:31,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=87388.5, ans=0.125 2024-09-14 06:26:55,478 INFO [train.py:1198] (0/2) Epoch 5, batch 5300, loss[loss=0.2876, ctc_loss=0.204, cr_loss=0.418, over 20980.00 frames. ], tot_loss[loss=0.3094, ctc_loss=0.2262, cr_loss=0.4158, over 4074336.99 frames. 
], batch size: 55, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:26:57,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87445.16666666667, ans=0.1 2024-09-14 06:27:27,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=87501.83333333333, ans=0.07 2024-09-14 06:28:07,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=87558.5, ans=0.125 2024-09-14 06:28:09,883 INFO [train.py:1198] (0/2) Epoch 5, batch 5350, loss[loss=0.3491, ctc_loss=0.2583, cr_loss=0.454, over 18046.00 frames. ], tot_loss[loss=0.3095, ctc_loss=0.2262, cr_loss=0.4168, over 4076115.99 frames. ], batch size: 108, lr: 1.58e-02, grad_scale: 32.0 2024-09-14 06:28:14,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=87586.83333333333, ans=0.125 2024-09-14 06:28:17,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=87586.83333333333, ans=0.125 2024-09-14 06:28:22,193 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 06:28:39,469 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.198e+02 2.381e+02 2.732e+02 3.900e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-14 06:28:57,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=87671.83333333333, ans=0.125 2024-09-14 06:29:17,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=87700.16666666667, ans=0.0 2024-09-14 06:29:24,284 INFO [train.py:1198] (0/2) Epoch 5, batch 5400, loss[loss=0.2709, ctc_loss=0.1918, 
cr_loss=0.3957, over 21001.00 frames. ], tot_loss[loss=0.3088, ctc_loss=0.2255, cr_loss=0.4163, over 4093637.25 frames. ], batch size: 52, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:29:44,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.35 vs. limit=10.0 2024-09-14 06:29:55,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=87785.16666666667, ans=0.07 2024-09-14 06:29:58,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=87785.16666666667, ans=0.0 2024-09-14 06:30:27,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=87841.83333333333, ans=0.0 2024-09-14 06:30:38,568 INFO [train.py:1198] (0/2) Epoch 5, batch 5450, loss[loss=0.3482, ctc_loss=0.2551, cr_loss=0.4659, over 20668.00 frames. ], tot_loss[loss=0.3097, ctc_loss=0.2263, cr_loss=0.4168, over 4093001.37 frames. ], batch size: 71, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:31:02,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=87898.5, ans=0.125 2024-09-14 06:31:08,444 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.336e+02 2.698e+02 3.124e+02 5.401e+02, threshold=5.396e+02, percent-clipped=3.0 2024-09-14 06:31:59,268 INFO [train.py:1198] (0/2) Epoch 5, batch 5500, loss[loss=0.3107, ctc_loss=0.2287, cr_loss=0.41, over 20894.00 frames. ], tot_loss[loss=0.3101, ctc_loss=0.2267, cr_loss=0.4172, over 4084468.47 frames. 
], batch size: 57, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:32:02,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=88011.83333333333, ans=0.125 2024-09-14 06:32:27,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88068.5, ans=0.1 2024-09-14 06:32:58,793 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 06:33:13,314 INFO [train.py:1198] (0/2) Epoch 5, batch 5550, loss[loss=0.3111, ctc_loss=0.2334, cr_loss=0.3887, over 21006.00 frames. ], tot_loss[loss=0.3104, ctc_loss=0.2269, cr_loss=0.4173, over 4085218.49 frames. ], batch size: 61, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:33:22,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=88153.5, ans=0.125 2024-09-14 06:33:31,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=88181.83333333333, ans=0.125 2024-09-14 06:33:45,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.172e+02 2.399e+02 2.777e+02 7.183e+02, threshold=4.797e+02, percent-clipped=1.0 2024-09-14 06:33:51,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=88210.16666666667, ans=0.125 2024-09-14 06:34:32,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.66 vs. limit=5.0 2024-09-14 06:34:32,455 INFO [train.py:1198] (0/2) Epoch 5, batch 5600, loss[loss=0.3014, ctc_loss=0.2187, cr_loss=0.414, over 20857.00 frames. ], tot_loss[loss=0.3097, ctc_loss=0.2262, cr_loss=0.4171, over 4093957.50 frames. 
], batch size: 65, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:34:32,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=88295.16666666667, ans=0.125 2024-09-14 06:35:04,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=88351.83333333333, ans=0.0 2024-09-14 06:35:15,539 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.70 vs. limit=10.0 2024-09-14 06:35:22,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=88380.16666666667, ans=0.2 2024-09-14 06:35:46,075 INFO [train.py:1198] (0/2) Epoch 5, batch 5650, loss[loss=0.2687, ctc_loss=0.1928, cr_loss=0.3796, over 21048.00 frames. ], tot_loss[loss=0.3093, ctc_loss=0.226, cr_loss=0.4166, over 4090729.42 frames. ], batch size: 53, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:35:47,924 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 06:35:49,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=88436.83333333333, ans=0.0 2024-09-14 06:35:56,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=88436.83333333333, ans=0.125 2024-09-14 06:36:15,848 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.310e+02 2.734e+02 3.176e+02 4.614e+02, threshold=5.469e+02, percent-clipped=0.0 2024-09-14 06:36:42,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=88521.83333333333, ans=0.125 2024-09-14 06:36:59,073 INFO [scaling.py:1120] (0/2) WithLoss: 
name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 06:37:00,106 INFO [train.py:1198] (0/2) Epoch 5, batch 5700, loss[loss=0.3165, ctc_loss=0.2312, cr_loss=0.4268, over 21075.00 frames. ], tot_loss[loss=0.3089, ctc_loss=0.2255, cr_loss=0.4168, over 4092047.60 frames. ], batch size: 59, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:37:44,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=88663.5, ans=0.125 2024-09-14 06:38:00,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=88691.83333333333, ans=0.025 2024-09-14 06:38:06,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=88691.83333333333, ans=0.0 2024-09-14 06:38:13,760 INFO [train.py:1198] (0/2) Epoch 5, batch 5750, loss[loss=0.2988, ctc_loss=0.2154, cr_loss=0.4169, over 20832.00 frames. ], tot_loss[loss=0.3092, ctc_loss=0.2257, cr_loss=0.4172, over 4091614.70 frames. 
], batch size: 59, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:38:14,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=88720.16666666667, ans=0.125 2024-09-14 06:38:25,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=88720.16666666667, ans=0.0 2024-09-14 06:38:30,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=88748.5, ans=0.125 2024-09-14 06:38:33,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=88748.5, ans=0.0 2024-09-14 06:38:40,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=88748.5, ans=0.0 2024-09-14 06:38:43,453 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.222e+02 2.411e+02 2.798e+02 5.182e+02, threshold=4.821e+02, percent-clipped=0.0 2024-09-14 06:39:07,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88805.16666666667, ans=0.1 2024-09-14 06:39:28,269 INFO [train.py:1198] (0/2) Epoch 5, batch 5800, loss[loss=0.2814, ctc_loss=0.2058, cr_loss=0.3779, over 21064.00 frames. ], tot_loss[loss=0.3089, ctc_loss=0.2256, cr_loss=0.4166, over 4070780.51 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 32.0 2024-09-14 06:40:01,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=88918.5, ans=0.125 2024-09-14 06:40:25,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=88946.83333333333, ans=0.0 2024-09-14 06:40:42,761 INFO [train.py:1198] (0/2) Epoch 5, batch 5850, loss[loss=0.3026, ctc_loss=0.2205, cr_loss=0.4103, over 21073.00 frames. 
], tot_loss[loss=0.3108, ctc_loss=0.2271, cr_loss=0.4184, over 4061408.78 frames. ], batch size: 59, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:41:05,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2024-09-14 06:41:12,557 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.198e+02 2.446e+02 2.758e+02 4.223e+02, threshold=4.892e+02, percent-clipped=0.0 2024-09-14 06:41:20,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-09-14 06:41:32,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=89088.5, ans=0.125 2024-09-14 06:41:39,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=89088.5, ans=0.0 2024-09-14 06:41:57,345 INFO [train.py:1198] (0/2) Epoch 5, batch 5900, loss[loss=0.3452, ctc_loss=0.2564, cr_loss=0.4441, over 20312.00 frames. ], tot_loss[loss=0.3091, ctc_loss=0.2259, cr_loss=0.416, over 4067974.04 frames. ], batch size: 74, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:42:11,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=89173.5, ans=0.0 2024-09-14 06:42:19,678 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-09-14 06:42:36,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89201.83333333333, ans=0.1 2024-09-14 06:42:53,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.50 vs. 
limit=15.0 2024-09-14 06:42:59,046 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0 2024-09-14 06:43:03,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89258.5, ans=0.1 2024-09-14 06:43:03,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2024-09-14 06:43:16,220 INFO [train.py:1198] (0/2) Epoch 5, batch 5950, loss[loss=0.389, ctc_loss=0.2922, cr_loss=0.4838, over 18219.00 frames. ], tot_loss[loss=0.3083, ctc_loss=0.2252, cr_loss=0.4153, over 4065117.20 frames. ], batch size: 108, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:43:23,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=89286.83333333333, ans=0.125 2024-09-14 06:43:45,058 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.145e+02 2.332e+02 2.572e+02 3.096e+02, threshold=4.663e+02, percent-clipped=0.0 2024-09-14 06:43:46,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89343.5, ans=0.1 2024-09-14 06:43:46,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=89343.5, ans=0.2 2024-09-14 06:43:51,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=89343.5, ans=0.125 2024-09-14 06:44:00,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=89371.83333333333, ans=0.125 2024-09-14 06:44:01,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, 
batch_count=89371.83333333333, ans=0.125 2024-09-14 06:44:13,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=89400.16666666667, ans=0.125 2024-09-14 06:44:29,440 INFO [train.py:1198] (0/2) Epoch 5, batch 6000, loss[loss=0.297, ctc_loss=0.2135, cr_loss=0.4175, over 20979.00 frames. ], tot_loss[loss=0.3083, ctc_loss=0.2252, cr_loss=0.4153, over 4069336.54 frames. ], batch size: 55, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:44:29,440 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 06:44:47,866 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.5179, 4.0002, 2.8720, 3.7211], device='cuda:0') 2024-09-14 06:44:49,637 INFO [train.py:1230] (0/2) Epoch 5, validation: loss=0.06916, ctc_loss=0.06916, cr_loss=9.296e-15, over 944034.00 frames. 2024-09-14 06:44:49,637 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 06:44:49,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=89428.5, ans=0.5 2024-09-14 06:45:25,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=15.0 2024-09-14 06:45:59,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89541.83333333333, ans=0.1 2024-09-14 06:46:03,638 INFO [train.py:1198] (0/2) Epoch 5, batch 6050, loss[loss=0.266, ctc_loss=0.1893, cr_loss=0.3836, over 19921.00 frames. ], tot_loss[loss=0.3094, ctc_loss=0.2261, cr_loss=0.4165, over 4071233.44 frames. 
], batch size: 44, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:46:34,340 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.140e+02 2.421e+02 2.825e+02 4.309e+02, threshold=4.843e+02, percent-clipped=0.0 2024-09-14 06:47:15,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=89683.5, ans=0.2 2024-09-14 06:47:19,441 INFO [train.py:1198] (0/2) Epoch 5, batch 6100, loss[loss=0.2997, ctc_loss=0.2148, cr_loss=0.4244, over 20953.00 frames. ], tot_loss[loss=0.3072, ctc_loss=0.2243, cr_loss=0.4147, over 4081877.05 frames. ], batch size: 50, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:47:24,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=89711.83333333333, ans=0.125 2024-09-14 06:47:25,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=89711.83333333333, ans=0.0 2024-09-14 06:47:31,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89711.83333333333, ans=0.1 2024-09-14 06:48:01,340 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=12.0 2024-09-14 06:48:07,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=89796.83333333333, ans=0.025 2024-09-14 06:48:24,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.19 vs. 
limit=22.5 2024-09-14 06:48:26,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=89825.16666666667, ans=0.0 2024-09-14 06:48:31,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0 2024-09-14 06:48:33,627 INFO [train.py:1198] (0/2) Epoch 5, batch 6150, loss[loss=0.2896, ctc_loss=0.2096, cr_loss=0.4, over 20960.00 frames. ], tot_loss[loss=0.3058, ctc_loss=0.223, cr_loss=0.4139, over 4086991.63 frames. ], batch size: 50, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:49:02,758 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.185e+02 2.420e+02 2.689e+02 5.052e+02, threshold=4.840e+02, percent-clipped=1.0 2024-09-14 06:49:03,460 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=15.0 2024-09-14 06:49:07,944 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=22.5 2024-09-14 06:49:46,942 INFO [train.py:1198] (0/2) Epoch 5, batch 6200, loss[loss=0.3814, ctc_loss=0.2917, cr_loss=0.4481, over 14190.00 frames. ], tot_loss[loss=0.3073, ctc_loss=0.2244, cr_loss=0.4144, over 4061051.68 frames. ], batch size: 150, lr: 1.56e-02, grad_scale: 32.0 2024-09-14 06:49:51,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=89995.16666666667, ans=0.0 2024-09-14 06:50:16,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=90051.83333333333, ans=0.125 2024-09-14 06:50:24,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. 
limit=15.0 2024-09-14 06:50:28,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=90051.83333333333, ans=0.09899494936611666 2024-09-14 06:50:28,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=90051.83333333333, ans=0.125 2024-09-14 06:50:50,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=90108.5, ans=0.1 2024-09-14 06:50:53,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90108.5, ans=0.1 2024-09-14 06:51:02,362 INFO [train.py:1198] (0/2) Epoch 5, batch 6250, loss[loss=0.375, ctc_loss=0.2914, cr_loss=0.4183, over 14541.00 frames. ], tot_loss[loss=0.3098, ctc_loss=0.2266, cr_loss=0.416, over 4021549.62 frames. ], batch size: 149, lr: 1.55e-02, grad_scale: 32.0 2024-09-14 06:51:20,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=90165.16666666667, ans=0.0 2024-09-14 06:51:31,338 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.186e+02 2.467e+02 2.736e+02 5.742e+02, threshold=4.934e+02, percent-clipped=1.0 2024-09-14 06:51:34,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=90193.5, ans=0.07 2024-09-14 06:51:47,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=90221.83333333333, ans=0.0 2024-09-14 06:52:14,316 INFO [train.py:1198] (0/2) Epoch 5, batch 6300, loss[loss=0.3193, ctc_loss=0.2303, cr_loss=0.4451, over 21035.00 frames. ], tot_loss[loss=0.314, ctc_loss=0.2304, cr_loss=0.418, over 3964826.23 frames. 
], batch size: 62, lr: 1.55e-02, grad_scale: 32.0 2024-09-14 06:52:32,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2024-09-14 06:53:09,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=90363.5, ans=0.125 2024-09-14 06:53:27,426 INFO [train.py:1198] (0/2) Epoch 5, batch 6350, loss[loss=0.397, ctc_loss=0.3125, cr_loss=0.4224, over 14408.00 frames. ], tot_loss[loss=0.3233, ctc_loss=0.2391, cr_loss=0.4207, over 3777575.69 frames. ], batch size: 151, lr: 1.55e-02, grad_scale: 32.0 2024-09-14 06:53:38,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=90420.16666666667, ans=0.125 2024-09-14 06:53:55,021 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.453e+02 2.732e+02 3.097e+02 5.994e+02, threshold=5.463e+02, percent-clipped=2.0 2024-09-14 06:53:59,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=90476.83333333333, ans=0.0 2024-09-14 06:54:00,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90476.83333333333, ans=0.1 2024-09-14 06:54:07,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=90505.16666666667, ans=0.2 2024-09-14 06:54:24,008 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-5.pt 2024-09-14 06:55:10,265 INFO [train.py:1198] (0/2) Epoch 6, batch 0, loss[loss=0.3145, ctc_loss=0.2276, cr_loss=0.4349, over 21062.00 frames. ], tot_loss[loss=0.3145, ctc_loss=0.2276, cr_loss=0.4349, over 21062.00 frames. 
], batch size: 56, lr: 1.45e-02, grad_scale: 64.0 2024-09-14 06:55:10,266 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 06:55:29,605 INFO [train.py:1230] (0/2) Epoch 6, validation: loss=0.07128, ctc_loss=0.07128, cr_loss=9.659e-15, over 944034.00 frames. 2024-09-14 06:55:29,606 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 06:55:49,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=90564.66666666667, ans=0.125 2024-09-14 06:55:54,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=90564.66666666667, ans=0.125 2024-09-14 06:56:07,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=90593.0, ans=0.125 2024-09-14 06:56:09,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0 2024-09-14 06:56:14,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0 2024-09-14 06:56:30,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=90649.66666666667, ans=0.125 2024-09-14 06:56:37,627 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-32000.pt 2024-09-14 06:56:52,294 INFO [train.py:1198] (0/2) Epoch 6, batch 50, loss[loss=0.2888, ctc_loss=0.2098, cr_loss=0.3951, over 21071.00 frames. ], tot_loss[loss=0.31, ctc_loss=0.2275, cr_loss=0.4129, over 915326.55 frames. 
], batch size: 59, lr: 1.45e-02, grad_scale: 64.0 2024-09-14 06:57:21,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=90734.66666666667, ans=0.09899494936611666 2024-09-14 06:57:21,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=90734.66666666667, ans=0.125 2024-09-14 06:57:35,972 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.192e+02 2.577e+02 3.091e+02 5.422e+02, threshold=5.154e+02, percent-clipped=0.0 2024-09-14 06:57:39,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=90763.0, ans=0.125 2024-09-14 06:57:49,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=90763.0, ans=0.125 2024-09-14 06:57:55,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=90791.33333333333, ans=0.5 2024-09-14 06:58:07,852 INFO [train.py:1198] (0/2) Epoch 6, batch 100, loss[loss=0.2801, ctc_loss=0.1981, cr_loss=0.4098, over 20960.00 frames. ], tot_loss[loss=0.3075, ctc_loss=0.2244, cr_loss=0.4151, over 1620739.32 frames. ], batch size: 51, lr: 1.45e-02, grad_scale: 64.0 2024-09-14 06:59:23,680 INFO [train.py:1198] (0/2) Epoch 6, batch 150, loss[loss=0.3125, ctc_loss=0.2282, cr_loss=0.4213, over 20963.00 frames. ], tot_loss[loss=0.3047, ctc_loss=0.2219, cr_loss=0.4138, over 2184471.21 frames. 
], batch size: 58, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 06:59:24,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=90961.33333333333, ans=0.125 2024-09-14 07:00:07,708 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.128e+02 2.301e+02 2.592e+02 4.019e+02, threshold=4.603e+02, percent-clipped=0.0 2024-09-14 07:00:19,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0 2024-09-14 07:00:20,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=91046.33333333333, ans=0.0 2024-09-14 07:00:39,208 INFO [train.py:1198] (0/2) Epoch 6, batch 200, loss[loss=0.3274, ctc_loss=0.2452, cr_loss=0.4111, over 20849.00 frames. ], tot_loss[loss=0.3037, ctc_loss=0.221, cr_loss=0.4134, over 2611624.90 frames. ], batch size: 65, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 07:00:45,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=91103.0, ans=0.125 2024-09-14 07:01:21,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=91159.66666666667, ans=0.125 2024-09-14 07:01:22,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=91188.0, ans=0.125 2024-09-14 07:01:54,192 INFO [train.py:1198] (0/2) Epoch 6, batch 250, loss[loss=0.3759, ctc_loss=0.2843, cr_loss=0.458, over 13701.00 frames. ], tot_loss[loss=0.3043, ctc_loss=0.2213, cr_loss=0.4151, over 2934819.65 frames. 
], batch size: 149, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 07:01:56,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91244.66666666667, ans=0.1 2024-09-14 07:02:18,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=91273.0, ans=0.0 2024-09-14 07:02:45,204 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.162e+02 2.423e+02 2.706e+02 4.760e+02, threshold=4.847e+02, percent-clipped=1.0 2024-09-14 07:03:16,857 INFO [train.py:1198] (0/2) Epoch 6, batch 300, loss[loss=0.2875, ctc_loss=0.2036, cr_loss=0.4194, over 21072.00 frames. ], tot_loss[loss=0.3026, ctc_loss=0.22, cr_loss=0.4131, over 3195437.18 frames. ], batch size: 59, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 07:03:21,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=91386.33333333333, ans=0.125 2024-09-14 07:03:55,520 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.78 vs. 
limit=15.0 2024-09-14 07:04:15,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=91499.66666666667, ans=0.125 2024-09-14 07:04:20,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=91499.66666666667, ans=0.125 2024-09-14 07:04:23,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=91499.66666666667, ans=0.125 2024-09-14 07:04:25,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91499.66666666667, ans=0.1 2024-09-14 07:04:32,419 INFO [train.py:1198] (0/2) Epoch 6, batch 350, loss[loss=0.2783, ctc_loss=0.203, cr_loss=0.3762, over 20262.00 frames. ], tot_loss[loss=0.3029, ctc_loss=0.2201, cr_loss=0.4138, over 3396372.62 frames. ], batch size: 45, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 07:04:42,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=91528.0, ans=0.0 2024-09-14 07:04:51,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=15.0 2024-09-14 07:05:00,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=91556.33333333333, ans=0.125 2024-09-14 07:05:05,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.16 vs. 
limit=15.0 2024-09-14 07:05:11,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=91584.66666666667, ans=0.2 2024-09-14 07:05:16,668 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.322e+02 2.660e+02 3.181e+02 4.521e+02, threshold=5.320e+02, percent-clipped=0.0 2024-09-14 07:05:30,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=91613.0, ans=0.025 2024-09-14 07:05:41,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=91641.33333333333, ans=10.0 2024-09-14 07:05:48,390 INFO [train.py:1198] (0/2) Epoch 6, batch 400, loss[loss=0.3083, ctc_loss=0.2238, cr_loss=0.4223, over 20992.00 frames. ], tot_loss[loss=0.3035, ctc_loss=0.2207, cr_loss=0.4143, over 3561655.32 frames. ], batch size: 61, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 07:06:43,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91754.66666666667, ans=0.1 2024-09-14 07:07:03,017 INFO [train.py:1198] (0/2) Epoch 6, batch 450, loss[loss=0.3172, ctc_loss=0.2262, cr_loss=0.4549, over 20841.00 frames. ], tot_loss[loss=0.3048, ctc_loss=0.2216, cr_loss=0.4157, over 3670456.35 frames. 
], batch size: 59, lr: 1.44e-02, grad_scale: 64.0 2024-09-14 07:07:10,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=91811.33333333333, ans=0.125 2024-09-14 07:07:33,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=91868.0, ans=0.125 2024-09-14 07:07:46,351 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.092e+02 2.348e+02 2.671e+02 3.553e+02, threshold=4.696e+02, percent-clipped=0.0 2024-09-14 07:07:48,602 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.43 vs. limit=10.0 2024-09-14 07:08:01,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=91924.66666666667, ans=0.125 2024-09-14 07:08:03,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=91924.66666666667, ans=0.125 2024-09-14 07:08:23,978 INFO [train.py:1198] (0/2) Epoch 6, batch 500, loss[loss=0.2991, ctc_loss=0.2125, cr_loss=0.4334, over 20813.00 frames. ], tot_loss[loss=0.3029, ctc_loss=0.2202, cr_loss=0.4138, over 3766460.51 frames. ], batch size: 59, lr: 1.44e-02, grad_scale: 32.0 2024-09-14 07:08:34,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=91953.0, ans=0.125 2024-09-14 07:08:49,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=91981.33333333333, ans=0.0 2024-09-14 07:09:39,080 INFO [train.py:1198] (0/2) Epoch 6, batch 550, loss[loss=0.2864, ctc_loss=0.2079, cr_loss=0.3927, over 20939.00 frames. ], tot_loss[loss=0.3021, ctc_loss=0.2193, cr_loss=0.4138, over 3835121.70 frames. 
], batch size: 60, lr: 1.44e-02, grad_scale: 32.0 2024-09-14 07:09:46,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=92094.66666666667, ans=0.125 2024-09-14 07:10:24,064 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.659e+02 2.206e+02 2.463e+02 2.967e+02 4.675e+02, threshold=4.927e+02, percent-clipped=0.0 2024-09-14 07:10:54,446 INFO [train.py:1198] (0/2) Epoch 6, batch 600, loss[loss=0.3074, ctc_loss=0.2237, cr_loss=0.4181, over 20276.00 frames. ], tot_loss[loss=0.303, ctc_loss=0.22, cr_loss=0.415, over 3892142.64 frames. ], batch size: 74, lr: 1.44e-02, grad_scale: 32.0 2024-09-14 07:10:56,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=92236.33333333333, ans=0.0 2024-09-14 07:11:22,040 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.11 vs. limit=10.0 2024-09-14 07:11:29,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=92293.0, ans=0.2 2024-09-14 07:12:09,876 INFO [train.py:1198] (0/2) Epoch 6, batch 650, loss[loss=0.2812, ctc_loss=0.2072, cr_loss=0.3698, over 20896.00 frames. ], tot_loss[loss=0.3025, ctc_loss=0.2196, cr_loss=0.4141, over 3933524.30 frames. 
], batch size: 54, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:12:10,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=92378.0, ans=0.125 2024-09-14 07:12:20,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=92378.0, ans=0.2 2024-09-14 07:12:54,799 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.230e+02 2.474e+02 3.000e+02 5.178e+02, threshold=4.949e+02, percent-clipped=1.0 2024-09-14 07:13:24,599 INFO [train.py:1198] (0/2) Epoch 6, batch 700, loss[loss=0.2606, ctc_loss=0.1858, cr_loss=0.3744, over 20902.00 frames. ], tot_loss[loss=0.3011, ctc_loss=0.2185, cr_loss=0.4132, over 3977879.63 frames. ], batch size: 57, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:14:12,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=22.5 2024-09-14 07:14:46,086 INFO [train.py:1198] (0/2) Epoch 6, batch 750, loss[loss=0.2826, ctc_loss=0.2056, cr_loss=0.3851, over 21097.00 frames. ], tot_loss[loss=0.3017, ctc_loss=0.2192, cr_loss=0.4128, over 3999648.15 frames. 
], batch size: 59, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:14:49,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=92661.33333333333, ans=0.2 2024-09-14 07:15:23,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=92718.0, ans=0.125 2024-09-14 07:15:31,202 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.110e+02 2.298e+02 2.553e+02 4.101e+02, threshold=4.596e+02, percent-clipped=0.0 2024-09-14 07:15:49,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=92774.66666666667, ans=0.125 2024-09-14 07:16:01,318 INFO [train.py:1198] (0/2) Epoch 6, batch 800, loss[loss=0.2884, ctc_loss=0.2047, cr_loss=0.4187, over 20784.00 frames. ], tot_loss[loss=0.3029, ctc_loss=0.2201, cr_loss=0.4139, over 4019750.61 frames. ], batch size: 56, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:16:33,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=92859.66666666667, ans=0.125 2024-09-14 07:16:39,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=92859.66666666667, ans=0.125 2024-09-14 07:16:58,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=92888.0, ans=0.125 2024-09-14 07:17:10,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=92916.33333333333, ans=0.125 2024-09-14 07:17:16,001 INFO [train.py:1198] (0/2) Epoch 6, batch 850, loss[loss=0.314, ctc_loss=0.2262, cr_loss=0.439, over 20979.00 frames. ], tot_loss[loss=0.3032, ctc_loss=0.2203, cr_loss=0.4146, over 4032540.44 frames. 
], batch size: 64, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:17:43,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=92973.0, ans=0.125 2024-09-14 07:18:01,094 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.159e+02 2.447e+02 2.716e+02 5.128e+02, threshold=4.894e+02, percent-clipped=1.0 2024-09-14 07:18:30,996 INFO [train.py:1198] (0/2) Epoch 6, batch 900, loss[loss=0.3866, ctc_loss=0.2997, cr_loss=0.4347, over 14241.00 frames. ], tot_loss[loss=0.3059, ctc_loss=0.2224, cr_loss=0.4176, over 4031539.98 frames. ], batch size: 149, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:18:37,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93086.33333333333, ans=0.1 2024-09-14 07:18:41,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=93086.33333333333, ans=0.125 2024-09-14 07:18:56,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=93114.66666666667, ans=0.0 2024-09-14 07:19:43,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93199.66666666667, ans=0.1 2024-09-14 07:19:48,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.70 vs. limit=22.5 2024-09-14 07:19:51,860 INFO [train.py:1198] (0/2) Epoch 6, batch 950, loss[loss=0.2477, ctc_loss=0.1763, cr_loss=0.357, over 20962.00 frames. ], tot_loss[loss=0.3047, ctc_loss=0.2214, cr_loss=0.4164, over 4045723.77 frames. 
], batch size: 50, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:20:06,183 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.80 vs. limit=10.0 2024-09-14 07:20:16,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2024-09-14 07:20:37,143 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.452e+02 2.996e+02 3.444e+02 5.252e+02, threshold=5.991e+02, percent-clipped=4.0 2024-09-14 07:21:07,335 INFO [train.py:1198] (0/2) Epoch 6, batch 1000, loss[loss=0.2746, ctc_loss=0.1957, cr_loss=0.3948, over 20986.00 frames. ], tot_loss[loss=0.304, ctc_loss=0.2209, cr_loss=0.4155, over 4057861.04 frames. ], batch size: 52, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:21:18,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=93369.66666666667, ans=0.0 2024-09-14 07:21:21,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=93398.0, ans=0.125 2024-09-14 07:21:34,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=93398.0, ans=0.125 2024-09-14 07:21:36,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=93426.33333333333, ans=0.0 2024-09-14 07:21:59,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=93454.66666666667, ans=0.125 2024-09-14 07:22:22,011 INFO [train.py:1198] (0/2) Epoch 6, batch 1050, loss[loss=0.3377, ctc_loss=0.2555, cr_loss=0.4107, over 18085.00 frames. ], tot_loss[loss=0.3043, ctc_loss=0.2213, cr_loss=0.4152, over 4067036.83 frames. 
], batch size: 108, lr: 1.43e-02, grad_scale: 32.0 2024-09-14 07:22:22,812 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.78 vs. limit=10.0 2024-09-14 07:22:45,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93539.66666666667, ans=0.1 2024-09-14 07:22:49,581 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 07:23:03,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=93568.0, ans=0.125 2024-09-14 07:23:07,302 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.152e+02 2.467e+02 2.932e+02 4.425e+02, threshold=4.934e+02, percent-clipped=0.0 2024-09-14 07:23:13,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=93596.33333333333, ans=0.0 2024-09-14 07:23:35,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93653.0, ans=0.1 2024-09-14 07:23:37,076 INFO [train.py:1198] (0/2) Epoch 6, batch 1100, loss[loss=0.293, ctc_loss=0.2096, cr_loss=0.4171, over 21072.00 frames. ], tot_loss[loss=0.3046, ctc_loss=0.2214, cr_loss=0.4159, over 4080027.18 frames. 
], batch size: 59, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:23:51,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=93681.33333333333, ans=0.0 2024-09-14 07:23:55,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=93681.33333333333, ans=0.025 2024-09-14 07:24:04,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=93681.33333333333, ans=0.125 2024-09-14 07:24:33,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2024-09-14 07:24:34,995 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 07:24:36,874 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2024-09-14 07:24:52,506 INFO [train.py:1198] (0/2) Epoch 6, batch 1150, loss[loss=0.3172, ctc_loss=0.2289, cr_loss=0.4415, over 20687.00 frames. ], tot_loss[loss=0.3044, ctc_loss=0.2212, cr_loss=0.4162, over 4078522.71 frames. ], batch size: 68, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:24:58,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.26 vs. 
limit=15.0 2024-09-14 07:25:11,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93823.0, ans=0.1 2024-09-14 07:25:12,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=93823.0, ans=0.0 2024-09-14 07:25:36,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=93851.33333333333, ans=0.0 2024-09-14 07:25:43,843 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.333e+02 2.657e+02 3.154e+02 5.433e+02, threshold=5.314e+02, percent-clipped=1.0 2024-09-14 07:25:55,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2024-09-14 07:26:02,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=93908.0, ans=0.02 2024-09-14 07:26:14,092 INFO [train.py:1198] (0/2) Epoch 6, batch 1200, loss[loss=0.3009, ctc_loss=0.218, cr_loss=0.4144, over 20887.00 frames. ], tot_loss[loss=0.3022, ctc_loss=0.2193, cr_loss=0.4146, over 4091153.65 frames. ], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:26:28,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.63 vs. 
limit=12.0 2024-09-14 07:26:44,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=93993.0, ans=0.125 2024-09-14 07:26:49,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93993.0, ans=0.1 2024-09-14 07:26:50,907 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 07:26:52,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=93993.0, ans=0.0 2024-09-14 07:27:17,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=94049.66666666667, ans=0.125 2024-09-14 07:27:27,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=12.0 2024-09-14 07:27:29,644 INFO [train.py:1198] (0/2) Epoch 6, batch 1250, loss[loss=0.2943, ctc_loss=0.2075, cr_loss=0.4338, over 21005.00 frames. ], tot_loss[loss=0.3012, ctc_loss=0.2184, cr_loss=0.414, over 4085236.83 frames. ], batch size: 63, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:27:31,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=94078.0, ans=0.2 2024-09-14 07:27:31,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94078.0, ans=0.1 2024-09-14 07:28:14,109 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.128e+02 2.280e+02 2.536e+02 3.412e+02, threshold=4.559e+02, percent-clipped=0.0 2024-09-14 07:28:43,933 INFO [train.py:1198] (0/2) Epoch 6, batch 1300, loss[loss=0.3156, ctc_loss=0.2323, cr_loss=0.4164, over 20936.00 frames. 
], tot_loss[loss=0.3004, ctc_loss=0.2179, cr_loss=0.4128, over 4096963.29 frames. ], batch size: 60, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:28:52,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=94219.66666666667, ans=0.1 2024-09-14 07:29:20,385 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0 2024-09-14 07:29:58,827 INFO [train.py:1198] (0/2) Epoch 6, batch 1350, loss[loss=0.2691, ctc_loss=0.1932, cr_loss=0.3799, over 20766.00 frames. ], tot_loss[loss=0.3006, ctc_loss=0.2181, cr_loss=0.4126, over 4100702.28 frames. ], batch size: 53, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:30:10,026 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2024-09-14 07:30:18,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=94389.66666666667, ans=0.125 2024-09-14 07:30:23,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=94389.66666666667, ans=0.95 2024-09-14 07:30:47,091 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.106e+02 2.252e+02 2.526e+02 4.986e+02, threshold=4.503e+02, percent-clipped=1.0 2024-09-14 07:30:47,558 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 07:30:51,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=94446.33333333333, ans=0.025 2024-09-14 07:30:57,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=94446.33333333333, 
ans=0.125 2024-09-14 07:31:20,077 INFO [train.py:1198] (0/2) Epoch 6, batch 1400, loss[loss=0.2618, ctc_loss=0.1912, cr_loss=0.353, over 20323.00 frames. ], tot_loss[loss=0.3023, ctc_loss=0.2196, cr_loss=0.4132, over 4089406.93 frames. ], batch size: 45, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:31:34,348 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.15 vs. limit=6.0 2024-09-14 07:31:39,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=94531.33333333333, ans=0.125 2024-09-14 07:31:55,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=94559.66666666667, ans=0.125 2024-09-14 07:31:58,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=94559.66666666667, ans=0.125 2024-09-14 07:32:05,786 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0 2024-09-14 07:32:28,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=94616.33333333333, ans=0.125 2024-09-14 07:32:35,377 INFO [train.py:1198] (0/2) Epoch 6, batch 1450, loss[loss=0.2996, ctc_loss=0.2192, cr_loss=0.4021, over 21078.00 frames. ], tot_loss[loss=0.3031, ctc_loss=0.2203, cr_loss=0.4139, over 4080741.28 frames. 
], batch size: 59, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:32:58,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=94673.0, ans=0.125 2024-09-14 07:33:20,413 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.216e+02 2.482e+02 2.897e+02 5.742e+02, threshold=4.965e+02, percent-clipped=1.0 2024-09-14 07:33:49,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=94786.33333333333, ans=0.1 2024-09-14 07:33:50,311 INFO [train.py:1198] (0/2) Epoch 6, batch 1500, loss[loss=0.2844, ctc_loss=0.2039, cr_loss=0.4024, over 20860.00 frames. ], tot_loss[loss=0.3024, ctc_loss=0.2197, cr_loss=0.4133, over 4083004.33 frames. ], batch size: 54, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:34:07,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=94814.66666666667, ans=0.0 2024-09-14 07:34:39,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=22.5 2024-09-14 07:34:56,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=94899.66666666667, ans=0.125 2024-09-14 07:35:05,997 INFO [train.py:1198] (0/2) Epoch 6, batch 1550, loss[loss=0.3106, ctc_loss=0.2209, cr_loss=0.4485, over 21066.00 frames. ], tot_loss[loss=0.3012, ctc_loss=0.2186, cr_loss=0.4127, over 4098178.48 frames. 
], batch size: 59, lr: 1.42e-02, grad_scale: 32.0 2024-09-14 07:35:16,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=94928.0, ans=0.125 2024-09-14 07:35:24,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=94956.33333333333, ans=0.125 2024-09-14 07:35:33,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=94956.33333333333, ans=0.0 2024-09-14 07:35:50,546 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.334e+02 2.720e+02 3.342e+02 4.593e+02, threshold=5.439e+02, percent-clipped=0.0 2024-09-14 07:36:18,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=95041.33333333333, ans=0.125 2024-09-14 07:36:23,704 INFO [train.py:1198] (0/2) Epoch 6, batch 1600, loss[loss=0.336, ctc_loss=0.2465, cr_loss=0.4474, over 21017.00 frames. ], tot_loss[loss=0.303, ctc_loss=0.2201, cr_loss=0.4149, over 4087040.25 frames. ], batch size: 63, lr: 1.41e-02, grad_scale: 32.0 2024-09-14 07:36:35,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=95069.66666666667, ans=0.125 2024-09-14 07:37:17,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.78 vs. 
limit=12.0 2024-09-14 07:37:17,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=95154.66666666667, ans=0.2 2024-09-14 07:37:29,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=95183.0, ans=0.125 2024-09-14 07:37:34,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-14 07:37:41,693 INFO [train.py:1198] (0/2) Epoch 6, batch 1650, loss[loss=0.3356, ctc_loss=0.2421, cr_loss=0.4675, over 20647.00 frames. ], tot_loss[loss=0.3032, ctc_loss=0.2201, cr_loss=0.4155, over 4089433.90 frames. ], batch size: 68, lr: 1.41e-02, grad_scale: 32.0 2024-09-14 07:37:57,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=95239.66666666667, ans=0.125 2024-09-14 07:38:26,981 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.254e+02 2.468e+02 2.996e+02 5.862e+02, threshold=4.937e+02, percent-clipped=1.0 2024-09-14 07:38:57,343 INFO [train.py:1198] (0/2) Epoch 6, batch 1700, loss[loss=0.3327, ctc_loss=0.2431, cr_loss=0.4478, over 20661.00 frames. ], tot_loss[loss=0.3021, ctc_loss=0.2192, cr_loss=0.4146, over 4096355.20 frames. 
], batch size: 66, lr: 1.41e-02, grad_scale: 16.0 2024-09-14 07:39:05,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=95353.0, ans=0.025 2024-09-14 07:39:20,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=95381.33333333333, ans=0.035 2024-09-14 07:39:29,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=95409.66666666667, ans=0.125 2024-09-14 07:40:05,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95466.33333333333, ans=0.1 2024-09-14 07:40:09,351 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0 2024-09-14 07:40:13,008 INFO [train.py:1198] (0/2) Epoch 6, batch 1750, loss[loss=0.312, ctc_loss=0.2277, cr_loss=0.4214, over 20977.00 frames. ], tot_loss[loss=0.3024, ctc_loss=0.2196, cr_loss=0.4142, over 4090928.01 frames. 
], batch size: 64, lr: 1.41e-02, grad_scale: 16.0 2024-09-14 07:40:14,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=95494.66666666667, ans=0.09899494936611666 2024-09-14 07:40:25,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=95494.66666666667, ans=0.125 2024-09-14 07:40:40,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=95523.0, ans=0.2 2024-09-14 07:40:49,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=95551.33333333333, ans=0.0 2024-09-14 07:40:52,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95551.33333333333, ans=0.1 2024-09-14 07:40:59,645 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.226e+02 2.397e+02 2.692e+02 3.705e+02, threshold=4.795e+02, percent-clipped=0.0 2024-09-14 07:41:06,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=95579.66666666667, ans=0.0 2024-09-14 07:41:13,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=95608.0, ans=0.0 2024-09-14 07:41:28,515 INFO [train.py:1198] (0/2) Epoch 6, batch 1800, loss[loss=0.2921, ctc_loss=0.2134, cr_loss=0.3934, over 21052.00 frames. ], tot_loss[loss=0.3028, ctc_loss=0.2198, cr_loss=0.4149, over 4097214.33 frames. ], batch size: 53, lr: 1.41e-02, grad_scale: 16.0 2024-09-14 07:41:44,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. 
limit=6.0 2024-09-14 07:42:03,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.34 vs. limit=10.0 2024-09-14 07:42:39,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=95749.66666666667, ans=0.125 2024-09-14 07:42:47,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=95749.66666666667, ans=0.2 2024-09-14 07:42:49,616 INFO [train.py:1198] (0/2) Epoch 6, batch 1850, loss[loss=0.3045, ctc_loss=0.2199, cr_loss=0.423, over 21053.00 frames. ], tot_loss[loss=0.3017, ctc_loss=0.219, cr_loss=0.4139, over 4104071.02 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 16.0 2024-09-14 07:43:18,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.73 vs. limit=10.0 2024-09-14 07:43:24,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=95834.66666666667, ans=0.025 2024-09-14 07:43:36,121 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.132e+02 2.311e+02 2.496e+02 4.751e+02, threshold=4.622e+02, percent-clipped=0.0 2024-09-14 07:43:58,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.41 vs. limit=5.0 2024-09-14 07:44:00,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=95891.33333333333, ans=0.125 2024-09-14 07:44:04,441 INFO [train.py:1198] (0/2) Epoch 6, batch 1900, loss[loss=0.2868, ctc_loss=0.2112, cr_loss=0.3779, over 19974.00 frames. ], tot_loss[loss=0.3017, ctc_loss=0.219, cr_loss=0.4134, over 4090987.51 frames. 
], batch size: 44, lr: 1.41e-02, grad_scale: 16.0 2024-09-14 07:44:13,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=95919.66666666667, ans=0.125 2024-09-14 07:44:20,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=95948.0, ans=0.125 2024-09-14 07:44:49,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=96004.66666666667, ans=0.0 2024-09-14 07:45:02,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=96004.66666666667, ans=0.125 2024-09-14 07:45:20,107 INFO [train.py:1198] (0/2) Epoch 6, batch 1950, loss[loss=0.2938, ctc_loss=0.2142, cr_loss=0.3982, over 20872.00 frames. ], tot_loss[loss=0.3017, ctc_loss=0.2188, cr_loss=0.4143, over 4094850.72 frames. ], batch size: 54, lr: 1.41e-02, grad_scale: 16.0 2024-09-14 07:45:20,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=96061.33333333333, ans=0.0 2024-09-14 07:45:24,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=96061.33333333333, ans=0.025 2024-09-14 07:46:06,844 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.169e+02 2.396e+02 2.758e+02 4.232e+02, threshold=4.791e+02, percent-clipped=0.0 2024-09-14 07:46:34,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=96203.0, ans=0.025 2024-09-14 07:46:35,457 INFO [train.py:1198] (0/2) Epoch 6, batch 2000, loss[loss=0.3203, ctc_loss=0.2322, cr_loss=0.4409, over 20706.00 frames. ], tot_loss[loss=0.301, ctc_loss=0.2183, cr_loss=0.4134, over 4098606.01 frames. 
], batch size: 68, lr: 1.41e-02, grad_scale: 32.0 2024-09-14 07:46:38,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=96203.0, ans=0.125 2024-09-14 07:46:45,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=96203.0, ans=0.0 2024-09-14 07:46:48,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=96203.0, ans=0.07 2024-09-14 07:46:49,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=96231.33333333333, ans=0.125 2024-09-14 07:46:51,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96231.33333333333, ans=0.1 2024-09-14 07:47:16,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=96259.66666666667, ans=0.125 2024-09-14 07:47:28,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=96288.0, ans=0.125 2024-09-14 07:47:53,804 INFO [train.py:1198] (0/2) Epoch 6, batch 2050, loss[loss=0.3027, ctc_loss=0.2223, cr_loss=0.4017, over 20666.00 frames. ], tot_loss[loss=0.3009, ctc_loss=0.2181, cr_loss=0.4138, over 4114113.95 frames. 
], batch size: 66, lr: 1.41e-02, grad_scale: 32.0 2024-09-14 07:48:12,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=96373.0, ans=0.0 2024-09-14 07:48:15,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=96373.0, ans=0.035 2024-09-14 07:48:27,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=96401.33333333333, ans=0.025 2024-09-14 07:48:43,607 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.134e+02 2.316e+02 2.679e+02 4.546e+02, threshold=4.632e+02, percent-clipped=0.0 2024-09-14 07:48:51,323 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 07:49:11,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=96486.33333333333, ans=0.0 2024-09-14 07:49:12,174 INFO [train.py:1198] (0/2) Epoch 6, batch 2100, loss[loss=0.2789, ctc_loss=0.2025, cr_loss=0.3823, over 20349.00 frames. ], tot_loss[loss=0.299, ctc_loss=0.2164, cr_loss=0.4127, over 4115562.40 frames. ], batch size: 74, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:49:41,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=96543.0, ans=0.0 2024-09-14 07:50:27,262 INFO [train.py:1198] (0/2) Epoch 6, batch 2150, loss[loss=0.2652, ctc_loss=0.1909, cr_loss=0.3718, over 20891.00 frames. ], tot_loss[loss=0.2982, ctc_loss=0.2158, cr_loss=0.4118, over 4116496.21 frames. 
], batch size: 54, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:50:33,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=96628.0, ans=0.2 2024-09-14 07:50:35,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=96628.0, ans=0.0 2024-09-14 07:50:51,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.05 vs. limit=15.0 2024-09-14 07:51:14,516 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.129e+02 2.323e+02 2.669e+02 3.552e+02, threshold=4.647e+02, percent-clipped=0.0 2024-09-14 07:51:19,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=96713.0, ans=0.09899494936611666 2024-09-14 07:51:27,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=96741.33333333333, ans=0.125 2024-09-14 07:51:39,583 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0 2024-09-14 07:51:39,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.56 vs. limit=10.0 2024-09-14 07:51:43,462 INFO [train.py:1198] (0/2) Epoch 6, batch 2200, loss[loss=0.3242, ctc_loss=0.2367, cr_loss=0.4374, over 20959.00 frames. ], tot_loss[loss=0.2988, ctc_loss=0.2164, cr_loss=0.4118, over 4113829.66 frames. 
], batch size: 64, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:51:49,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=96769.66666666667, ans=0.0 2024-09-14 07:52:11,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.54 vs. limit=15.0 2024-09-14 07:52:30,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=96854.66666666667, ans=0.0 2024-09-14 07:52:39,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.22 vs. limit=15.0 2024-09-14 07:52:45,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=96883.0, ans=0.2 2024-09-14 07:52:58,956 INFO [train.py:1198] (0/2) Epoch 6, batch 2250, loss[loss=0.3472, ctc_loss=0.2675, cr_loss=0.3982, over 14056.00 frames. ], tot_loss[loss=0.2985, ctc_loss=0.2162, cr_loss=0.4113, over 4105172.22 frames. 
], batch size: 150, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:53:14,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=96939.66666666667, ans=0.2 2024-09-14 07:53:22,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=96939.66666666667, ans=0.2 2024-09-14 07:53:22,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=96939.66666666667, ans=0.2 2024-09-14 07:53:43,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=96968.0, ans=0.0 2024-09-14 07:53:49,043 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.122e+02 2.479e+02 2.958e+02 5.399e+02, threshold=4.959e+02, percent-clipped=1.0 2024-09-14 07:53:49,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.86 vs. limit=15.0 2024-09-14 07:54:16,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=97024.66666666667, ans=0.125 2024-09-14 07:54:19,935 INFO [train.py:1198] (0/2) Epoch 6, batch 2300, loss[loss=0.2493, ctc_loss=0.1779, cr_loss=0.3572, over 21079.00 frames. ], tot_loss[loss=0.2989, ctc_loss=0.2166, cr_loss=0.4117, over 4104301.20 frames. 
], batch size: 53, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:54:23,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=97053.0, ans=0.125 2024-09-14 07:54:30,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=97053.0, ans=0.125 2024-09-14 07:54:50,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=97109.66666666667, ans=0.025 2024-09-14 07:54:52,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-14 07:55:07,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2024-09-14 07:55:09,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=97138.0, ans=0.125 2024-09-14 07:55:35,207 INFO [train.py:1198] (0/2) Epoch 6, batch 2350, loss[loss=0.2819, ctc_loss=0.2065, cr_loss=0.3769, over 21027.00 frames. ], tot_loss[loss=0.2993, ctc_loss=0.2169, cr_loss=0.4118, over 4109394.12 frames. ], batch size: 62, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:56:21,674 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.220e+02 2.402e+02 2.900e+02 4.265e+02, threshold=4.805e+02, percent-clipped=0.0 2024-09-14 07:56:37,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=97308.0, ans=0.0 2024-09-14 07:56:50,333 INFO [train.py:1198] (0/2) Epoch 6, batch 2400, loss[loss=0.2838, ctc_loss=0.2032, cr_loss=0.4031, over 20964.00 frames. ], tot_loss[loss=0.2988, ctc_loss=0.2165, cr_loss=0.4113, over 4102237.60 frames. 
], batch size: 55, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:57:10,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=97364.66666666667, ans=0.125 2024-09-14 07:57:16,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=97364.66666666667, ans=0.1 2024-09-14 07:57:26,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=97393.0, ans=0.1 2024-09-14 07:57:46,385 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2024-09-14 07:58:04,855 INFO [train.py:1198] (0/2) Epoch 6, batch 2450, loss[loss=0.3142, ctc_loss=0.2318, cr_loss=0.4117, over 19516.00 frames. ], tot_loss[loss=0.2996, ctc_loss=0.2173, cr_loss=0.4118, over 4101861.60 frames. ], batch size: 90, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:58:21,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=97506.33333333333, ans=0.125 2024-09-14 07:58:27,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=97506.33333333333, ans=0.0 2024-09-14 07:58:38,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=15.0 2024-09-14 07:58:51,438 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.158e+02 2.367e+02 2.694e+02 5.623e+02, threshold=4.734e+02, percent-clipped=1.0 2024-09-14 07:59:22,806 INFO [train.py:1198] (0/2) Epoch 6, batch 2500, loss[loss=0.305, ctc_loss=0.2203, cr_loss=0.4234, over 20729.00 frames. ], tot_loss[loss=0.3006, ctc_loss=0.2181, cr_loss=0.4126, over 4100368.96 frames. 
], batch size: 71, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 07:59:24,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=97619.66666666667, ans=0.0 2024-09-14 07:59:24,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=97619.66666666667, ans=0.0 2024-09-14 07:59:26,623 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.50 vs. limit=15.0 2024-09-14 07:59:39,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=97648.0, ans=0.125 2024-09-14 08:00:02,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=97676.33333333333, ans=0.125 2024-09-14 08:00:32,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=97733.0, ans=0.125 2024-09-14 08:00:40,795 INFO [train.py:1198] (0/2) Epoch 6, batch 2550, loss[loss=0.254, ctc_loss=0.1797, cr_loss=0.3715, over 20958.00 frames. ], tot_loss[loss=0.2995, ctc_loss=0.2172, cr_loss=0.4115, over 4102898.58 frames. 
], batch size: 48, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 08:00:54,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=97789.66666666667, ans=0.125 2024-09-14 08:00:57,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=97789.66666666667, ans=0.1 2024-09-14 08:01:00,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=97789.66666666667, ans=0.015 2024-09-14 08:01:06,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=97789.66666666667, ans=0.125 2024-09-14 08:01:13,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-09-14 08:01:28,062 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.179e+02 2.442e+02 2.743e+02 4.136e+02, threshold=4.884e+02, percent-clipped=0.0 2024-09-14 08:01:30,415 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=22.5 2024-09-14 08:01:56,860 INFO [train.py:1198] (0/2) Epoch 6, batch 2600, loss[loss=0.2519, ctc_loss=0.1797, cr_loss=0.3611, over 20978.00 frames. ], tot_loss[loss=0.3007, ctc_loss=0.2181, cr_loss=0.4131, over 4106411.34 frames. 
], batch size: 51, lr: 1.40e-02, grad_scale: 32.0 2024-09-14 08:02:01,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=97903.0, ans=0.125 2024-09-14 08:02:04,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=97903.0, ans=0.125 2024-09-14 08:03:09,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=98016.33333333333, ans=0.125 2024-09-14 08:03:11,944 INFO [train.py:1198] (0/2) Epoch 6, batch 2650, loss[loss=0.2824, ctc_loss=0.2058, cr_loss=0.3829, over 21064.00 frames. ], tot_loss[loss=0.3013, ctc_loss=0.2185, cr_loss=0.4139, over 4098233.92 frames. ], batch size: 59, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:03:54,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=98101.33333333333, ans=0.125 2024-09-14 08:03:59,020 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.239e+02 2.576e+02 3.103e+02 5.234e+02, threshold=5.152e+02, percent-clipped=2.0 2024-09-14 08:04:27,456 INFO [train.py:1198] (0/2) Epoch 6, batch 2700, loss[loss=0.2368, ctc_loss=0.1662, cr_loss=0.3529, over 20891.00 frames. ], tot_loss[loss=0.3004, ctc_loss=0.2178, cr_loss=0.4131, over 4100916.23 frames. ], batch size: 54, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:04:55,348 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. 
limit=6.0 2024-09-14 08:04:57,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=98214.66666666667, ans=0.125 2024-09-14 08:05:18,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=98271.33333333333, ans=0.2 2024-09-14 08:05:19,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=22.5 2024-09-14 08:05:34,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=98299.66666666667, ans=0.125 2024-09-14 08:05:48,139 INFO [train.py:1198] (0/2) Epoch 6, batch 2750, loss[loss=0.3111, ctc_loss=0.2261, cr_loss=0.4254, over 20845.00 frames. ], tot_loss[loss=0.3007, ctc_loss=0.218, cr_loss=0.4134, over 4087747.46 frames. ], batch size: 65, lr: 1.39e-02, grad_scale: 16.0 2024-09-14 08:06:06,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=98356.33333333333, ans=0.05 2024-09-14 08:06:26,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=98384.66666666667, ans=0.1 2024-09-14 08:06:35,385 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.141e+02 2.401e+02 2.853e+02 4.229e+02, threshold=4.801e+02, percent-clipped=0.0 2024-09-14 08:06:41,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=98413.0, ans=0.1 2024-09-14 08:07:02,677 INFO [train.py:1198] (0/2) Epoch 6, batch 2800, loss[loss=0.2931, ctc_loss=0.2148, cr_loss=0.3912, over 21083.00 frames. ], tot_loss[loss=0.3, ctc_loss=0.2175, cr_loss=0.4127, over 4087744.43 frames. 
], batch size: 59, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:08:18,254 INFO [train.py:1198] (0/2) Epoch 6, batch 2850, loss[loss=0.2865, ctc_loss=0.2076, cr_loss=0.3948, over 21057.00 frames. ], tot_loss[loss=0.2978, ctc_loss=0.2157, cr_loss=0.4108, over 4096726.21 frames. ], batch size: 62, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:08:22,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=98611.33333333333, ans=0.2 2024-09-14 08:08:24,954 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.09 vs. limit=15.0 2024-09-14 08:08:29,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.49 vs. limit=15.0 2024-09-14 08:09:06,284 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.291e+02 2.592e+02 3.023e+02 9.557e+02, threshold=5.183e+02, percent-clipped=2.0 2024-09-14 08:09:20,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=98724.66666666667, ans=0.125 2024-09-14 08:09:31,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2024-09-14 08:09:33,591 INFO [train.py:1198] (0/2) Epoch 6, batch 2900, loss[loss=0.3267, ctc_loss=0.2369, cr_loss=0.4488, over 19489.00 frames. ], tot_loss[loss=0.2984, ctc_loss=0.216, cr_loss=0.4122, over 4104839.35 frames. ], batch size: 90, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:10:17,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.87 vs. 
limit=22.5 2024-09-14 08:10:35,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=98866.33333333333, ans=0.125 2024-09-14 08:10:40,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=98866.33333333333, ans=0.125 2024-09-14 08:10:51,774 INFO [train.py:1198] (0/2) Epoch 6, batch 2950, loss[loss=0.3141, ctc_loss=0.2262, cr_loss=0.4396, over 20965.00 frames. ], tot_loss[loss=0.2973, ctc_loss=0.215, cr_loss=0.4114, over 4104859.90 frames. ], batch size: 58, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:11:38,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=98979.66666666667, ans=0.025 2024-09-14 08:11:42,701 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.133e+02 2.361e+02 2.750e+02 4.226e+02, threshold=4.722e+02, percent-clipped=0.0 2024-09-14 08:11:42,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=98979.66666666667, ans=0.1 2024-09-14 08:11:54,208 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=12.0 2024-09-14 08:11:56,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=99008.0, ans=0.0 2024-09-14 08:12:10,168 INFO [train.py:1198] (0/2) Epoch 6, batch 3000, loss[loss=0.2781, ctc_loss=0.1967, cr_loss=0.4069, over 19088.00 frames. ], tot_loss[loss=0.2973, ctc_loss=0.2151, cr_loss=0.4108, over 4105148.46 frames. 
], batch size: 42, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:12:10,169 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 08:12:27,164 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7817, 4.3645, 4.1347, 4.6999], device='cuda:0') 2024-09-14 08:12:30,999 INFO [train.py:1230] (0/2) Epoch 6, validation: loss=0.06701, ctc_loss=0.06701, cr_loss=9.557e-15, over 944034.00 frames. 2024-09-14 08:12:31,000 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 08:13:04,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=99093.0, ans=0.0 2024-09-14 08:13:47,805 INFO [train.py:1198] (0/2) Epoch 6, batch 3050, loss[loss=0.3813, ctc_loss=0.2877, cr_loss=0.4679, over 14550.00 frames. ], tot_loss[loss=0.2979, ctc_loss=0.2157, cr_loss=0.4109, over 4087540.77 frames. ], batch size: 149, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:14:03,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99206.33333333333, ans=0.1 2024-09-14 08:14:09,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=99206.33333333333, ans=0.125 2024-09-14 08:14:15,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=99206.33333333333, ans=0.2 2024-09-14 08:14:33,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=99263.0, ans=0.0 2024-09-14 08:14:36,177 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.214e+02 2.500e+02 2.844e+02 3.726e+02, threshold=4.999e+02, percent-clipped=0.0 2024-09-14 08:15:03,413 INFO [train.py:1198] (0/2) Epoch 6, batch 3100, loss[loss=0.2964, ctc_loss=0.2157, cr_loss=0.4034, over 
21022.00 frames. ], tot_loss[loss=0.299, ctc_loss=0.2165, cr_loss=0.4124, over 4078224.49 frames. ], batch size: 62, lr: 1.39e-02, grad_scale: 32.0 2024-09-14 08:15:34,671 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.18 vs. limit=22.5 2024-09-14 08:15:41,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99376.33333333333, ans=0.1 2024-09-14 08:15:51,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=99404.66666666667, ans=0.2 2024-09-14 08:16:21,505 INFO [train.py:1198] (0/2) Epoch 6, batch 3150, loss[loss=0.3537, ctc_loss=0.2763, cr_loss=0.3869, over 14669.00 frames. ], tot_loss[loss=0.2998, ctc_loss=0.2172, cr_loss=0.4132, over 4077494.88 frames. ], batch size: 149, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:16:52,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=99489.66666666667, ans=0.125 2024-09-14 08:17:12,896 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.248e+02 2.527e+02 3.281e+02 4.804e+02, threshold=5.055e+02, percent-clipped=0.0 2024-09-14 08:17:13,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=99546.33333333333, ans=0.0 2024-09-14 08:17:39,920 INFO [train.py:1198] (0/2) Epoch 6, batch 3200, loss[loss=0.2852, ctc_loss=0.2038, cr_loss=0.4069, over 20985.00 frames. ], tot_loss[loss=0.2996, ctc_loss=0.217, cr_loss=0.4129, over 4075378.02 frames. 
], batch size: 52, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:17:59,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=99631.33333333333, ans=0.0 2024-09-14 08:18:04,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=99631.33333333333, ans=0.125 2024-09-14 08:18:23,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=99688.0, ans=0.09899494936611666 2024-09-14 08:18:30,366 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.70 vs. limit=22.5 2024-09-14 08:18:48,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2024-09-14 08:18:51,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=99716.33333333333, ans=0.5 2024-09-14 08:18:55,450 INFO [train.py:1198] (0/2) Epoch 6, batch 3250, loss[loss=0.4031, ctc_loss=0.3121, cr_loss=0.4553, over 14286.00 frames. ], tot_loss[loss=0.3007, ctc_loss=0.218, cr_loss=0.4135, over 4067941.75 frames. ], batch size: 149, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:18:55,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=99744.66666666667, ans=0.025 2024-09-14 08:18:55,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=99744.66666666667, ans=0.0 2024-09-14 08:19:12,959 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.06 vs. 
limit=6.0 2024-09-14 08:19:19,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=99773.0, ans=0.125 2024-09-14 08:19:26,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=99801.33333333333, ans=0.025 2024-09-14 08:19:43,886 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.121e+02 2.272e+02 2.581e+02 3.879e+02, threshold=4.544e+02, percent-clipped=0.0 2024-09-14 08:19:47,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=99829.66666666667, ans=0.125 2024-09-14 08:20:10,854 INFO [train.py:1198] (0/2) Epoch 6, batch 3300, loss[loss=0.2866, ctc_loss=0.2129, cr_loss=0.3686, over 20257.00 frames. ], tot_loss[loss=0.2996, ctc_loss=0.2171, cr_loss=0.4124, over 4075030.20 frames. ], batch size: 74, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:20:56,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=99971.33333333333, ans=0.125 2024-09-14 08:21:26,436 INFO [train.py:1198] (0/2) Epoch 6, batch 3350, loss[loss=0.3182, ctc_loss=0.2337, cr_loss=0.4224, over 19574.00 frames. ], tot_loss[loss=0.2992, ctc_loss=0.2169, cr_loss=0.4117, over 4076822.48 frames. ], batch size: 90, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:21:28,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=100028.0, ans=0.125 2024-09-14 08:21:48,311 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.60 vs. 
limit=15.0 2024-09-14 08:22:17,430 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.300e+02 2.722e+02 3.531e+02 5.777e+02, threshold=5.444e+02, percent-clipped=5.0 2024-09-14 08:22:31,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=100141.33333333333, ans=0.125 2024-09-14 08:22:39,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.93 vs. limit=10.0 2024-09-14 08:22:47,516 INFO [train.py:1198] (0/2) Epoch 6, batch 3400, loss[loss=0.3219, ctc_loss=0.2343, cr_loss=0.4378, over 20662.00 frames. ], tot_loss[loss=0.2994, ctc_loss=0.217, cr_loss=0.4122, over 4083742.01 frames. ], batch size: 68, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:23:32,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=100254.66666666667, ans=0.2 2024-09-14 08:24:03,101 INFO [train.py:1198] (0/2) Epoch 6, batch 3450, loss[loss=0.3056, ctc_loss=0.2223, cr_loss=0.4162, over 20878.00 frames. ], tot_loss[loss=0.2992, ctc_loss=0.2168, cr_loss=0.412, over 4094896.26 frames. ], batch size: 54, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:24:03,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.92 vs. 
limit=22.5 2024-09-14 08:24:13,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=100311.33333333333, ans=0.0 2024-09-14 08:24:28,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=100339.66666666667, ans=0.125 2024-09-14 08:24:36,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=100368.0, ans=0.1 2024-09-14 08:24:42,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=100368.0, ans=0.0 2024-09-14 08:24:51,170 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.270e+02 2.590e+02 2.910e+02 4.055e+02, threshold=5.180e+02, percent-clipped=0.0 2024-09-14 08:25:18,687 INFO [train.py:1198] (0/2) Epoch 6, batch 3500, loss[loss=0.2956, ctc_loss=0.2135, cr_loss=0.4106, over 20832.00 frames. ], tot_loss[loss=0.2984, ctc_loss=0.2161, cr_loss=0.4112, over 4101417.16 frames. ], batch size: 59, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:25:44,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=100481.33333333333, ans=0.2 2024-09-14 08:26:30,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=100566.33333333333, ans=0.1 2024-09-14 08:26:34,718 INFO [train.py:1198] (0/2) Epoch 6, batch 3550, loss[loss=0.3152, ctc_loss=0.2262, cr_loss=0.445, over 20994.00 frames. ], tot_loss[loss=0.2974, ctc_loss=0.2153, cr_loss=0.4106, over 4098780.93 frames. 
], batch size: 61, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:26:40,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=100594.66666666667, ans=0.125 2024-09-14 08:26:51,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=100623.0, ans=10.0 2024-09-14 08:27:12,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=100651.33333333333, ans=0.0 2024-09-14 08:27:19,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=100679.66666666667, ans=0.125 2024-09-14 08:27:25,414 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.086e+02 2.280e+02 2.546e+02 4.280e+02, threshold=4.560e+02, percent-clipped=0.0 2024-09-14 08:27:43,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=100708.0, ans=0.125 2024-09-14 08:27:52,581 INFO [train.py:1198] (0/2) Epoch 6, batch 3600, loss[loss=0.2494, ctc_loss=0.1744, cr_loss=0.3752, over 20948.00 frames. ], tot_loss[loss=0.2975, ctc_loss=0.2152, cr_loss=0.4114, over 4105738.55 frames. ], batch size: 49, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:27:57,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=100736.33333333333, ans=0.0 2024-09-14 08:28:31,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=100793.0, ans=0.125 2024-09-14 08:28:43,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. 
limit=6.0 2024-09-14 08:29:03,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=100849.66666666667, ans=0.125 2024-09-14 08:29:10,933 INFO [train.py:1198] (0/2) Epoch 6, batch 3650, loss[loss=0.2506, ctc_loss=0.1786, cr_loss=0.3601, over 20964.00 frames. ], tot_loss[loss=0.2969, ctc_loss=0.2148, cr_loss=0.4105, over 4107065.12 frames. ], batch size: 49, lr: 1.38e-02, grad_scale: 32.0 2024-09-14 08:29:34,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=100906.33333333333, ans=15.0 2024-09-14 08:29:59,352 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.126e+02 2.389e+02 2.658e+02 5.184e+02, threshold=4.778e+02, percent-clipped=1.0 2024-09-14 08:30:13,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=100991.33333333333, ans=0.125 2024-09-14 08:30:17,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=100991.33333333333, ans=0.125 2024-09-14 08:30:26,131 INFO [train.py:1198] (0/2) Epoch 6, batch 3700, loss[loss=0.3959, ctc_loss=0.3049, cr_loss=0.4548, over 14146.00 frames. ], tot_loss[loss=0.2963, ctc_loss=0.2143, cr_loss=0.4102, over 4106509.32 frames. 
], batch size: 149, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:30:26,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=101019.66666666667, ans=0.09899494936611666 2024-09-14 08:30:28,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=101019.66666666667, ans=0.07 2024-09-14 08:30:31,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=101019.66666666667, ans=0.125 2024-09-14 08:30:32,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=101019.66666666667, ans=0.04949747468305833 2024-09-14 08:30:37,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=101019.66666666667, ans=0.1 2024-09-14 08:30:41,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=101048.0, ans=0.125 2024-09-14 08:30:59,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=101076.33333333333, ans=0.125 2024-09-14 08:31:07,825 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2024-09-14 08:31:33,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=101133.0, ans=0.2 2024-09-14 08:31:36,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101133.0, ans=0.1 2024-09-14 08:31:40,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.49 vs. 
limit=22.5 2024-09-14 08:31:41,904 INFO [train.py:1198] (0/2) Epoch 6, batch 3750, loss[loss=0.33, ctc_loss=0.2461, cr_loss=0.4193, over 19453.00 frames. ], tot_loss[loss=0.2962, ctc_loss=0.2142, cr_loss=0.4095, over 4107136.28 frames. ], batch size: 90, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:31:49,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=101161.33333333333, ans=0.09899494936611666 2024-09-14 08:32:30,644 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.249e+02 2.659e+02 3.159e+02 4.836e+02, threshold=5.318e+02, percent-clipped=1.0 2024-09-14 08:32:49,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=101274.66666666667, ans=0.125 2024-09-14 08:32:49,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=101274.66666666667, ans=0.125 2024-09-14 08:32:57,782 INFO [train.py:1198] (0/2) Epoch 6, batch 3800, loss[loss=0.3374, ctc_loss=0.2486, cr_loss=0.4445, over 21019.00 frames. ], tot_loss[loss=0.2984, ctc_loss=0.2161, cr_loss=0.4116, over 4089843.11 frames. ], batch size: 62, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:33:17,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=101331.33333333333, ans=0.125 2024-09-14 08:33:34,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. 
limit=15.0 2024-09-14 08:33:53,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=101388.0, ans=0.2 2024-09-14 08:33:56,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=101388.0, ans=0.0 2024-09-14 08:34:14,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=101416.33333333333, ans=0.025 2024-09-14 08:34:18,822 INFO [train.py:1198] (0/2) Epoch 6, batch 3850, loss[loss=0.2561, ctc_loss=0.1847, cr_loss=0.3571, over 21066.00 frames. ], tot_loss[loss=0.2994, ctc_loss=0.2168, cr_loss=0.4127, over 4079960.59 frames. ], batch size: 53, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:34:26,763 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 08:35:07,116 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.191e+02 2.419e+02 2.889e+02 4.083e+02, threshold=4.839e+02, percent-clipped=0.0 2024-09-14 08:35:07,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=101529.66666666667, ans=0.2 2024-09-14 08:35:12,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=101529.66666666667, ans=0.125 2024-09-14 08:35:34,200 INFO [train.py:1198] (0/2) Epoch 6, batch 3900, loss[loss=0.272, ctc_loss=0.1948, cr_loss=0.3861, over 20769.00 frames. ], tot_loss[loss=0.2997, ctc_loss=0.2171, cr_loss=0.413, over 4077485.90 frames. 
], batch size: 56, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:35:39,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=101586.33333333333, ans=0.125 2024-09-14 08:35:39,467 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2024-09-14 08:35:49,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=101614.66666666667, ans=0.1 2024-09-14 08:35:53,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=101614.66666666667, ans=0.1 2024-09-14 08:36:01,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=22.5 2024-09-14 08:36:05,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=101643.0, ans=0.125 2024-09-14 08:36:10,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=101643.0, ans=0.0 2024-09-14 08:36:18,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=101671.33333333333, ans=0.0 2024-09-14 08:36:29,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=101671.33333333333, ans=0.0 2024-09-14 08:36:48,636 INFO [train.py:1198] (0/2) Epoch 6, batch 3950, loss[loss=0.3187, ctc_loss=0.2344, cr_loss=0.4217, over 20713.00 frames. ], tot_loss[loss=0.2985, ctc_loss=0.2159, cr_loss=0.4126, over 4090130.38 frames. 
], batch size: 68, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:36:51,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=101728.0, ans=0.5 2024-09-14 08:36:53,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=101728.0, ans=0.2 2024-09-14 08:36:58,060 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 08:37:20,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=101784.66666666667, ans=0.125 2024-09-14 08:37:29,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=101784.66666666667, ans=0.0 2024-09-14 08:37:37,130 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 2.217e+02 2.436e+02 2.828e+02 5.297e+02, threshold=4.873e+02, percent-clipped=1.0 2024-09-14 08:37:38,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=101813.0, ans=0.2 2024-09-14 08:37:38,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=101813.0, ans=0.125 2024-09-14 08:37:41,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=101813.0, ans=0.5 2024-09-14 08:37:46,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=101813.0, ans=0.0 2024-09-14 08:38:04,429 INFO [train.py:1198] (0/2) Epoch 6, batch 4000, loss[loss=0.2929, ctc_loss=0.2128, cr_loss=0.4008, over 20138.00 frames. ], tot_loss[loss=0.2995, ctc_loss=0.2169, cr_loss=0.4135, over 4087873.41 frames. 
], batch size: 80, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:38:32,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0 2024-09-14 08:39:12,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=101983.0, ans=0.125 2024-09-14 08:39:15,750 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-36000.pt 2024-09-14 08:39:24,167 INFO [train.py:1198] (0/2) Epoch 6, batch 4050, loss[loss=0.3628, ctc_loss=0.2705, cr_loss=0.4612, over 18206.00 frames. ], tot_loss[loss=0.3011, ctc_loss=0.2182, cr_loss=0.4146, over 4081955.72 frames. ], batch size: 108, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:39:58,351 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-09-14 08:39:59,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=102068.0, ans=0.0 2024-09-14 08:40:07,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102068.0, ans=0.1 2024-09-14 08:40:16,001 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.240e+02 2.572e+02 3.256e+02 5.234e+02, threshold=5.143e+02, percent-clipped=2.0 2024-09-14 08:40:43,483 INFO [train.py:1198] (0/2) Epoch 6, batch 4100, loss[loss=0.3437, ctc_loss=0.2507, cr_loss=0.4649, over 20895.00 frames. ], tot_loss[loss=0.299, ctc_loss=0.2163, cr_loss=0.4133, over 4097620.38 frames. 
], batch size: 57, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:41:00,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=102181.33333333333, ans=0.04949747468305833 2024-09-14 08:41:01,158 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.64 vs. limit=15.0 2024-09-14 08:41:18,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=102209.66666666667, ans=0.125 2024-09-14 08:41:33,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=102238.0, ans=0.125 2024-09-14 08:41:42,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=102266.33333333333, ans=0.2 2024-09-14 08:41:46,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=102266.33333333333, ans=0.125 2024-09-14 08:41:56,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=102266.33333333333, ans=0.025 2024-09-14 08:41:59,439 INFO [train.py:1198] (0/2) Epoch 6, batch 4150, loss[loss=0.3014, ctc_loss=0.2168, cr_loss=0.4232, over 21010.00 frames. ], tot_loss[loss=0.2994, ctc_loss=0.2167, cr_loss=0.4132, over 4089006.34 frames. 
], batch size: 63, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:42:08,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=102294.66666666667, ans=0.125 2024-09-14 08:42:20,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=102323.0, ans=0.0 2024-09-14 08:42:47,314 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.196e+02 2.505e+02 2.718e+02 6.723e+02, threshold=5.010e+02, percent-clipped=1.0 2024-09-14 08:43:14,195 INFO [train.py:1198] (0/2) Epoch 6, batch 4200, loss[loss=0.3602, ctc_loss=0.2631, cr_loss=0.4854, over 18392.00 frames. ], tot_loss[loss=0.2986, ctc_loss=0.2161, cr_loss=0.4123, over 4084965.95 frames. ], batch size: 108, lr: 1.37e-02, grad_scale: 32.0 2024-09-14 08:44:06,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=102521.33333333333, ans=0.125 2024-09-14 08:44:15,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=102549.66666666667, ans=0.125 2024-09-14 08:44:33,374 INFO [train.py:1198] (0/2) Epoch 6, batch 4250, loss[loss=0.3887, ctc_loss=0.2982, cr_loss=0.4523, over 14539.00 frames. ], tot_loss[loss=0.2992, ctc_loss=0.2166, cr_loss=0.4133, over 4077859.96 frames. 
], batch size: 149, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:45:03,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=102634.66666666667, ans=0.125 2024-09-14 08:45:12,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=102634.66666666667, ans=0.1 2024-09-14 08:45:23,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102663.0, ans=0.1 2024-09-14 08:45:24,569 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.179e+02 2.361e+02 2.812e+02 4.445e+02, threshold=4.722e+02, percent-clipped=0.0 2024-09-14 08:45:30,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=102663.0, ans=0.0 2024-09-14 08:45:51,932 INFO [train.py:1198] (0/2) Epoch 6, batch 4300, loss[loss=0.2867, ctc_loss=0.2125, cr_loss=0.3708, over 21038.00 frames. ], tot_loss[loss=0.2981, ctc_loss=0.2158, cr_loss=0.4118, over 4069405.76 frames. ], batch size: 62, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:46:12,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=102748.0, ans=0.125 2024-09-14 08:47:05,771 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=15.0 2024-09-14 08:47:06,401 INFO [train.py:1198] (0/2) Epoch 6, batch 4350, loss[loss=0.3074, ctc_loss=0.2236, cr_loss=0.4186, over 21001.00 frames. ], tot_loss[loss=0.2991, ctc_loss=0.2165, cr_loss=0.413, over 4066785.09 frames. 
], batch size: 61, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:47:09,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=102861.33333333333, ans=0.125 2024-09-14 08:47:10,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=102861.33333333333, ans=0.125 2024-09-14 08:47:19,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=102889.66666666667, ans=0.125 2024-09-14 08:47:24,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=102889.66666666667, ans=0.0 2024-09-14 08:47:27,991 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.27 vs. limit=10.0 2024-09-14 08:47:36,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=102918.0, ans=0.125 2024-09-14 08:47:54,417 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.283e+02 2.599e+02 3.001e+02 4.794e+02, threshold=5.197e+02, percent-clipped=1.0 2024-09-14 08:48:21,767 INFO [train.py:1198] (0/2) Epoch 6, batch 4400, loss[loss=0.2731, ctc_loss=0.1967, cr_loss=0.3818, over 20881.00 frames. ], tot_loss[loss=0.2979, ctc_loss=0.2154, cr_loss=0.4121, over 4084321.15 frames. 
], batch size: 54, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:48:32,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=103003.0, ans=0.0 2024-09-14 08:49:04,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=103059.66666666667, ans=0.125 2024-09-14 08:49:04,454 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.43 vs. limit=22.5 2024-09-14 08:49:15,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=103088.0, ans=0.125 2024-09-14 08:49:37,290 INFO [train.py:1198] (0/2) Epoch 6, batch 4450, loss[loss=0.2957, ctc_loss=0.2113, cr_loss=0.4224, over 20867.00 frames. ], tot_loss[loss=0.298, ctc_loss=0.2154, cr_loss=0.4129, over 4095404.47 frames. ], batch size: 57, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:49:42,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.18 vs. 
limit=15.0 2024-09-14 08:50:28,462 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.123e+02 2.377e+02 2.745e+02 5.864e+02, threshold=4.754e+02, percent-clipped=1.0 2024-09-14 08:50:30,345 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 08:50:30,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=103229.66666666667, ans=0.0 2024-09-14 08:50:50,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=103258.0, ans=0.05 2024-09-14 08:50:55,976 INFO [train.py:1198] (0/2) Epoch 6, batch 4500, loss[loss=0.2668, ctc_loss=0.1895, cr_loss=0.3863, over 20991.00 frames. ], tot_loss[loss=0.2972, ctc_loss=0.2148, cr_loss=0.4119, over 4095333.98 frames. ], batch size: 48, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:51:51,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=103371.33333333333, ans=0.125 2024-09-14 08:52:12,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=103428.0, ans=0.2 2024-09-14 08:52:13,917 INFO [train.py:1198] (0/2) Epoch 6, batch 4550, loss[loss=0.2931, ctc_loss=0.2105, cr_loss=0.4129, over 20973.00 frames. ], tot_loss[loss=0.2956, ctc_loss=0.2135, cr_loss=0.4104, over 4098441.96 frames. 
], batch size: 55, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:52:17,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=103428.0, ans=0.1 2024-09-14 08:52:29,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=103456.33333333333, ans=0.125 2024-09-14 08:52:48,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=103484.66666666667, ans=0.0 2024-09-14 08:53:02,004 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.247e+02 2.416e+02 2.805e+02 4.163e+02, threshold=4.832e+02, percent-clipped=0.0 2024-09-14 08:53:28,888 INFO [train.py:1198] (0/2) Epoch 6, batch 4600, loss[loss=0.2901, ctc_loss=0.2066, cr_loss=0.4175, over 21013.00 frames. ], tot_loss[loss=0.2976, ctc_loss=0.2151, cr_loss=0.4122, over 4093401.95 frames. ], batch size: 52, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:53:57,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=22.5 2024-09-14 08:54:00,064 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2024-09-14 08:54:25,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=103654.66666666667, ans=0.0 2024-09-14 08:54:45,023 INFO [train.py:1198] (0/2) Epoch 6, batch 4650, loss[loss=0.3081, ctc_loss=0.2217, cr_loss=0.4318, over 21037.00 frames. ], tot_loss[loss=0.2969, ctc_loss=0.2144, cr_loss=0.4125, over 4099397.13 frames. 
], batch size: 62, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:54:45,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=103711.33333333333, ans=0.2 2024-09-14 08:55:32,919 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.179e+02 2.337e+02 2.741e+02 4.133e+02, threshold=4.674e+02, percent-clipped=0.0 2024-09-14 08:55:48,346 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 08:56:02,832 INFO [train.py:1198] (0/2) Epoch 6, batch 4700, loss[loss=0.3466, ctc_loss=0.2588, cr_loss=0.4391, over 18206.00 frames. ], tot_loss[loss=0.2968, ctc_loss=0.2145, cr_loss=0.4118, over 4098887.02 frames. ], batch size: 108, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:56:39,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=103909.66666666667, ans=0.125 2024-09-14 08:56:40,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=103909.66666666667, ans=0.125 2024-09-14 08:57:15,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=103966.33333333333, ans=0.125 2024-09-14 08:57:21,065 INFO [train.py:1198] (0/2) Epoch 6, batch 4750, loss[loss=0.2667, ctc_loss=0.189, cr_loss=0.3885, over 19541.00 frames. ], tot_loss[loss=0.2973, ctc_loss=0.2148, cr_loss=0.4127, over 4095878.51 frames. 
], batch size: 43, lr: 1.36e-02, grad_scale: 32.0 2024-09-14 08:57:39,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=104023.0, ans=0.0 2024-09-14 08:57:42,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=104023.0, ans=0.1 2024-09-14 08:57:51,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=104051.33333333333, ans=0.2 2024-09-14 08:58:00,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=104051.33333333333, ans=0.125 2024-09-14 08:58:03,827 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2024-09-14 08:58:10,126 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.181e+02 2.355e+02 2.604e+02 3.883e+02, threshold=4.709e+02, percent-clipped=0.0 2024-09-14 08:58:12,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=104079.66666666667, ans=0.07 2024-09-14 08:58:34,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=104136.33333333333, ans=0.2 2024-09-14 08:58:35,972 INFO [train.py:1198] (0/2) Epoch 6, batch 4800, loss[loss=0.2672, ctc_loss=0.1941, cr_loss=0.3654, over 20276.00 frames. ], tot_loss[loss=0.2993, ctc_loss=0.2163, cr_loss=0.415, over 4089455.74 frames. 
], batch size: 45, lr: 1.35e-02, grad_scale: 32.0 2024-09-14 08:58:48,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=104136.33333333333, ans=0.125 2024-09-14 08:59:02,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.64 vs. limit=22.5 2024-09-14 08:59:16,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=104193.0, ans=0.1 2024-09-14 08:59:18,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=104193.0, ans=0.0 2024-09-14 08:59:20,769 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2024-09-14 08:59:29,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=104221.33333333333, ans=0.0 2024-09-14 08:59:49,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=104249.66666666667, ans=0.125 2024-09-14 08:59:51,764 INFO [train.py:1198] (0/2) Epoch 6, batch 4850, loss[loss=0.3136, ctc_loss=0.2265, cr_loss=0.4354, over 20013.00 frames. ], tot_loss[loss=0.2993, ctc_loss=0.2163, cr_loss=0.4151, over 4095520.40 frames. ], batch size: 80, lr: 1.35e-02, grad_scale: 32.0 2024-09-14 09:00:12,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.02 vs. 
limit=15.0 2024-09-14 09:00:13,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=104306.33333333333, ans=0.0 2024-09-14 09:00:19,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=104306.33333333333, ans=0.1 2024-09-14 09:00:41,189 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.121e+02 2.391e+02 2.822e+02 4.771e+02, threshold=4.781e+02, percent-clipped=1.0 2024-09-14 09:00:54,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=104391.33333333333, ans=0.125 2024-09-14 09:01:06,292 INFO [train.py:1198] (0/2) Epoch 6, batch 4900, loss[loss=0.2937, ctc_loss=0.2113, cr_loss=0.4119, over 21033.00 frames. ], tot_loss[loss=0.2997, ctc_loss=0.2167, cr_loss=0.4151, over 4085158.48 frames. ], batch size: 63, lr: 1.35e-02, grad_scale: 32.0 2024-09-14 09:01:26,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=104448.0, ans=0.2 2024-09-14 09:02:00,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=104504.66666666667, ans=0.0 2024-09-14 09:02:22,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=104561.33333333333, ans=15.0 2024-09-14 09:02:23,115 INFO [train.py:1198] (0/2) Epoch 6, batch 4950, loss[loss=0.2517, ctc_loss=0.1767, cr_loss=0.3749, over 21000.00 frames. ], tot_loss[loss=0.2974, ctc_loss=0.2149, cr_loss=0.4125, over 4085586.26 frames. 
], batch size: 52, lr: 1.35e-02, grad_scale: 32.0 2024-09-14 09:03:13,650 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.200e+02 2.395e+02 2.783e+02 3.877e+02, threshold=4.790e+02, percent-clipped=0.0 2024-09-14 09:03:37,964 INFO [train.py:1198] (0/2) Epoch 6, batch 5000, loss[loss=0.2658, ctc_loss=0.1881, cr_loss=0.3886, over 21072.00 frames. ], tot_loss[loss=0.297, ctc_loss=0.2145, cr_loss=0.4121, over 4094113.72 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 16.0 2024-09-14 09:04:07,846 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-09-14 09:04:09,496 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2024-09-14 09:04:14,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=104759.66666666667, ans=0.2 2024-09-14 09:04:19,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=104759.66666666667, ans=0.025 2024-09-14 09:04:54,857 INFO [train.py:1198] (0/2) Epoch 6, batch 5050, loss[loss=0.2971, ctc_loss=0.2124, cr_loss=0.4233, over 21025.00 frames. ], tot_loss[loss=0.2955, ctc_loss=0.2133, cr_loss=0.411, over 4094838.41 frames. 
], batch size: 61, lr: 1.35e-02, grad_scale: 16.0 2024-09-14 09:05:16,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=104873.0, ans=0.125 2024-09-14 09:05:39,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=104929.66666666667, ans=0.0 2024-09-14 09:05:45,492 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.077e+02 2.241e+02 2.493e+02 4.745e+02, threshold=4.482e+02, percent-clipped=0.0 2024-09-14 09:06:09,261 INFO [train.py:1198] (0/2) Epoch 6, batch 5100, loss[loss=0.2514, ctc_loss=0.1784, cr_loss=0.3651, over 20983.00 frames. ], tot_loss[loss=0.295, ctc_loss=0.213, cr_loss=0.4102, over 4099502.78 frames. ], batch size: 51, lr: 1.35e-02, grad_scale: 16.0 2024-09-14 09:07:00,153 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.14 vs. limit=10.0 2024-09-14 09:07:16,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=105099.66666666667, ans=0.125 2024-09-14 09:07:16,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=105099.66666666667, ans=0.125 2024-09-14 09:07:19,638 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0 2024-09-14 09:07:23,166 INFO [train.py:1198] (0/2) Epoch 6, batch 5150, loss[loss=0.3019, ctc_loss=0.2148, cr_loss=0.4352, over 20129.00 frames. ], tot_loss[loss=0.2963, ctc_loss=0.214, cr_loss=0.4115, over 4107171.13 frames. 
], batch size: 80, lr: 1.35e-02, grad_scale: 16.0 2024-09-14 09:07:39,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105156.33333333333, ans=0.1 2024-09-14 09:07:41,801 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=22.5 2024-09-14 09:07:44,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=105156.33333333333, ans=0.125 2024-09-14 09:07:56,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=105184.66666666667, ans=0.125 2024-09-14 09:08:14,007 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.184e+02 2.580e+02 3.007e+02 4.983e+02, threshold=5.160e+02, percent-clipped=2.0 2024-09-14 09:08:37,415 INFO [train.py:1198] (0/2) Epoch 6, batch 5200, loss[loss=0.2947, ctc_loss=0.2107, cr_loss=0.4202, over 20239.00 frames. ], tot_loss[loss=0.2956, ctc_loss=0.2135, cr_loss=0.4108, over 4110241.73 frames. ], batch size: 74, lr: 1.35e-02, grad_scale: 32.0 2024-09-14 09:08:48,496 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.17 vs. 
limit=15.0
2024-09-14 09:08:49,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=105269.66666666667, ans=0.09899494936611666
2024-09-14 09:08:53,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=105298.0, ans=0.0
2024-09-14 09:09:04,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=105298.0, ans=0.125
2024-09-14 09:09:10,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=105326.33333333333, ans=0.0
2024-09-14 09:09:29,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=105354.66666666667, ans=0.125
2024-09-14 09:09:51,784 INFO [train.py:1198] (0/2) Epoch 6, batch 5250, loss[loss=0.3141, ctc_loss=0.2258, cr_loss=0.4414, over 20657.00 frames. ], tot_loss[loss=0.2961, ctc_loss=0.2138, cr_loss=0.4115, over 4102409.00 frames. ], batch size: 68, lr: 1.35e-02, grad_scale: 32.0
2024-09-14 09:10:10,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=12.0
2024-09-14 09:10:42,711 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.226e+02 2.488e+02 3.042e+02 5.046e+02, threshold=4.977e+02, percent-clipped=0.0
2024-09-14 09:11:00,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=105524.66666666667, ans=0.0
2024-09-14 09:11:06,187 INFO [train.py:1198] (0/2) Epoch 6, batch 5300, loss[loss=0.3292, ctc_loss=0.2405, cr_loss=0.4436, over 21020.00 frames. ], tot_loss[loss=0.2966, ctc_loss=0.2141, cr_loss=0.4123, over 4107219.77 frames. ], batch size: 63, lr: 1.35e-02, grad_scale: 32.0
2024-09-14 09:11:07,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=105553.0, ans=0.125
2024-09-14 09:11:38,015 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=22.5
2024-09-14 09:11:49,128 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.42 vs. limit=15.0
2024-09-14 09:12:22,816 INFO [train.py:1198] (0/2) Epoch 6, batch 5350, loss[loss=0.3083, ctc_loss=0.2205, cr_loss=0.439, over 21019.00 frames. ], tot_loss[loss=0.2973, ctc_loss=0.2148, cr_loss=0.4124, over 4088920.17 frames. ], batch size: 63, lr: 1.34e-02, grad_scale: 32.0
2024-09-14 09:12:30,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=105694.66666666667, ans=0.0
2024-09-14 09:12:42,594 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0
2024-09-14 09:12:53,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=105751.33333333333, ans=0.1
2024-09-14 09:12:57,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105751.33333333333, ans=0.1
2024-09-14 09:12:58,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=105751.33333333333, ans=0.025
2024-09-14 09:13:09,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=105779.66666666667, ans=0.125
2024-09-14 09:13:12,897 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.28 vs. limit=15.0
2024-09-14 09:13:13,341 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.121e+02 2.316e+02 2.792e+02 3.922e+02, threshold=4.631e+02, percent-clipped=0.0
2024-09-14 09:13:16,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=105779.66666666667, ans=0.0
2024-09-14 09:13:37,344 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0
2024-09-14 09:13:39,567 INFO [train.py:1198] (0/2) Epoch 6, batch 5400, loss[loss=0.2524, ctc_loss=0.1773, cr_loss=0.3752, over 21008.00 frames. ], tot_loss[loss=0.2962, ctc_loss=0.2138, cr_loss=0.4119, over 4102745.12 frames. ], batch size: 48, lr: 1.34e-02, grad_scale: 32.0
2024-09-14 09:14:08,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=105893.0, ans=0.0
2024-09-14 09:14:14,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105893.0, ans=0.1
2024-09-14 09:14:18,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=105893.0, ans=0.2
2024-09-14 09:14:52,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=105978.0, ans=0.125
2024-09-14 09:14:54,010 INFO [train.py:1198] (0/2) Epoch 6, batch 5450, loss[loss=0.3127, ctc_loss=0.2295, cr_loss=0.4161, over 20989.00 frames. ], tot_loss[loss=0.2964, ctc_loss=0.214, cr_loss=0.4118, over 4101489.77 frames. ], batch size: 64, lr: 1.34e-02, grad_scale: 32.0
2024-09-14 09:15:31,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=106034.66666666667, ans=0.04949747468305833
2024-09-14 09:15:42,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=106063.0, ans=0.125
2024-09-14 09:15:44,130 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.253e+02 2.505e+02 2.912e+02 6.066e+02, threshold=5.009e+02, percent-clipped=2.0
2024-09-14 09:15:47,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=106063.0, ans=0.2
2024-09-14 09:15:54,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=106091.33333333333, ans=0.0
2024-09-14 09:15:57,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=106091.33333333333, ans=0.0
2024-09-14 09:16:08,069 INFO [train.py:1198] (0/2) Epoch 6, batch 5500, loss[loss=0.2949, ctc_loss=0.2145, cr_loss=0.4019, over 21035.00 frames. ], tot_loss[loss=0.2972, ctc_loss=0.2146, cr_loss=0.4127, over 4094455.22 frames. ], batch size: 62, lr: 1.34e-02, grad_scale: 32.0
2024-09-14 09:16:37,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=106176.33333333333, ans=0.0
2024-09-14 09:16:48,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0
2024-09-14 09:16:49,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=106176.33333333333, ans=0.125
2024-09-14 09:17:22,279 INFO [train.py:1198] (0/2) Epoch 6, batch 5550, loss[loss=0.3, ctc_loss=0.214, cr_loss=0.4302, over 21054.00 frames. ], tot_loss[loss=0.2969, ctc_loss=0.2145, cr_loss=0.4123, over 4091626.55 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0
2024-09-14 09:17:26,036 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0
2024-09-14 09:18:12,502 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.128e+02 2.265e+02 2.495e+02 3.567e+02, threshold=4.531e+02, percent-clipped=0.0
2024-09-14 09:18:20,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.16 vs. limit=10.0
2024-09-14 09:18:36,069 INFO [train.py:1198] (0/2) Epoch 6, batch 5600, loss[loss=0.2903, ctc_loss=0.2054, cr_loss=0.4243, over 21074.00 frames. ], tot_loss[loss=0.2957, ctc_loss=0.2134, cr_loss=0.4116, over 4092305.33 frames. ], batch size: 59, lr: 1.34e-02, grad_scale: 32.0
2024-09-14 09:19:25,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=106488.0, ans=0.1
2024-09-14 09:19:35,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=106516.33333333333, ans=0.2
2024-09-14 09:19:48,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=106516.33333333333, ans=0.125
2024-09-14 09:19:50,761 INFO [train.py:1198] (0/2) Epoch 6, batch 5650, loss[loss=0.2952, ctc_loss=0.2114, cr_loss=0.419, over 20919.00 frames. ], tot_loss[loss=0.2967, ctc_loss=0.2141, cr_loss=0.4127, over 4095008.96 frames. ], batch size: 64, lr: 1.34e-02, grad_scale: 32.0
2024-09-14 09:19:51,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=106544.66666666667, ans=0.0
2024-09-14 09:19:55,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=106544.66666666667, ans=0.5
2024-09-14 09:20:03,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=106544.66666666667, ans=0.125
2024-09-14 09:20:07,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=12.0
2024-09-14 09:20:24,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=106601.33333333333, ans=0.5
2024-09-14 09:20:41,644 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.152e+02 2.410e+02 2.740e+02 3.893e+02, threshold=4.820e+02, percent-clipped=0.0
2024-09-14 09:20:43,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=106629.66666666667, ans=0.125
2024-09-14 09:21:07,798 INFO [train.py:1198] (0/2) Epoch 6, batch 5700, loss[loss=0.2597, ctc_loss=0.1886, cr_loss=0.3555, over 21005.00 frames. ], tot_loss[loss=0.2943, ctc_loss=0.2123, cr_loss=0.41, over 4104689.99 frames. ], batch size: 52, lr: 1.34e-02, grad_scale: 32.0
2024-09-14 09:21:16,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=106686.33333333333, ans=0.125
2024-09-14 09:21:31,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=106714.66666666667, ans=15.0
2024-09-14 09:21:34,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=106714.66666666667, ans=10.0
2024-09-14 09:21:48,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=106743.0, ans=0.125
2024-09-14 09:21:51,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0
2024-09-14 09:22:01,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=106771.33333333333, ans=0.125
2024-09-14 09:22:21,983 INFO [train.py:1198] (0/2) Epoch 6, batch 5750, loss[loss=0.3705, ctc_loss=0.2721, cr_loss=0.4919, over 20686.00 frames. ], tot_loss[loss=0.2947, ctc_loss=0.2124, cr_loss=0.411, over 4105481.75 frames. ], batch size: 71, lr: 1.34e-02, grad_scale: 32.0
2024-09-14 09:22:55,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=106884.66666666667, ans=0.125
2024-09-14 09:22:57,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=106884.66666666667, ans=0.0
2024-09-14 09:23:00,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=106884.66666666667, ans=0.2
2024-09-14 09:23:14,884 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.113e+02 2.306e+02 2.589e+02 3.888e+02, threshold=4.613e+02, percent-clipped=0.0
2024-09-14 09:23:22,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=106941.33333333333, ans=0.025
2024-09-14 09:23:29,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=106941.33333333333, ans=0.0
2024-09-14 09:23:38,705 INFO [train.py:1198] (0/2) Epoch 6, batch 5800, loss[loss=0.2861, ctc_loss=0.2011, cr_loss=0.4246, over 21039.00 frames. ], tot_loss[loss=0.2955, ctc_loss=0.2132, cr_loss=0.4115, over 4098265.43 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0
2024-09-14 09:23:44,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=106969.66666666667, ans=0.025
2024-09-14 09:23:49,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=106969.66666666667, ans=0.0
2024-09-14 09:24:23,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=107054.66666666667, ans=0.0
2024-09-14 09:24:30,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.64 vs. limit=5.0
2024-09-14 09:24:38,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=107083.0, ans=0.125
2024-09-14 09:24:52,805 INFO [train.py:1198] (0/2) Epoch 6, batch 5850, loss[loss=0.2867, ctc_loss=0.2038, cr_loss=0.4145, over 21040.00 frames. ], tot_loss[loss=0.2948, ctc_loss=0.2126, cr_loss=0.4111, over 4111828.29 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0
2024-09-14 09:24:57,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=107111.33333333333, ans=0.025
2024-09-14 09:25:27,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=107168.0, ans=0.125
2024-09-14 09:25:43,373 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.138e+02 2.348e+02 2.763e+02 4.155e+02, threshold=4.696e+02, percent-clipped=0.0
2024-09-14 09:26:06,869 INFO [train.py:1198] (0/2) Epoch 6, batch 5900, loss[loss=0.2622, ctc_loss=0.1847, cr_loss=0.3877, over 20981.00 frames. ], tot_loss[loss=0.2952, ctc_loss=0.2129, cr_loss=0.4115, over 4104373.07 frames. ], batch size: 52, lr: 1.34e-02, grad_scale: 32.0
2024-09-14 09:26:57,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0
2024-09-14 09:27:12,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=107366.33333333333, ans=0.125
2024-09-14 09:27:21,079 INFO [train.py:1198] (0/2) Epoch 6, batch 5950, loss[loss=0.2439, ctc_loss=0.1744, cr_loss=0.3474, over 20300.00 frames. ], tot_loss[loss=0.2957, ctc_loss=0.2133, cr_loss=0.412, over 4094802.89 frames. ], batch size: 45, lr: 1.33e-02, grad_scale: 32.0
2024-09-14 09:27:23,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=107394.66666666667, ans=0.2
2024-09-14 09:27:38,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=107423.0, ans=0.0
2024-09-14 09:27:47,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=107423.0, ans=0.125
2024-09-14 09:28:00,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=107451.33333333333, ans=0.125
2024-09-14 09:28:02,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=107451.33333333333, ans=0.1
2024-09-14 09:28:12,603 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.063e+02 2.261e+02 2.552e+02 4.834e+02, threshold=4.521e+02, percent-clipped=1.0
2024-09-14 09:28:15,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=107479.66666666667, ans=0.0
2024-09-14 09:28:36,080 INFO [train.py:1198] (0/2) Epoch 6, batch 6000, loss[loss=0.3214, ctc_loss=0.2322, cr_loss=0.4457, over 21007.00 frames. ], tot_loss[loss=0.2945, ctc_loss=0.2124, cr_loss=0.4107, over 4101002.68 frames. ], batch size: 61, lr: 1.33e-02, grad_scale: 32.0
2024-09-14 09:28:36,081 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-14 09:28:58,854 INFO [train.py:1230] (0/2) Epoch 6, validation: loss=0.06491, ctc_loss=0.06491, cr_loss=9.455e-15, over 944034.00 frames.
2024-09-14 09:28:58,855 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-14 09:29:28,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=107593.0, ans=0.2
2024-09-14 09:29:47,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=107621.33333333333, ans=0.125
2024-09-14 09:30:13,094 INFO [train.py:1198] (0/2) Epoch 6, batch 6050, loss[loss=0.2469, ctc_loss=0.1762, cr_loss=0.3536, over 20297.00 frames. ], tot_loss[loss=0.2934, ctc_loss=0.2116, cr_loss=0.4093, over 4106304.34 frames. ], batch size: 45, lr: 1.33e-02, grad_scale: 32.0
2024-09-14 09:30:42,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=107706.33333333333, ans=0.1
2024-09-14 09:30:47,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.01 vs. limit=15.0
2024-09-14 09:31:05,997 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.153e+02 2.341e+02 2.684e+02 6.077e+02, threshold=4.681e+02, percent-clipped=1.0
2024-09-14 09:31:25,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=107791.33333333333, ans=0.125
2024-09-14 09:31:29,978 INFO [train.py:1198] (0/2) Epoch 6, batch 6100, loss[loss=0.301, ctc_loss=0.2237, cr_loss=0.3864, over 20667.00 frames. ], tot_loss[loss=0.2958, ctc_loss=0.2134, cr_loss=0.4121, over 4107217.91 frames. ], batch size: 71, lr: 1.33e-02, grad_scale: 32.0
2024-09-14 09:31:34,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=107819.66666666667, ans=0.0
2024-09-14 09:31:54,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=107848.0, ans=0.125
2024-09-14 09:32:06,494 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0
2024-09-14 09:32:34,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=107933.0, ans=0.2
2024-09-14 09:32:44,860 INFO [train.py:1198] (0/2) Epoch 6, batch 6150, loss[loss=0.2742, ctc_loss=0.1931, cr_loss=0.4053, over 20970.00 frames. ], tot_loss[loss=0.2957, ctc_loss=0.2135, cr_loss=0.4112, over 4096107.62 frames. ], batch size: 49, lr: 1.33e-02, grad_scale: 32.0
2024-09-14 09:32:57,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=22.5
2024-09-14 09:33:35,726 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.158e+02 2.452e+02 2.758e+02 5.690e+02, threshold=4.905e+02, percent-clipped=1.0
2024-09-14 09:33:39,644 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.37 vs. limit=15.0
2024-09-14 09:33:49,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=108074.66666666667, ans=0.0
2024-09-14 09:33:59,200 INFO [train.py:1198] (0/2) Epoch 6, batch 6200, loss[loss=0.3197, ctc_loss=0.2357, cr_loss=0.4202, over 19468.00 frames. ], tot_loss[loss=0.2972, ctc_loss=0.2148, cr_loss=0.4118, over 4062537.49 frames. ], batch size: 90, lr: 1.33e-02, grad_scale: 32.0
2024-09-14 09:34:17,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=108131.33333333333, ans=0.1
2024-09-14 09:34:57,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.49 vs. limit=10.0
2024-09-14 09:35:03,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.44 vs. limit=10.0
2024-09-14 09:35:07,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=108216.33333333333, ans=0.0
2024-09-14 09:35:14,289 INFO [train.py:1198] (0/2) Epoch 6, batch 6250, loss[loss=0.2569, ctc_loss=0.1852, cr_loss=0.3584, over 20289.00 frames. ], tot_loss[loss=0.2942, ctc_loss=0.2125, cr_loss=0.4085, over 4053770.45 frames. ], batch size: 45, lr: 1.33e-02, grad_scale: 32.0
2024-09-14 09:35:54,958 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0
2024-09-14 09:35:58,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=108329.66666666667, ans=0.125
2024-09-14 09:36:03,854 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0
2024-09-14 09:36:06,162 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.214e+02 2.419e+02 2.929e+02 6.352e+02, threshold=4.837e+02, percent-clipped=3.0
2024-09-14 09:36:12,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=108329.66666666667, ans=0.125
2024-09-14 09:36:29,621 INFO [train.py:1198] (0/2) Epoch 6, batch 6300, loss[loss=0.3648, ctc_loss=0.2686, cr_loss=0.481, over 18328.00 frames. ], tot_loss[loss=0.2971, ctc_loss=0.2151, cr_loss=0.4102, over 4015554.75 frames. ], batch size: 108, lr: 1.33e-02, grad_scale: 32.0
2024-09-14 09:36:31,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=108386.33333333333, ans=0.125
2024-09-14 09:36:34,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=108386.33333333333, ans=0.0
2024-09-14 09:36:41,867 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=15.0
2024-09-14 09:37:07,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=108443.0, ans=0.0
2024-09-14 09:37:25,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=108499.66666666667, ans=0.07
2024-09-14 09:37:40,254 INFO [train.py:1198] (0/2) Epoch 6, batch 6350, loss[loss=0.348, ctc_loss=0.2653, cr_loss=0.4134, over 14471.00 frames. ], tot_loss[loss=0.3071, ctc_loss=0.2241, cr_loss=0.415, over 3838353.01 frames. ], batch size: 149, lr: 1.33e-02, grad_scale: 32.0
2024-09-14 09:37:55,388 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 09:38:12,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=108584.66666666667, ans=0.2
2024-09-14 09:38:16,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=108584.66666666667, ans=0.125
2024-09-14 09:38:28,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=108613.0, ans=0.0
2024-09-14 09:38:29,143 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.069e+02 2.339e+02 2.477e+02 2.708e+02 4.665e+02, threshold=4.953e+02, percent-clipped=0.0
2024-09-14 09:38:29,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=108613.0, ans=0.125
2024-09-14 09:38:39,050 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-6.pt
2024-09-14 09:39:27,370 INFO [train.py:1198] (0/2) Epoch 7, batch 0, loss[loss=0.2678, ctc_loss=0.1913, cr_loss=0.3826, over 20972.00 frames. ], tot_loss[loss=0.2678, ctc_loss=0.1913, cr_loss=0.3826, over 20972.00 frames. ], batch size: 51, lr: 1.24e-02, grad_scale: 32.0
2024-09-14 09:39:27,371 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-14 09:39:49,196 INFO [train.py:1230] (0/2) Epoch 7, validation: loss=0.06594, ctc_loss=0.06594, cr_loss=9.26e-15, over 944034.00 frames.
2024-09-14 09:39:49,197 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-14 09:40:31,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=108700.83333333333, ans=0.0
2024-09-14 09:41:04,376 INFO [train.py:1198] (0/2) Epoch 7, batch 50, loss[loss=0.2833, ctc_loss=0.2076, cr_loss=0.3785, over 21050.00 frames. ], tot_loss[loss=0.2961, ctc_loss=0.2138, cr_loss=0.4114, over 924187.41 frames. ], batch size: 56, lr: 1.24e-02, grad_scale: 32.0
2024-09-14 09:41:48,631 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0
2024-09-14 09:42:09,358 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.131e+02 2.316e+02 2.574e+02 5.024e+02, threshold=4.631e+02, percent-clipped=1.0
2024-09-14 09:42:11,888 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0
2024-09-14 09:42:13,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.43 vs. limit=15.0
2024-09-14 09:42:20,098 INFO [train.py:1198] (0/2) Epoch 7, batch 100, loss[loss=0.2972, ctc_loss=0.2116, cr_loss=0.428, over 20846.00 frames. ], tot_loss[loss=0.2942, ctc_loss=0.2119, cr_loss=0.4117, over 1634090.05 frames. ], batch size: 65, lr: 1.24e-02, grad_scale: 32.0
2024-09-14 09:42:39,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=15.0
2024-09-14 09:43:10,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=109012.5, ans=0.0
2024-09-14 09:43:34,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=109040.83333333333, ans=0.125
2024-09-14 09:43:38,927 INFO [train.py:1198] (0/2) Epoch 7, batch 150, loss[loss=0.2656, ctc_loss=0.1911, cr_loss=0.3725, over 19925.00 frames. ], tot_loss[loss=0.2942, ctc_loss=0.2122, cr_loss=0.4102, over 2176356.01 frames. ], batch size: 44, lr: 1.24e-02, grad_scale: 32.0
2024-09-14 09:43:39,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.78 vs. limit=15.0
2024-09-14 09:44:04,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0
2024-09-14 09:44:11,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=109125.83333333333, ans=0.0
2024-09-14 09:44:44,732 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.118e+02 2.261e+02 2.574e+02 3.936e+02, threshold=4.523e+02, percent-clipped=0.0
2024-09-14 09:44:55,149 INFO [train.py:1198] (0/2) Epoch 7, batch 200, loss[loss=0.2521, ctc_loss=0.1832, cr_loss=0.3449, over 21061.00 frames. ], tot_loss[loss=0.2953, ctc_loss=0.2129, cr_loss=0.412, over 2605075.86 frames. ], batch size: 53, lr: 1.24e-02, grad_scale: 32.0
2024-09-14 09:45:00,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0
2024-09-14 09:45:04,917 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 09:45:21,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=109239.16666666667, ans=0.125
2024-09-14 09:46:00,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=109324.16666666667, ans=0.0
2024-09-14 09:46:08,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=109324.16666666667, ans=0.0
2024-09-14 09:46:13,797 INFO [train.py:1198] (0/2) Epoch 7, batch 250, loss[loss=0.3586, ctc_loss=0.267, cr_loss=0.4579, over 18112.00 frames. ], tot_loss[loss=0.2942, ctc_loss=0.212, cr_loss=0.411, over 2926137.78 frames. ], batch size: 108, lr: 1.24e-02, grad_scale: 32.0
2024-09-14 09:46:55,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=109409.16666666667, ans=0.125
2024-09-14 09:47:18,785 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.090e+02 2.265e+02 2.566e+02 4.003e+02, threshold=4.530e+02, percent-clipped=0.0
2024-09-14 09:47:19,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=109465.83333333333, ans=0.125
2024-09-14 09:47:29,383 INFO [train.py:1198] (0/2) Epoch 7, batch 300, loss[loss=0.3258, ctc_loss=0.2336, cr_loss=0.461, over 19984.00 frames. ], tot_loss[loss=0.295, ctc_loss=0.2126, cr_loss=0.4121, over 3184198.56 frames. ], batch size: 80, lr: 1.24e-02, grad_scale: 32.0
2024-09-14 09:47:33,336 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=12.0
2024-09-14 09:47:55,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=109522.5, ans=0.0
2024-09-14 09:48:47,816 INFO [train.py:1198] (0/2) Epoch 7, batch 350, loss[loss=0.2648, ctc_loss=0.1908, cr_loss=0.3698, over 21081.00 frames. ], tot_loss[loss=0.2939, ctc_loss=0.2118, cr_loss=0.4105, over 3387816.57 frames. ], batch size: 53, lr: 1.24e-02, grad_scale: 32.0
2024-09-14 09:48:48,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=109635.83333333333, ans=0.125
2024-09-14 09:48:54,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=109635.83333333333, ans=0.125
2024-09-14 09:49:09,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=12.0
2024-09-14 09:49:53,124 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.071e+02 2.305e+02 2.810e+02 4.321e+02, threshold=4.611e+02, percent-clipped=0.0
2024-09-14 09:50:03,596 INFO [train.py:1198] (0/2) Epoch 7, batch 400, loss[loss=0.2729, ctc_loss=0.192, cr_loss=0.4043, over 20872.00 frames. ], tot_loss[loss=0.2929, ctc_loss=0.2109, cr_loss=0.41, over 3539230.99 frames. ], batch size: 57, lr: 1.24e-02, grad_scale: 32.0
2024-09-14 09:50:16,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=109777.5, ans=0.0
2024-09-14 09:50:18,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=109805.83333333333, ans=0.125
2024-09-14 09:50:47,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=109834.16666666667, ans=10.0
2024-09-14 09:51:23,724 INFO [train.py:1198] (0/2) Epoch 7, batch 450, loss[loss=0.3269, ctc_loss=0.2326, cr_loss=0.4712, over 20831.00 frames. ], tot_loss[loss=0.2931, ctc_loss=0.2109, cr_loss=0.4109, over 3656639.28 frames. ], batch size: 65, lr: 1.24e-02, grad_scale: 32.0
2024-09-14 09:51:23,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=109919.16666666667, ans=0.125
2024-09-14 09:51:54,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=109975.83333333333, ans=0.1
2024-09-14 09:51:57,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=109975.83333333333, ans=0.2
2024-09-14 09:52:06,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=109975.83333333333, ans=0.02
2024-09-14 09:52:16,715 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.46 vs. limit=15.0
2024-09-14 09:52:17,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=110004.16666666667, ans=0.2
2024-09-14 09:52:18,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=110004.16666666667, ans=0.0
2024-09-14 09:52:29,056 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.714e+02 2.067e+02 2.280e+02 2.580e+02 4.527e+02, threshold=4.559e+02, percent-clipped=0.0
2024-09-14 09:52:39,900 INFO [train.py:1198] (0/2) Epoch 7, batch 500, loss[loss=0.3094, ctc_loss=0.22, cr_loss=0.447, over 20846.00 frames. ], tot_loss[loss=0.2925, ctc_loss=0.2104, cr_loss=0.4104, over 3745859.91 frames. ], batch size: 65, lr: 1.24e-02, grad_scale: 32.0
2024-09-14 09:52:52,497 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.39 vs. limit=22.5
2024-09-14 09:53:02,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=110089.16666666667, ans=0.125
2024-09-14 09:53:04,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110089.16666666667, ans=0.1
2024-09-14 09:53:11,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=110117.5, ans=0.05
2024-09-14 09:53:55,322 INFO [train.py:1198] (0/2) Epoch 7, batch 550, loss[loss=0.3115, ctc_loss=0.2253, cr_loss=0.431, over 20692.00 frames. ], tot_loss[loss=0.2915, ctc_loss=0.2096, cr_loss=0.4099, over 3835857.87 frames. ], batch size: 71, lr: 1.24e-02, grad_scale: 32.0
2024-09-14 09:54:03,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=110202.5, ans=0.125
2024-09-14 09:55:02,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=110315.83333333333, ans=0.125
2024-09-14 09:55:04,110 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.091e+02 2.360e+02 2.552e+02 4.215e+02, threshold=4.720e+02, percent-clipped=0.0
2024-09-14 09:55:14,571 INFO [train.py:1198] (0/2) Epoch 7, batch 600, loss[loss=0.2303, ctc_loss=0.1633, cr_loss=0.3348, over 19934.00 frames. ], tot_loss[loss=0.2912, ctc_loss=0.2094, cr_loss=0.4092, over 3880260.50 frames. ], batch size: 44, lr: 1.23e-02, grad_scale: 64.0
2024-09-14 09:55:36,528 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.20 vs. limit=12.0
2024-09-14 09:55:41,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0
2024-09-14 09:55:44,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=110400.83333333333, ans=0.125
2024-09-14 09:55:44,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.97 vs. limit=6.0
2024-09-14 09:56:19,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=110457.5, ans=0.1
2024-09-14 09:56:29,902 INFO [train.py:1198] (0/2) Epoch 7, batch 650, loss[loss=0.2663, ctc_loss=0.1886, cr_loss=0.3881, over 20933.00 frames. ], tot_loss[loss=0.2913, ctc_loss=0.2095, cr_loss=0.4091, over 3922760.37 frames.
], batch size: 49, lr: 1.23e-02, grad_scale: 64.0 2024-09-14 09:57:39,735 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.076e+02 2.242e+02 2.482e+02 3.772e+02, threshold=4.484e+02, percent-clipped=0.0 2024-09-14 09:57:48,718 INFO [train.py:1198] (0/2) Epoch 7, batch 700, loss[loss=0.2873, ctc_loss=0.2037, cr_loss=0.4183, over 21055.00 frames. ], tot_loss[loss=0.2906, ctc_loss=0.2088, cr_loss=0.4086, over 3974577.91 frames. ], batch size: 56, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 09:59:04,101 INFO [train.py:1198] (0/2) Epoch 7, batch 750, loss[loss=0.3193, ctc_loss=0.2336, cr_loss=0.4284, over 20840.00 frames. ], tot_loss[loss=0.2903, ctc_loss=0.2085, cr_loss=0.4086, over 4002641.78 frames. ], batch size: 59, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 09:59:40,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=110825.83333333333, ans=0.1 2024-09-14 09:59:46,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=110825.83333333333, ans=0.0 2024-09-14 09:59:48,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-09-14 10:00:13,804 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.111e+02 2.325e+02 2.832e+02 4.896e+02, threshold=4.649e+02, percent-clipped=1.0 2024-09-14 10:00:23,090 INFO [train.py:1198] (0/2) Epoch 7, batch 800, loss[loss=0.2717, ctc_loss=0.1918, cr_loss=0.3996, over 21024.00 frames. ], tot_loss[loss=0.2894, ctc_loss=0.208, cr_loss=0.4074, over 4021042.31 frames. 
], batch size: 62, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:00:53,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=110967.5, ans=0.0 2024-09-14 10:01:04,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=110967.5, ans=0.0 2024-09-14 10:01:23,258 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0 2024-09-14 10:01:27,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=111024.16666666667, ans=0.125 2024-09-14 10:01:37,041 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.55 vs. limit=10.0 2024-09-14 10:01:39,258 INFO [train.py:1198] (0/2) Epoch 7, batch 850, loss[loss=0.2903, ctc_loss=0.209, cr_loss=0.4064, over 20949.00 frames. ], tot_loss[loss=0.29, ctc_loss=0.2085, cr_loss=0.4078, over 4022104.20 frames. 
], batch size: 64, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:01:55,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=111080.83333333333, ans=0.125 2024-09-14 10:01:56,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=111080.83333333333, ans=0.0 2024-09-14 10:01:59,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=111080.83333333333, ans=0.0 2024-09-14 10:01:59,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=111080.83333333333, ans=0.125 2024-09-14 10:02:11,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=111109.16666666667, ans=0.1 2024-09-14 10:02:23,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=111137.5, ans=0.0 2024-09-14 10:02:35,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=111137.5, ans=0.5 2024-09-14 10:02:47,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=111165.83333333333, ans=0.05 2024-09-14 10:02:48,936 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.181e+02 2.357e+02 2.694e+02 4.495e+02, threshold=4.713e+02, percent-clipped=0.0 2024-09-14 10:02:58,172 INFO [train.py:1198] (0/2) Epoch 7, batch 900, loss[loss=0.286, ctc_loss=0.2059, cr_loss=0.4001, over 20711.00 frames. ], tot_loss[loss=0.2917, ctc_loss=0.2098, cr_loss=0.4093, over 4029885.71 frames. 
], batch size: 71, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:03:01,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=111194.16666666667, ans=0.125 2024-09-14 10:03:03,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=12.0 2024-09-14 10:03:30,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=111250.83333333333, ans=0.0 2024-09-14 10:03:45,896 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.11 vs. limit=6.0 2024-09-14 10:03:52,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=111279.16666666667, ans=0.125 2024-09-14 10:04:13,825 INFO [train.py:1198] (0/2) Epoch 7, batch 950, loss[loss=0.2835, ctc_loss=0.1997, cr_loss=0.419, over 20784.00 frames. ], tot_loss[loss=0.2908, ctc_loss=0.2091, cr_loss=0.4082, over 4025492.99 frames. ], batch size: 53, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:04:25,582 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=22.5 2024-09-14 10:04:46,285 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.29 vs. 
limit=15.0 2024-09-14 10:05:16,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=111449.16666666667, ans=0.2 2024-09-14 10:05:17,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=111449.16666666667, ans=0.2 2024-09-14 10:05:20,378 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.117e+02 2.264e+02 2.484e+02 5.104e+02, threshold=4.529e+02, percent-clipped=2.0 2024-09-14 10:05:29,491 INFO [train.py:1198] (0/2) Epoch 7, batch 1000, loss[loss=0.3234, ctc_loss=0.2375, cr_loss=0.4296, over 20710.00 frames. ], tot_loss[loss=0.2905, ctc_loss=0.2089, cr_loss=0.4079, over 4038074.48 frames. ], batch size: 71, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:06:13,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=111534.16666666667, ans=0.0 2024-09-14 10:06:28,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=111562.5, ans=0.5 2024-09-14 10:06:34,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=111590.83333333333, ans=0.125 2024-09-14 10:06:40,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=111590.83333333333, ans=0.125 2024-09-14 10:06:47,983 INFO [train.py:1198] (0/2) Epoch 7, batch 1050, loss[loss=0.2442, ctc_loss=0.1725, cr_loss=0.3585, over 21000.00 frames. ], tot_loss[loss=0.2908, ctc_loss=0.2092, cr_loss=0.4084, over 4051486.92 frames. 
], batch size: 48, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:07:22,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=111675.83333333333, ans=0.0 2024-09-14 10:07:47,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=111732.5, ans=0.0 2024-09-14 10:07:54,552 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.734e+02 2.102e+02 2.215e+02 2.394e+02 3.325e+02, threshold=4.431e+02, percent-clipped=0.0 2024-09-14 10:08:03,633 INFO [train.py:1198] (0/2) Epoch 7, batch 1100, loss[loss=0.2489, ctc_loss=0.1757, cr_loss=0.3664, over 20914.00 frames. ], tot_loss[loss=0.2908, ctc_loss=0.2091, cr_loss=0.4087, over 4073657.04 frames. ], batch size: 50, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:08:30,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.47 vs. limit=10.0 2024-09-14 10:08:32,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=111789.16666666667, ans=6.0 2024-09-14 10:09:09,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=111874.16666666667, ans=0.2 2024-09-14 10:09:13,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=111874.16666666667, ans=0.025 2024-09-14 10:09:22,258 INFO [train.py:1198] (0/2) Epoch 7, batch 1150, loss[loss=0.2984, ctc_loss=0.2115, cr_loss=0.4347, over 20639.00 frames. ], tot_loss[loss=0.2899, ctc_loss=0.2084, cr_loss=0.4074, over 4075222.40 frames. 
], batch size: 66, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:09:44,060 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 10:10:22,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=112015.83333333333, ans=0.125 2024-09-14 10:10:23,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=112015.83333333333, ans=0.2 2024-09-14 10:10:28,446 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.073e+02 2.271e+02 2.649e+02 5.867e+02, threshold=4.542e+02, percent-clipped=1.0 2024-09-14 10:10:29,296 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. limit=10.0 2024-09-14 10:10:37,493 INFO [train.py:1198] (0/2) Epoch 7, batch 1200, loss[loss=0.2682, ctc_loss=0.1939, cr_loss=0.3715, over 20998.00 frames. ], tot_loss[loss=0.2899, ctc_loss=0.2084, cr_loss=0.4071, over 4086906.72 frames. 
], batch size: 52, lr: 1.23e-02, grad_scale: 32.0 2024-09-14 10:10:51,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112072.5, ans=0.1 2024-09-14 10:10:56,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=112072.5, ans=0.0 2024-09-14 10:11:09,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=112100.83333333333, ans=0.125 2024-09-14 10:11:31,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=112129.16666666667, ans=0.0 2024-09-14 10:11:48,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=12.0 2024-09-14 10:11:55,991 INFO [train.py:1198] (0/2) Epoch 7, batch 1250, loss[loss=0.2905, ctc_loss=0.2076, cr_loss=0.4147, over 20957.00 frames. ], tot_loss[loss=0.2913, ctc_loss=0.2097, cr_loss=0.4081, over 4072984.97 frames. ], batch size: 60, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:12:07,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=112185.83333333333, ans=0.125 2024-09-14 10:12:43,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112270.83333333333, ans=0.1 2024-09-14 10:12:43,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.67 vs. limit=15.0 2024-09-14 10:12:57,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.98 vs. 
limit=6.0 2024-09-14 10:13:02,638 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.125e+02 2.290e+02 2.823e+02 4.648e+02, threshold=4.581e+02, percent-clipped=1.0 2024-09-14 10:13:02,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=112299.16666666667, ans=0.2 2024-09-14 10:13:11,853 INFO [train.py:1198] (0/2) Epoch 7, batch 1300, loss[loss=0.2769, ctc_loss=0.198, cr_loss=0.394, over 21059.00 frames. ], tot_loss[loss=0.2914, ctc_loss=0.2097, cr_loss=0.4086, over 4076243.10 frames. ], batch size: 53, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:13:28,286 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=15.0 2024-09-14 10:13:44,765 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs. limit=6.0 2024-09-14 10:13:51,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=112384.16666666667, ans=0.125 2024-09-14 10:14:20,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112440.83333333333, ans=0.1 2024-09-14 10:14:31,182 INFO [train.py:1198] (0/2) Epoch 7, batch 1350, loss[loss=0.2555, ctc_loss=0.1792, cr_loss=0.3813, over 20307.00 frames. ], tot_loss[loss=0.2917, ctc_loss=0.2099, cr_loss=0.409, over 4083186.29 frames. 
], batch size: 45, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:14:45,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=112497.5, ans=0.0 2024-09-14 10:15:14,416 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 10:15:34,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=112582.5, ans=0.125 2024-09-14 10:15:38,508 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.151e+02 2.366e+02 2.823e+02 4.048e+02, threshold=4.732e+02, percent-clipped=0.0 2024-09-14 10:15:43,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=112582.5, ans=0.125 2024-09-14 10:15:47,611 INFO [train.py:1198] (0/2) Epoch 7, batch 1400, loss[loss=0.2491, ctc_loss=0.1771, cr_loss=0.3598, over 20956.00 frames. ], tot_loss[loss=0.2899, ctc_loss=0.2084, cr_loss=0.4075, over 4085162.20 frames. ], batch size: 48, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:15:52,910 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0 2024-09-14 10:16:13,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=112639.16666666667, ans=0.0 2024-09-14 10:16:28,905 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 10:16:38,259 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.87 vs. 
limit=22.5 2024-09-14 10:16:46,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112724.16666666667, ans=0.1 2024-09-14 10:16:48,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.67 vs. limit=22.5 2024-09-14 10:16:50,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0 2024-09-14 10:17:03,108 INFO [train.py:1198] (0/2) Epoch 7, batch 1450, loss[loss=0.3615, ctc_loss=0.2756, cr_loss=0.4296, over 14439.00 frames. ], tot_loss[loss=0.289, ctc_loss=0.2077, cr_loss=0.4065, over 4095926.21 frames. ], batch size: 149, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:17:08,174 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 10:17:11,517 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.35 vs. 
limit=15.0 2024-09-14 10:17:46,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=112809.16666666667, ans=0.125 2024-09-14 10:17:52,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=112837.5, ans=0.125 2024-09-14 10:18:13,204 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.149e+02 2.360e+02 2.729e+02 4.113e+02, threshold=4.720e+02, percent-clipped=0.0 2024-09-14 10:18:21,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=112894.16666666667, ans=0.125 2024-09-14 10:18:22,404 INFO [train.py:1198] (0/2) Epoch 7, batch 1500, loss[loss=0.3271, ctc_loss=0.2421, cr_loss=0.4248, over 20329.00 frames. ], tot_loss[loss=0.2895, ctc_loss=0.2081, cr_loss=0.4071, over 4104333.80 frames. ], batch size: 74, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:18:30,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=112894.16666666667, ans=0.0 2024-09-14 10:18:38,088 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-09-14 10:18:38,659 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0 2024-09-14 10:19:22,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=113007.5, ans=0.0 2024-09-14 10:19:27,884 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.45 vs. 
limit=15.0 2024-09-14 10:19:36,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=113035.83333333333, ans=0.125 2024-09-14 10:19:37,618 INFO [train.py:1198] (0/2) Epoch 7, batch 1550, loss[loss=0.2846, ctc_loss=0.2068, cr_loss=0.389, over 20311.00 frames. ], tot_loss[loss=0.2903, ctc_loss=0.2087, cr_loss=0.4076, over 4100924.45 frames. ], batch size: 74, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:20:05,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=113064.16666666667, ans=0.125 2024-09-14 10:20:30,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=113120.83333333333, ans=0.125 2024-09-14 10:20:47,926 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.192e+02 2.442e+02 2.769e+02 5.210e+02, threshold=4.884e+02, percent-clipped=1.0 2024-09-14 10:20:48,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=113149.16666666667, ans=0.125 2024-09-14 10:20:49,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=113149.16666666667, ans=0.2 2024-09-14 10:20:56,686 INFO [train.py:1198] (0/2) Epoch 7, batch 1600, loss[loss=0.3294, ctc_loss=0.2401, cr_loss=0.4465, over 21005.00 frames. ], tot_loss[loss=0.29, ctc_loss=0.2085, cr_loss=0.4078, over 4106928.38 frames. 
], batch size: 63, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:21:02,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=113177.5, ans=0.0 2024-09-14 10:21:24,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=113205.83333333333, ans=0.05 2024-09-14 10:21:53,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=113262.5, ans=10.0 2024-09-14 10:21:54,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=113262.5, ans=0.125 2024-09-14 10:22:12,820 INFO [train.py:1198] (0/2) Epoch 7, batch 1650, loss[loss=0.2874, ctc_loss=0.2071, cr_loss=0.4013, over 21070.00 frames. ], tot_loss[loss=0.2905, ctc_loss=0.2087, cr_loss=0.4089, over 4099095.15 frames. ], batch size: 59, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:22:18,898 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-40000.pt 2024-09-14 10:22:35,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=113347.5, ans=0.125 2024-09-14 10:22:44,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=113375.83333333333, ans=0.2 2024-09-14 10:23:22,565 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.157e+02 2.384e+02 2.743e+02 4.246e+02, threshold=4.769e+02, percent-clipped=0.0 2024-09-14 10:23:24,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=113432.5, ans=0.125 2024-09-14 10:23:31,851 INFO [train.py:1198] (0/2) Epoch 7, batch 1700, loss[loss=0.2957, ctc_loss=0.2097, cr_loss=0.4302, over 21051.00 
frames. ], tot_loss[loss=0.2898, ctc_loss=0.2081, cr_loss=0.4087, over 4108799.45 frames. ], batch size: 62, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:23:32,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=113460.83333333333, ans=0.125 2024-09-14 10:23:52,415 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.62 vs. limit=22.5 2024-09-14 10:24:12,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=113517.5, ans=0.1 2024-09-14 10:24:17,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2024-09-14 10:24:32,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=113574.16666666667, ans=0.125 2024-09-14 10:24:41,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=113574.16666666667, ans=0.0 2024-09-14 10:24:43,035 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 10:24:45,149 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2024-09-14 10:24:47,344 INFO [train.py:1198] (0/2) Epoch 7, batch 1750, loss[loss=0.3064, ctc_loss=0.2216, cr_loss=0.4236, over 21021.00 frames. ], tot_loss[loss=0.2908, ctc_loss=0.2088, cr_loss=0.4101, over 4113322.48 frames. 
], batch size: 61, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:24:49,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=113602.5, ans=0.125 2024-09-14 10:24:51,236 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=22.5 2024-09-14 10:25:16,831 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0 2024-09-14 10:25:45,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=113687.5, ans=0.0 2024-09-14 10:25:51,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=113715.83333333333, ans=0.0 2024-09-14 10:25:56,981 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.134e+02 2.288e+02 2.521e+02 6.349e+02, threshold=4.577e+02, percent-clipped=1.0 2024-09-14 10:26:05,982 INFO [train.py:1198] (0/2) Epoch 7, batch 1800, loss[loss=0.2979, ctc_loss=0.2164, cr_loss=0.4076, over 21026.00 frames. ], tot_loss[loss=0.2907, ctc_loss=0.2087, cr_loss=0.4101, over 4111792.90 frames. ], batch size: 61, lr: 1.22e-02, grad_scale: 32.0 2024-09-14 10:26:41,487 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2024-09-14 10:27:21,553 INFO [train.py:1198] (0/2) Epoch 7, batch 1850, loss[loss=0.2801, ctc_loss=0.2023, cr_loss=0.3891, over 20785.00 frames. ], tot_loss[loss=0.2899, ctc_loss=0.2082, cr_loss=0.4089, over 4106090.88 frames. 
], batch size: 56, lr: 1.22e-02, grad_scale: 16.0 2024-09-14 10:27:35,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=113914.16666666667, ans=0.125 2024-09-14 10:27:37,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=113914.16666666667, ans=0.1 2024-09-14 10:28:02,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=113942.5, ans=0.2 2024-09-14 10:28:05,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=113942.5, ans=0.0 2024-09-14 10:28:26,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=113999.16666666667, ans=12.0 2024-09-14 10:28:30,312 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.142e+02 2.441e+02 2.773e+02 5.535e+02, threshold=4.882e+02, percent-clipped=1.0 2024-09-14 10:28:37,791 INFO [train.py:1198] (0/2) Epoch 7, batch 1900, loss[loss=0.2658, ctc_loss=0.1904, cr_loss=0.3767, over 20892.00 frames. ], tot_loss[loss=0.2885, ctc_loss=0.207, cr_loss=0.4075, over 4106044.87 frames. 
], batch size: 54, lr: 1.22e-02, grad_scale: 16.0 2024-09-14 10:28:43,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=114027.5, ans=0.125 2024-09-14 10:29:01,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=114055.83333333333, ans=0.09899494936611666 2024-09-14 10:29:27,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=114112.5, ans=0.125 2024-09-14 10:29:36,694 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 10:29:56,226 INFO [train.py:1198] (0/2) Epoch 7, batch 1950, loss[loss=0.2998, ctc_loss=0.2141, cr_loss=0.4286, over 20782.00 frames. ], tot_loss[loss=0.2882, ctc_loss=0.2066, cr_loss=0.4078, over 4107081.53 frames. ], batch size: 56, lr: 1.21e-02, grad_scale: 16.0 2024-09-14 10:30:20,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=114197.5, ans=0.0 2024-09-14 10:30:22,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=15.0 2024-09-14 10:30:38,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=114225.83333333333, ans=0.035 2024-09-14 10:30:47,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=114254.16666666667, ans=0.0 2024-09-14 10:31:07,341 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.120e+02 2.285e+02 2.612e+02 4.419e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-14 10:31:15,031 INFO [train.py:1198] (0/2) Epoch 7, batch 2000, loss[loss=0.2973, ctc_loss=0.2111, cr_loss=0.4309, over 21018.00 frames. 
], tot_loss[loss=0.2887, ctc_loss=0.207, cr_loss=0.4086, over 4121576.07 frames. ], batch size: 63, lr: 1.21e-02, grad_scale: 32.0
2024-09-14 10:31:24,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=114310.83333333333, ans=0.125
2024-09-14 10:32:28,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=114424.16666666667, ans=0.2
2024-09-14 10:32:31,331 INFO [train.py:1198] (0/2) Epoch 7, batch 2050, loss[loss=0.3093, ctc_loss=0.2256, cr_loss=0.4188, over 19479.00 frames. ], tot_loss[loss=0.288, ctc_loss=0.2065, cr_loss=0.4073, over 4108302.36 frames. ], batch size: 90, lr: 1.21e-02, grad_scale: 32.0
2024-09-14 10:32:39,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=114452.5, ans=0.125
2024-09-14 10:32:42,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=114452.5, ans=0.125
2024-09-14 10:32:53,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=114480.83333333333, ans=0.025
2024-09-14 10:32:56,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=114480.83333333333, ans=0.0
2024-09-14 10:32:56,266 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.70 vs.
limit=15.0
2024-09-14 10:32:59,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=114480.83333333333, ans=0.1
2024-09-14 10:33:14,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=114509.16666666667, ans=0.025
2024-09-14 10:33:34,357 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 10:33:40,023 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.101e+02 2.291e+02 2.639e+02 5.111e+02, threshold=4.582e+02, percent-clipped=1.0
2024-09-14 10:33:47,918 INFO [train.py:1198] (0/2) Epoch 7, batch 2100, loss[loss=0.2914, ctc_loss=0.2105, cr_loss=0.4042, over 18186.00 frames. ], tot_loss[loss=0.2873, ctc_loss=0.2061, cr_loss=0.4062, over 4108358.66 frames. ], batch size: 108, lr: 1.21e-02, grad_scale: 32.0
2024-09-14 10:34:03,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=114622.5, ans=0.0
2024-09-14 10:34:18,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=114650.83333333333, ans=0.0
2024-09-14 10:34:24,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=114650.83333333333, ans=0.09899494936611666
2024-09-14 10:34:32,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=114679.16666666667, ans=0.1
2024-09-14 10:34:37,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=114679.16666666667, ans=0.025
2024-09-14 10:34:44,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob,
batch_count=114679.16666666667, ans=0.125
2024-09-14 10:34:59,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=114707.5, ans=0.125
2024-09-14 10:35:06,999 INFO [train.py:1198] (0/2) Epoch 7, batch 2150, loss[loss=0.3375, ctc_loss=0.2498, cr_loss=0.4386, over 19562.00 frames. ], tot_loss[loss=0.2881, ctc_loss=0.2067, cr_loss=0.4068, over 4104002.86 frames. ], batch size: 90, lr: 1.21e-02, grad_scale: 32.0
2024-09-14 10:35:24,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=114764.16666666667, ans=0.125
2024-09-14 10:35:25,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=114764.16666666667, ans=0.0
2024-09-14 10:35:32,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=114764.16666666667, ans=0.025
2024-09-14 10:35:43,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=114792.5, ans=0.125
2024-09-14 10:35:46,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=114792.5, ans=0.125
2024-09-14 10:36:14,556 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.123e+02 2.357e+02 2.620e+02 4.145e+02, threshold=4.715e+02, percent-clipped=0.0
2024-09-14 10:36:14,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=114849.16666666667, ans=0.125
2024-09-14 10:36:19,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=114849.16666666667, ans=0.2
2024-09-14 10:36:22,045 INFO [train.py:1198] (0/2) Epoch 7, batch 2200, loss[loss=0.2395,
ctc_loss=0.1692, cr_loss=0.3515, over 20975.00 frames. ], tot_loss[loss=0.29, ctc_loss=0.2084, cr_loss=0.4081, over 4087569.55 frames. ], batch size: 51, lr: 1.21e-02, grad_scale: 32.0
2024-09-14 10:36:24,132 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0
2024-09-14 10:36:34,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=114877.5, ans=0.015
2024-09-14 10:36:39,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=114905.83333333333, ans=0.125
2024-09-14 10:36:50,057 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.81 vs. limit=15.0
2024-09-14 10:36:53,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=114905.83333333333, ans=0.0
2024-09-14 10:36:54,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=114934.16666666667, ans=0.125
2024-09-14 10:36:55,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=114934.16666666667, ans=0.125
2024-09-14 10:37:41,450 INFO [train.py:1198] (0/2) Epoch 7, batch 2250, loss[loss=0.2301, ctc_loss=0.1654, cr_loss=0.3239, over 20941.00 frames. ], tot_loss[loss=0.2898, ctc_loss=0.2082, cr_loss=0.4081, over 4090911.52 frames.
], batch size: 49, lr: 1.21e-02, grad_scale: 32.0
2024-09-14 10:38:10,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=115075.83333333333, ans=0.0
2024-09-14 10:38:28,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=115104.16666666667, ans=0.125
2024-09-14 10:38:36,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=115104.16666666667, ans=0.125
2024-09-14 10:38:36,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=115104.16666666667, ans=0.025
2024-09-14 10:38:49,940 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.138e+02 2.343e+02 2.723e+02 4.756e+02, threshold=4.687e+02, percent-clipped=1.0
2024-09-14 10:38:57,409 INFO [train.py:1198] (0/2) Epoch 7, batch 2300, loss[loss=0.2903, ctc_loss=0.2115, cr_loss=0.394, over 20818.00 frames. ], tot_loss[loss=0.2904, ctc_loss=0.2087, cr_loss=0.4084, over 4083737.22 frames. ], batch size: 59, lr: 1.21e-02, grad_scale: 32.0
2024-09-14 10:39:17,706 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0
2024-09-14 10:39:21,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=115189.16666666667, ans=0.125
2024-09-14 10:39:47,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=115245.83333333333, ans=0.125
2024-09-14 10:40:12,466 INFO [train.py:1198] (0/2) Epoch 7, batch 2350, loss[loss=0.2742, ctc_loss=0.1961, cr_loss=0.3903, over 20658.00 frames. ], tot_loss[loss=0.2896, ctc_loss=0.208, cr_loss=0.4081, over 4094418.38 frames.
], batch size: 68, lr: 1.21e-02, grad_scale: 32.0
2024-09-14 10:40:53,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=115359.16666666667, ans=0.0
2024-09-14 10:41:23,774 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.116e+02 2.345e+02 2.707e+02 4.552e+02, threshold=4.689e+02, percent-clipped=0.0
2024-09-14 10:41:31,622 INFO [train.py:1198] (0/2) Epoch 7, batch 2400, loss[loss=0.275, ctc_loss=0.1929, cr_loss=0.4104, over 20994.00 frames. ], tot_loss[loss=0.2891, ctc_loss=0.2076, cr_loss=0.4074, over 4100903.46 frames. ], batch size: 52, lr: 1.21e-02, grad_scale: 32.0
2024-09-14 10:41:40,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=115444.16666666667, ans=0.125
2024-09-14 10:41:57,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=115472.5, ans=0.0
2024-09-14 10:42:49,860 INFO [train.py:1198] (0/2) Epoch 7, batch 2450, loss[loss=0.3237, ctc_loss=0.2312, cr_loss=0.4624, over 20650.00 frames. ], tot_loss[loss=0.2891, ctc_loss=0.2075, cr_loss=0.4078, over 4097499.15 frames. ], batch size: 66, lr: 1.21e-02, grad_scale: 32.0
2024-09-14 10:43:19,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=115642.5, ans=0.125
2024-09-14 10:43:27,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=115642.5, ans=0.0
2024-09-14 10:43:46,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.82 vs. limit=10.0
2024-09-14 10:43:57,773 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.62 vs.
limit=22.5
2024-09-14 10:43:58,240 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.055e+02 2.185e+02 2.524e+02 3.228e+02, threshold=4.370e+02, percent-clipped=0.0
2024-09-14 10:44:00,598 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs. limit=6.0
2024-09-14 10:44:06,161 INFO [train.py:1198] (0/2) Epoch 7, batch 2500, loss[loss=0.3047, ctc_loss=0.2188, cr_loss=0.4293, over 20967.00 frames. ], tot_loss[loss=0.2883, ctc_loss=0.2069, cr_loss=0.4071, over 4104127.34 frames. ], batch size: 67, lr: 1.21e-02, grad_scale: 32.0
2024-09-14 10:44:08,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=115727.5, ans=0.125
2024-09-14 10:44:26,771 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.13 vs. limit=10.0
2024-09-14 10:44:28,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=115755.83333333333, ans=0.125
2024-09-14 10:44:58,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115812.5, ans=0.1
2024-09-14 10:45:04,522 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0
2024-09-14 10:45:06,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.83 vs.
limit=15.0
2024-09-14 10:45:16,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=115840.83333333333, ans=0.5
2024-09-14 10:45:22,097 INFO [train.py:1198] (0/2) Epoch 7, batch 2550, loss[loss=0.2753, ctc_loss=0.1987, cr_loss=0.3828, over 21066.00 frames. ], tot_loss[loss=0.2895, ctc_loss=0.208, cr_loss=0.4075, over 4088673.43 frames. ], batch size: 53, lr: 1.21e-02, grad_scale: 32.0
2024-09-14 10:45:54,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=115925.83333333333, ans=0.125
2024-09-14 10:46:33,240 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.166e+02 2.412e+02 2.780e+02 5.008e+02, threshold=4.825e+02, percent-clipped=1.0
2024-09-14 10:46:40,821 INFO [train.py:1198] (0/2) Epoch 7, batch 2600, loss[loss=0.2983, ctc_loss=0.2158, cr_loss=0.4122, over 20874.00 frames. ], tot_loss[loss=0.2898, ctc_loss=0.2083, cr_loss=0.4075, over 4085477.98 frames. ], batch size: 57, lr: 1.21e-02, grad_scale: 32.0
2024-09-14 10:47:01,298 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0
2024-09-14 10:47:17,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=116067.5, ans=0.0
2024-09-14 10:47:18,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=116067.5, ans=0.0
2024-09-14 10:47:23,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=116067.5, ans=0.125
2024-09-14 10:47:31,415 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.46 vs.
limit=15.0
2024-09-14 10:47:56,314 INFO [train.py:1198] (0/2) Epoch 7, batch 2650, loss[loss=0.2462, ctc_loss=0.1725, cr_loss=0.3684, over 20935.00 frames. ], tot_loss[loss=0.2883, ctc_loss=0.207, cr_loss=0.4064, over 4075674.06 frames. ], batch size: 48, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 10:48:25,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=116180.83333333333, ans=0.125
2024-09-14 10:48:51,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=116237.5, ans=0.0
2024-09-14 10:49:07,387 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.066e+02 2.216e+02 2.463e+02 5.166e+02, threshold=4.432e+02, percent-clipped=1.0
2024-09-14 10:49:12,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.31 vs. limit=22.5
2024-09-14 10:49:14,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=116294.16666666667, ans=0.025
2024-09-14 10:49:15,235 INFO [train.py:1198] (0/2) Epoch 7, batch 2700, loss[loss=0.3919, ctc_loss=0.2994, cr_loss=0.4624, over 14120.00 frames. ], tot_loss[loss=0.2898, ctc_loss=0.2083, cr_loss=0.4076, over 4063111.44 frames.
], batch size: 149, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 10:49:20,401 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 10:49:35,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=116322.5, ans=0.125
2024-09-14 10:50:02,728 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 10:50:08,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=116379.16666666667, ans=0.125
2024-09-14 10:50:30,900 INFO [train.py:1198] (0/2) Epoch 7, batch 2750, loss[loss=0.2626, ctc_loss=0.185, cr_loss=0.3878, over 21052.00 frames. ], tot_loss[loss=0.2891, ctc_loss=0.2076, cr_loss=0.4072, over 4063685.32 frames. ], batch size: 56, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 10:51:32,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=15.0
2024-09-14 10:51:38,723 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.148e+02 2.366e+02 2.624e+02 4.175e+02, threshold=4.733e+02, percent-clipped=0.0
2024-09-14 10:51:46,177 INFO [train.py:1198] (0/2) Epoch 7, batch 2800, loss[loss=0.3118, ctc_loss=0.2226, cr_loss=0.4457, over 20946.00 frames. ], tot_loss[loss=0.2902, ctc_loss=0.2085, cr_loss=0.4083, over 4066999.93 frames. ], batch size: 67, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 10:52:07,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=116605.83333333333, ans=0.0
2024-09-14 10:52:30,623 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.39 vs.
limit=15.0
2024-09-14 10:53:03,936 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5
2024-09-14 10:53:04,516 INFO [train.py:1198] (0/2) Epoch 7, batch 2850, loss[loss=0.2762, ctc_loss=0.1987, cr_loss=0.3873, over 21056.00 frames. ], tot_loss[loss=0.2897, ctc_loss=0.2081, cr_loss=0.4082, over 4072461.34 frames. ], batch size: 59, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 10:53:29,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116747.5, ans=0.1
2024-09-14 10:53:35,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=116775.83333333333, ans=0.07
2024-09-14 10:53:35,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=116775.83333333333, ans=0.1
2024-09-14 10:54:06,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=116832.5, ans=0.125
2024-09-14 10:54:15,452 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.089e+02 2.446e+02 2.967e+02 4.494e+02, threshold=4.891e+02, percent-clipped=0.0
2024-09-14 10:54:22,775 INFO [train.py:1198] (0/2) Epoch 7, batch 2900, loss[loss=0.3007, ctc_loss=0.2204, cr_loss=0.4014, over 18321.00 frames. ], tot_loss[loss=0.2878, ctc_loss=0.2065, cr_loss=0.4065, over 4082551.97 frames. ], batch size: 108, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 10:54:24,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=116860.83333333333, ans=0.05
2024-09-14 10:54:26,888 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.21 vs.
limit=15.0
2024-09-14 10:54:29,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=116860.83333333333, ans=0.2
2024-09-14 10:54:29,552 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0
2024-09-14 10:55:38,994 INFO [train.py:1198] (0/2) Epoch 7, batch 2950, loss[loss=0.2557, ctc_loss=0.1824, cr_loss=0.3663, over 20919.00 frames. ], tot_loss[loss=0.2887, ctc_loss=0.2072, cr_loss=0.4078, over 4078000.30 frames. ], batch size: 49, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 10:56:36,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=117087.5, ans=0.2
2024-09-14 10:56:44,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=117115.83333333333, ans=0.035
2024-09-14 10:56:45,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=117115.83333333333, ans=0.125
2024-09-14 10:56:46,985 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.145e+02 2.311e+02 2.716e+02 4.685e+02, threshold=4.622e+02, percent-clipped=0.0
2024-09-14 10:56:54,608 INFO [train.py:1198] (0/2) Epoch 7, batch 3000, loss[loss=0.3071, ctc_loss=0.2258, cr_loss=0.4063, over 20655.00 frames. ], tot_loss[loss=0.2896, ctc_loss=0.2079, cr_loss=0.4084, over 4076134.32 frames.
], batch size: 68, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 10:56:54,609 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-14 10:57:06,420 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4255, 4.3552, 4.3690, 3.8381], device='cuda:0')
2024-09-14 10:57:16,779 INFO [train.py:1230] (0/2) Epoch 7, validation: loss=0.06173, ctc_loss=0.06173, cr_loss=8.781e-15, over 944034.00 frames.
2024-09-14 10:57:16,780 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-14 10:57:24,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=117144.16666666667, ans=0.125
2024-09-14 10:57:30,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=117172.5, ans=0.0
2024-09-14 10:58:05,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=117229.16666666667, ans=0.2
2024-09-14 10:58:07,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=117229.16666666667, ans=0.0
2024-09-14 10:58:10,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0
2024-09-14 10:58:13,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=117229.16666666667, ans=0.025
2024-09-14 10:58:30,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs.
limit=6.0
2024-09-14 10:58:31,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=117285.83333333333, ans=0.125
2024-09-14 10:58:32,605 INFO [train.py:1198] (0/2) Epoch 7, batch 3050, loss[loss=0.2476, ctc_loss=0.1747, cr_loss=0.3648, over 20989.00 frames. ], tot_loss[loss=0.2881, ctc_loss=0.2067, cr_loss=0.4072, over 4085615.85 frames. ], batch size: 52, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 10:58:40,973 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0
2024-09-14 10:58:42,633 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=15.0
2024-09-14 10:59:44,213 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.268e+02 2.424e+02 2.793e+02 4.428e+02, threshold=4.848e+02, percent-clipped=0.0
2024-09-14 10:59:51,828 INFO [train.py:1198] (0/2) Epoch 7, batch 3100, loss[loss=0.2274, ctc_loss=0.1587, cr_loss=0.3435, over 20974.00 frames. ], tot_loss[loss=0.2885, ctc_loss=0.207, cr_loss=0.4073, over 4079719.01 frames. ], batch size: 49, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 11:00:01,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=117427.5, ans=0.2
2024-09-14 11:00:17,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=117455.83333333333, ans=0.1
2024-09-14 11:01:07,782 INFO [train.py:1198] (0/2) Epoch 7, batch 3150, loss[loss=0.2856, ctc_loss=0.2067, cr_loss=0.3945, over 20868.00 frames. ], tot_loss[loss=0.288, ctc_loss=0.2066, cr_loss=0.4069, over 4077314.70 frames.
], batch size: 57, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 11:01:15,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=117569.16666666667, ans=0.125
2024-09-14 11:01:37,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=117625.83333333333, ans=22.5
2024-09-14 11:01:57,064 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.01 vs. limit=6.0
2024-09-14 11:01:58,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=117654.16666666667, ans=0.125
2024-09-14 11:02:15,665 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.073e+02 2.278e+02 2.514e+02 3.593e+02, threshold=4.556e+02, percent-clipped=0.0
2024-09-14 11:02:23,420 INFO [train.py:1198] (0/2) Epoch 7, batch 3200, loss[loss=0.2521, ctc_loss=0.1793, cr_loss=0.3641, over 20971.00 frames. ], tot_loss[loss=0.2872, ctc_loss=0.2061, cr_loss=0.4058, over 4081279.09 frames. ], batch size: 51, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 11:03:04,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=117767.5, ans=0.125
2024-09-14 11:03:13,845 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=12.0
2024-09-14 11:03:34,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=117824.16666666667, ans=0.125
2024-09-14 11:03:41,495 INFO [train.py:1198] (0/2) Epoch 7, batch 3250, loss[loss=0.2671, ctc_loss=0.1885, cr_loss=0.393, over 21001.00 frames.
], tot_loss[loss=0.2871, ctc_loss=0.206, cr_loss=0.4053, over 4082892.98 frames. ], batch size: 52, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 11:03:57,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=117880.83333333333, ans=0.125
2024-09-14 11:03:58,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=117880.83333333333, ans=0.125
2024-09-14 11:04:21,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=117909.16666666667, ans=0.2
2024-09-14 11:04:22,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=117909.16666666667, ans=0.2
2024-09-14 11:04:46,998 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 11:04:49,674 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.120e+02 2.281e+02 2.578e+02 3.524e+02, threshold=4.562e+02, percent-clipped=0.0
2024-09-14 11:04:57,205 INFO [train.py:1198] (0/2) Epoch 7, batch 3300, loss[loss=0.2081, ctc_loss=0.1447, cr_loss=0.3171, over 20913.00 frames. ], tot_loss[loss=0.2888, ctc_loss=0.2073, cr_loss=0.4073, over 4072261.68 frames. ], batch size: 50, lr: 1.20e-02, grad_scale: 32.0
2024-09-14 11:05:06,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=117994.16666666667, ans=0.0
2024-09-14 11:05:13,260 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.97 vs.
limit=6.0
2024-09-14 11:05:17,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118022.5, ans=0.1
2024-09-14 11:05:23,930 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=22.5
2024-09-14 11:05:58,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=118079.16666666667, ans=0.0
2024-09-14 11:06:05,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=118107.5, ans=0.2
2024-09-14 11:06:06,384 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=12.0
2024-09-14 11:06:15,985 INFO [train.py:1198] (0/2) Epoch 7, batch 3350, loss[loss=0.2912, ctc_loss=0.2165, cr_loss=0.3734, over 18241.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2065, cr_loss=0.4066, over 4080129.96 frames. ], batch size: 108, lr: 1.19e-02, grad_scale: 32.0
2024-09-14 11:06:25,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0
2024-09-14 11:06:43,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=118164.16666666667, ans=0.0
2024-09-14 11:07:24,164 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.119e+02 2.422e+02 2.854e+02 5.353e+02, threshold=4.844e+02, percent-clipped=1.0
2024-09-14 11:07:31,626 INFO [train.py:1198] (0/2) Epoch 7, batch 3400, loss[loss=0.2564, ctc_loss=0.1833, cr_loss=0.3656, over 20994.00 frames. ], tot_loss[loss=0.2882, ctc_loss=0.2069, cr_loss=0.4065, over 4079925.57 frames.
], batch size: 52, lr: 1.19e-02, grad_scale: 32.0
2024-09-14 11:08:20,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118362.5, ans=0.1
2024-09-14 11:08:42,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=118390.83333333333, ans=0.0
2024-09-14 11:08:50,951 INFO [train.py:1198] (0/2) Epoch 7, batch 3450, loss[loss=0.225, ctc_loss=0.157, cr_loss=0.34, over 20963.00 frames. ], tot_loss[loss=0.2876, ctc_loss=0.2064, cr_loss=0.406, over 4072513.81 frames. ], batch size: 49, lr: 1.19e-02, grad_scale: 32.0
2024-09-14 11:08:54,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=118419.16666666667, ans=0.0
2024-09-14 11:08:54,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=15.0
2024-09-14 11:09:26,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=118475.83333333333, ans=0.125
2024-09-14 11:09:50,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=118532.5, ans=0.125
2024-09-14 11:09:59,509 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.162e+02 2.361e+02 2.765e+02 4.066e+02, threshold=4.723e+02, percent-clipped=0.0
2024-09-14 11:10:06,971 INFO [train.py:1198] (0/2) Epoch 7, batch 3500, loss[loss=0.2744, ctc_loss=0.1955, cr_loss=0.3947, over 20786.00 frames. ], tot_loss[loss=0.2877, ctc_loss=0.2065, cr_loss=0.406, over 4075553.42 frames.
], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:11:12,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=118674.16666666667, ans=0.0 2024-09-14 11:11:24,922 INFO [train.py:1198] (0/2) Epoch 7, batch 3550, loss[loss=0.3175, ctc_loss=0.2258, cr_loss=0.4585, over 20375.00 frames. ], tot_loss[loss=0.2887, ctc_loss=0.2074, cr_loss=0.4066, over 4082013.51 frames. ], batch size: 74, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:12:07,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=118759.16666666667, ans=0.0 2024-09-14 11:12:22,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=118787.5, ans=0.125 2024-09-14 11:12:32,699 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.194e+02 2.382e+02 2.801e+02 4.228e+02, threshold=4.763e+02, percent-clipped=0.0 2024-09-14 11:12:40,395 INFO [train.py:1198] (0/2) Epoch 7, batch 3600, loss[loss=0.2823, ctc_loss=0.201, cr_loss=0.4063, over 21001.00 frames. ], tot_loss[loss=0.2897, ctc_loss=0.2081, cr_loss=0.4078, over 4074465.29 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:12:56,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118872.5, ans=0.1 2024-09-14 11:12:56,795 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-09-14 11:13:47,996 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.98 vs. limit=10.0 2024-09-14 11:13:56,355 INFO [train.py:1198] (0/2) Epoch 7, batch 3650, loss[loss=0.2446, ctc_loss=0.1756, cr_loss=0.3446, over 19864.00 frames. 
], tot_loss[loss=0.288, ctc_loss=0.2066, cr_loss=0.4071, over 4090814.50 frames. ], batch size: 44, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:14:14,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119014.16666666667, ans=0.1 2024-09-14 11:14:19,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=119014.16666666667, ans=0.2 2024-09-14 11:15:07,271 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.101e+02 2.330e+02 2.666e+02 3.785e+02, threshold=4.661e+02, percent-clipped=0.0 2024-09-14 11:15:14,990 INFO [train.py:1198] (0/2) Epoch 7, batch 3700, loss[loss=0.3168, ctc_loss=0.2299, cr_loss=0.4345, over 20057.00 frames. ], tot_loss[loss=0.2883, ctc_loss=0.2069, cr_loss=0.4068, over 4083486.56 frames. ], batch size: 80, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:15:28,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119155.83333333333, ans=0.1 2024-09-14 11:15:37,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=119155.83333333333, ans=0.125 2024-09-14 11:16:33,725 INFO [train.py:1198] (0/2) Epoch 7, batch 3750, loss[loss=0.308, ctc_loss=0.2225, cr_loss=0.4276, over 20983.00 frames. ], tot_loss[loss=0.288, ctc_loss=0.2066, cr_loss=0.4069, over 4089737.27 frames. 
], batch size: 58, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:17:15,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=119325.83333333333, ans=0.125 2024-09-14 11:17:23,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=119354.16666666667, ans=0.2 2024-09-14 11:17:28,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=119354.16666666667, ans=0.0 2024-09-14 11:17:41,410 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.097e+02 2.294e+02 2.621e+02 4.657e+02, threshold=4.588e+02, percent-clipped=0.0 2024-09-14 11:17:48,795 INFO [train.py:1198] (0/2) Epoch 7, batch 3800, loss[loss=0.2992, ctc_loss=0.2126, cr_loss=0.4329, over 19942.00 frames. ], tot_loss[loss=0.2872, ctc_loss=0.2059, cr_loss=0.4065, over 4096165.27 frames. ], batch size: 80, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:17:50,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=119410.83333333333, ans=0.035 2024-09-14 11:19:04,578 INFO [train.py:1198] (0/2) Epoch 7, batch 3850, loss[loss=0.3339, ctc_loss=0.2432, cr_loss=0.4534, over 18352.00 frames. ], tot_loss[loss=0.2872, ctc_loss=0.2058, cr_loss=0.4072, over 4097898.25 frames. ], batch size: 108, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:19:11,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.21 vs. 
limit=15.0 2024-09-14 11:19:12,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119552.5, ans=0.1 2024-09-14 11:19:36,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=119609.16666666667, ans=0.2 2024-09-14 11:20:16,900 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.149e+02 2.314e+02 2.701e+02 4.576e+02, threshold=4.628e+02, percent-clipped=0.0 2024-09-14 11:20:22,921 INFO [train.py:1198] (0/2) Epoch 7, batch 3900, loss[loss=0.3222, ctc_loss=0.2305, cr_loss=0.4584, over 20693.00 frames. ], tot_loss[loss=0.2874, ctc_loss=0.2059, cr_loss=0.4075, over 4095412.37 frames. ], batch size: 71, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:20:40,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=119722.5, ans=0.125 2024-09-14 11:20:52,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0 2024-09-14 11:21:02,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=119750.83333333333, ans=0.2 2024-09-14 11:21:39,017 INFO [train.py:1198] (0/2) Epoch 7, batch 3950, loss[loss=0.2924, ctc_loss=0.2102, cr_loss=0.411, over 20993.00 frames. ], tot_loss[loss=0.2872, ctc_loss=0.2059, cr_loss=0.4065, over 4076226.97 frames. 
], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:21:58,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=119864.16666666667, ans=0.125 2024-09-14 11:22:00,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=119864.16666666667, ans=0.125 2024-09-14 11:22:51,313 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.127e+02 2.414e+02 2.679e+02 4.173e+02, threshold=4.827e+02, percent-clipped=0.0 2024-09-14 11:22:53,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0 2024-09-14 11:22:57,403 INFO [train.py:1198] (0/2) Epoch 7, batch 4000, loss[loss=0.2899, ctc_loss=0.2085, cr_loss=0.4071, over 20787.00 frames. ], tot_loss[loss=0.2894, ctc_loss=0.2078, cr_loss=0.408, over 4055531.31 frames. 
], batch size: 53, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:22:59,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=119977.5, ans=0.07 2024-09-14 11:23:19,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=120005.83333333333, ans=0.2 2024-09-14 11:23:22,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=120005.83333333333, ans=0.0 2024-09-14 11:23:29,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=120034.16666666667, ans=0.125 2024-09-14 11:23:51,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=120062.5, ans=0.0 2024-09-14 11:23:57,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=120090.83333333333, ans=0.2 2024-09-14 11:24:13,814 INFO [train.py:1198] (0/2) Epoch 7, batch 4050, loss[loss=0.3279, ctc_loss=0.2323, cr_loss=0.4783, over 20941.00 frames. ], tot_loss[loss=0.2897, ctc_loss=0.2079, cr_loss=0.4089, over 4070720.36 frames. ], batch size: 64, lr: 1.19e-02, grad_scale: 32.0 2024-09-14 11:24:15,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=120119.16666666667, ans=0.0 2024-09-14 11:24:19,485 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.79 vs. 
limit=15.0 2024-09-14 11:24:27,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=120147.5, ans=0.1 2024-09-14 11:25:23,639 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.146e+02 2.354e+02 2.750e+02 3.593e+02, threshold=4.707e+02, percent-clipped=0.0 2024-09-14 11:25:29,732 INFO [train.py:1198] (0/2) Epoch 7, batch 4100, loss[loss=0.316, ctc_loss=0.2282, cr_loss=0.439, over 19308.00 frames. ], tot_loss[loss=0.2887, ctc_loss=0.2071, cr_loss=0.408, over 4075983.79 frames. ], batch size: 90, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:25:50,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=120289.16666666667, ans=0.125 2024-09-14 11:26:08,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=120317.5, ans=0.025 2024-09-14 11:26:08,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=120317.5, ans=0.125 2024-09-14 11:26:20,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=120345.83333333333, ans=0.125 2024-09-14 11:26:20,624 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 11:26:47,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=120402.5, ans=0.125 2024-09-14 11:26:49,164 INFO [train.py:1198] (0/2) Epoch 7, batch 4150, loss[loss=0.2897, ctc_loss=0.204, cr_loss=0.4283, over 20935.00 frames. ], tot_loss[loss=0.2878, ctc_loss=0.2063, cr_loss=0.4074, over 4089412.22 frames. 
], batch size: 60, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:28:01,361 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.172e+02 2.521e+02 2.902e+02 4.553e+02, threshold=5.042e+02, percent-clipped=0.0 2024-09-14 11:28:07,616 INFO [train.py:1198] (0/2) Epoch 7, batch 4200, loss[loss=0.2907, ctc_loss=0.2086, cr_loss=0.4105, over 20885.00 frames. ], tot_loss[loss=0.288, ctc_loss=0.2064, cr_loss=0.408, over 4091107.44 frames. ], batch size: 57, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:28:17,534 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.29 vs. limit=22.5 2024-09-14 11:28:51,582 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 11:29:22,774 INFO [train.py:1198] (0/2) Epoch 7, batch 4250, loss[loss=0.3053, ctc_loss=0.2118, cr_loss=0.4674, over 20888.00 frames. ], tot_loss[loss=0.2881, ctc_loss=0.2064, cr_loss=0.4085, over 4094761.97 frames. 
], batch size: 54, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:29:38,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=120714.16666666667, ans=0.125 2024-09-14 11:29:59,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=120742.5, ans=0.125 2024-09-14 11:30:11,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=120770.83333333333, ans=0.125 2024-09-14 11:30:25,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=120799.16666666667, ans=0.07 2024-09-14 11:30:26,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=120799.16666666667, ans=0.1 2024-09-14 11:30:32,283 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.136e+02 2.310e+02 2.620e+02 1.080e+03, threshold=4.620e+02, percent-clipped=1.0 2024-09-14 11:30:38,398 INFO [train.py:1198] (0/2) Epoch 7, batch 4300, loss[loss=0.2706, ctc_loss=0.1903, cr_loss=0.4011, over 20885.00 frames. ], tot_loss[loss=0.2888, ctc_loss=0.2069, cr_loss=0.4095, over 4094483.63 frames. ], batch size: 54, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:31:15,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.52 vs. 
limit=10.0 2024-09-14 11:31:37,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=120912.5, ans=0.2 2024-09-14 11:31:46,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=120940.83333333333, ans=0.125 2024-09-14 11:31:56,613 INFO [train.py:1198] (0/2) Epoch 7, batch 4350, loss[loss=0.3733, ctc_loss=0.2847, cr_loss=0.4431, over 14377.00 frames. ], tot_loss[loss=0.2886, ctc_loss=0.2067, cr_loss=0.4093, over 4096789.03 frames. ], batch size: 149, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:32:07,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=120969.16666666667, ans=0.125 2024-09-14 11:32:28,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=121025.83333333333, ans=0.125 2024-09-14 11:32:42,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=121054.16666666667, ans=0.0 2024-09-14 11:33:05,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.29 vs. limit=22.5 2024-09-14 11:33:06,118 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.123e+02 2.299e+02 2.704e+02 4.882e+02, threshold=4.598e+02, percent-clipped=1.0 2024-09-14 11:33:12,244 INFO [train.py:1198] (0/2) Epoch 7, batch 4400, loss[loss=0.2541, ctc_loss=0.1788, cr_loss=0.3764, over 20970.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2062, cr_loss=0.4083, over 4098038.59 frames. 
], batch size: 51, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:33:25,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.10 vs. limit=10.0 2024-09-14 11:33:30,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=121139.16666666667, ans=0.125 2024-09-14 11:33:32,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=121139.16666666667, ans=0.07 2024-09-14 11:33:41,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=121139.16666666667, ans=0.025 2024-09-14 11:33:52,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=121167.5, ans=0.125 2024-09-14 11:33:52,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=121167.5, ans=0.1 2024-09-14 11:34:06,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.16 vs. limit=12.0 2024-09-14 11:34:22,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=121224.16666666667, ans=0.1 2024-09-14 11:34:30,701 INFO [train.py:1198] (0/2) Epoch 7, batch 4450, loss[loss=0.2605, ctc_loss=0.1818, cr_loss=0.3934, over 20314.00 frames. ], tot_loss[loss=0.2875, ctc_loss=0.2059, cr_loss=0.4081, over 4099708.19 frames. 
], batch size: 45, lr: 1.18e-02, grad_scale: 16.0 2024-09-14 11:34:46,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=121280.83333333333, ans=0.2 2024-09-14 11:34:54,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=121280.83333333333, ans=15.0 2024-09-14 11:34:59,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0 2024-09-14 11:35:02,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=121309.16666666667, ans=0.125 2024-09-14 11:35:34,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2024-09-14 11:35:37,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=22.5 2024-09-14 11:35:39,138 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-14 11:35:41,477 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.113e+02 2.333e+02 2.634e+02 4.859e+02, threshold=4.665e+02, percent-clipped=1.0 2024-09-14 11:35:46,204 INFO [train.py:1198] (0/2) Epoch 7, batch 4500, loss[loss=0.281, ctc_loss=0.1977, cr_loss=0.4165, over 21057.00 frames. ], tot_loss[loss=0.2867, ctc_loss=0.2053, cr_loss=0.407, over 4095028.16 frames. 
], batch size: 56, lr: 1.18e-02, grad_scale: 16.0 2024-09-14 11:36:42,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=121479.16666666667, ans=0.0 2024-09-14 11:36:55,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=121507.5, ans=0.0 2024-09-14 11:37:01,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.05 vs. limit=10.0 2024-09-14 11:37:02,239 INFO [train.py:1198] (0/2) Epoch 7, batch 4550, loss[loss=0.2741, ctc_loss=0.1962, cr_loss=0.3895, over 20967.00 frames. ], tot_loss[loss=0.2877, ctc_loss=0.2063, cr_loss=0.4073, over 4091277.97 frames. ], batch size: 58, lr: 1.18e-02, grad_scale: 16.0 2024-09-14 11:37:04,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=121535.83333333333, ans=0.125 2024-09-14 11:38:17,413 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.164e+02 2.393e+02 2.614e+02 3.644e+02, threshold=4.786e+02, percent-clipped=0.0 2024-09-14 11:38:21,879 INFO [train.py:1198] (0/2) Epoch 7, batch 4600, loss[loss=0.2898, ctc_loss=0.208, cr_loss=0.409, over 20289.00 frames. ], tot_loss[loss=0.2877, ctc_loss=0.2062, cr_loss=0.4078, over 4093951.66 frames. ], batch size: 74, lr: 1.18e-02, grad_scale: 16.0 2024-09-14 11:39:03,684 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.37 vs. 
limit=12.0 2024-09-14 11:39:23,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=121762.5, ans=0.125 2024-09-14 11:39:32,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=121790.83333333333, ans=0.0 2024-09-14 11:39:41,031 INFO [train.py:1198] (0/2) Epoch 7, batch 4650, loss[loss=0.3007, ctc_loss=0.2163, cr_loss=0.422, over 21016.00 frames. ], tot_loss[loss=0.2871, ctc_loss=0.2057, cr_loss=0.4071, over 4101079.57 frames. ], batch size: 61, lr: 1.18e-02, grad_scale: 16.0 2024-09-14 11:39:45,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=121819.16666666667, ans=0.1 2024-09-14 11:40:25,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=121904.16666666667, ans=0.0 2024-09-14 11:40:37,767 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.05 vs. limit=15.0 2024-09-14 11:40:51,943 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.133e+02 2.347e+02 2.729e+02 3.956e+02, threshold=4.693e+02, percent-clipped=0.0 2024-09-14 11:40:56,634 INFO [train.py:1198] (0/2) Epoch 7, batch 4700, loss[loss=0.2797, ctc_loss=0.2006, cr_loss=0.3957, over 20774.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2063, cr_loss=0.408, over 4107031.66 frames. 
], batch size: 53, lr: 1.18e-02, grad_scale: 16.0 2024-09-14 11:41:01,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=121960.83333333333, ans=0.0 2024-09-14 11:41:07,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=121960.83333333333, ans=0.07 2024-09-14 11:42:11,396 INFO [train.py:1198] (0/2) Epoch 7, batch 4750, loss[loss=0.3116, ctc_loss=0.2213, cr_loss=0.4511, over 21024.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2062, cr_loss=0.4084, over 4095401.04 frames. ], batch size: 61, lr: 1.18e-02, grad_scale: 16.0 2024-09-14 11:42:11,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=122102.5, ans=0.025 2024-09-14 11:42:19,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=122102.5, ans=0.0 2024-09-14 11:42:32,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.31 vs. limit=15.0 2024-09-14 11:42:34,059 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 11:42:47,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=122159.16666666667, ans=0.05 2024-09-14 11:42:50,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=122159.16666666667, ans=0.0 2024-09-14 11:43:08,984 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.61 vs. 
limit=15.0 2024-09-14 11:43:21,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.86 vs. limit=15.0 2024-09-14 11:43:24,758 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.214e+02 2.404e+02 2.623e+02 6.465e+02, threshold=4.809e+02, percent-clipped=1.0 2024-09-14 11:43:29,338 INFO [train.py:1198] (0/2) Epoch 7, batch 4800, loss[loss=0.2477, ctc_loss=0.1733, cr_loss=0.3718, over 20953.00 frames. ], tot_loss[loss=0.2881, ctc_loss=0.2064, cr_loss=0.4087, over 4100744.87 frames. ], batch size: 49, lr: 1.18e-02, grad_scale: 32.0 2024-09-14 11:43:33,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=122244.16666666667, ans=15.0 2024-09-14 11:43:37,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=122244.16666666667, ans=0.0 2024-09-14 11:43:51,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.06 vs. 
limit=10.0 2024-09-14 11:44:04,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=122300.83333333333, ans=0.2 2024-09-14 11:44:07,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=122300.83333333333, ans=0.0 2024-09-14 11:44:18,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=122329.16666666667, ans=0.125 2024-09-14 11:44:26,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=122329.16666666667, ans=0.125 2024-09-14 11:44:48,780 INFO [train.py:1198] (0/2) Epoch 7, batch 4850, loss[loss=0.259, ctc_loss=0.1818, cr_loss=0.3861, over 20998.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2063, cr_loss=0.4079, over 4087664.99 frames. ], batch size: 48, lr: 1.17e-02, grad_scale: 32.0 2024-09-14 11:44:52,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-09-14 11:45:13,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. 
limit=15.0 2024-09-14 11:45:28,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=122442.5, ans=0.0 2024-09-14 11:45:44,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=122470.83333333333, ans=0.025 2024-09-14 11:45:59,180 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.137e+02 2.416e+02 2.758e+02 3.946e+02, threshold=4.833e+02, percent-clipped=0.0 2024-09-14 11:46:03,545 INFO [train.py:1198] (0/2) Epoch 7, batch 4900, loss[loss=0.2792, ctc_loss=0.1984, cr_loss=0.4039, over 20633.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2063, cr_loss=0.4077, over 4083044.36 frames. ], batch size: 68, lr: 1.17e-02, grad_scale: 32.0 2024-09-14 11:46:11,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=122527.5, ans=0.0 2024-09-14 11:46:15,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=122527.5, ans=0.125 2024-09-14 11:46:29,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.20 vs. limit=6.0 2024-09-14 11:46:59,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=122612.5, ans=0.2 2024-09-14 11:47:10,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122640.83333333333, ans=0.1 2024-09-14 11:47:17,298 INFO [train.py:1198] (0/2) Epoch 7, batch 4950, loss[loss=0.2678, ctc_loss=0.1924, cr_loss=0.3773, over 21031.00 frames. ], tot_loss[loss=0.2867, ctc_loss=0.2055, cr_loss=0.4062, over 4088868.89 frames. 
], batch size: 62, lr: 1.17e-02, grad_scale: 32.0 2024-09-14 11:47:19,546 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0 2024-09-14 11:47:46,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=122725.83333333333, ans=0.2 2024-09-14 11:47:50,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=122725.83333333333, ans=0.125 2024-09-14 11:47:54,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=122725.83333333333, ans=0.125 2024-09-14 11:48:14,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=122754.16666666667, ans=0.05 2024-09-14 11:48:16,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=122782.5, ans=0.125 2024-09-14 11:48:27,707 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.669e+02 2.097e+02 2.353e+02 2.641e+02 7.950e+02, threshold=4.707e+02, percent-clipped=1.0 2024-09-14 11:48:29,876 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.86 vs. limit=22.5 2024-09-14 11:48:32,122 INFO [train.py:1198] (0/2) Epoch 7, batch 5000, loss[loss=0.283, ctc_loss=0.2004, cr_loss=0.4129, over 20865.00 frames. ], tot_loss[loss=0.2865, ctc_loss=0.2053, cr_loss=0.406, over 4084868.74 frames. ], batch size: 65, lr: 1.17e-02, grad_scale: 32.0 2024-09-14 11:48:36,051 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. 
limit=6.0
2024-09-14 11:49:11,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=122867.5, ans=0.125
2024-09-14 11:49:29,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=122895.83333333333, ans=0.0
2024-09-14 11:49:46,399 INFO [train.py:1198] (0/2) Epoch 7, batch 5050, loss[loss=0.2311, ctc_loss=0.1623, cr_loss=0.3439, over 20968.00 frames. ], tot_loss[loss=0.2848, ctc_loss=0.2039, cr_loss=0.4048, over 4085890.82 frames. ], batch size: 52, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:49:46,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0
2024-09-14 11:50:43,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=123037.5, ans=0.2
2024-09-14 11:50:48,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=15.0
2024-09-14 11:50:56,423 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.133e+02 2.354e+02 2.967e+02 5.718e+02, threshold=4.709e+02, percent-clipped=3.0
2024-09-14 11:51:00,818 INFO [train.py:1198] (0/2) Epoch 7, batch 5100, loss[loss=0.2732, ctc_loss=0.1893, cr_loss=0.4194, over 21023.00 frames. ], tot_loss[loss=0.2856, ctc_loss=0.2045, cr_loss=0.4053, over 4076746.78 frames. ], batch size: 61, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:51:07,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=123094.16666666667, ans=0.025
2024-09-14 11:51:42,548 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 11:52:18,032 INFO [train.py:1198] (0/2) Epoch 7, batch 5150, loss[loss=0.3234, ctc_loss=0.2298, cr_loss=0.4679, over 21062.00 frames. ], tot_loss[loss=0.2851, ctc_loss=0.2041, cr_loss=0.405, over 4086746.36 frames. ], batch size: 62, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:52:24,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=123235.83333333333, ans=0.1
2024-09-14 11:52:31,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=123264.16666666667, ans=0.0
2024-09-14 11:52:36,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=123264.16666666667, ans=0.125
2024-09-14 11:53:16,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=123349.16666666667, ans=0.0
2024-09-14 11:53:27,492 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.173e+02 2.363e+02 2.756e+02 5.802e+02, threshold=4.725e+02, percent-clipped=3.0
2024-09-14 11:53:32,042 INFO [train.py:1198] (0/2) Epoch 7, batch 5200, loss[loss=0.3067, ctc_loss=0.2186, cr_loss=0.4408, over 20982.00 frames. ], tot_loss[loss=0.2865, ctc_loss=0.2052, cr_loss=0.4063, over 4084275.94 frames. ], batch size: 58, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:54:14,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=123434.16666666667, ans=0.125
2024-09-14 11:54:30,067 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=12.0
2024-09-14 11:54:37,669 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=12.0
2024-09-14 11:54:48,576 INFO [train.py:1198] (0/2) Epoch 7, batch 5250, loss[loss=0.2964, ctc_loss=0.2129, cr_loss=0.4178, over 20816.00 frames. ], tot_loss[loss=0.2867, ctc_loss=0.2053, cr_loss=0.4073, over 4080911.87 frames. ], batch size: 59, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:55:17,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=123575.83333333333, ans=0.125
2024-09-14 11:55:21,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=123575.83333333333, ans=0.2
2024-09-14 11:55:30,742 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 11:55:57,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=123632.5, ans=0.1
2024-09-14 11:55:58,540 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.108e+02 2.293e+02 2.539e+02 4.247e+02, threshold=4.585e+02, percent-clipped=0.0
2024-09-14 11:56:02,962 INFO [train.py:1198] (0/2) Epoch 7, batch 5300, loss[loss=0.2861, ctc_loss=0.2015, cr_loss=0.4234, over 21072.00 frames. ], tot_loss[loss=0.2881, ctc_loss=0.2063, cr_loss=0.4088, over 4078457.42 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:56:35,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=123717.5, ans=0.025
2024-09-14 11:57:11,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=123774.16666666667, ans=0.025
2024-09-14 11:57:17,110 INFO [train.py:1198] (0/2) Epoch 7, batch 5350, loss[loss=0.2826, ctc_loss=0.2023, cr_loss=0.4012, over 20982.00 frames. ], tot_loss[loss=0.2887, ctc_loss=0.2069, cr_loss=0.4091, over 4069582.78 frames. ], batch size: 58, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:57:39,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=123830.83333333333, ans=0.125
2024-09-14 11:58:25,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=123915.83333333333, ans=0.125
2024-09-14 11:58:26,679 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.218e+02 2.456e+02 2.819e+02 4.306e+02, threshold=4.913e+02, percent-clipped=0.0
2024-09-14 11:58:31,055 INFO [train.py:1198] (0/2) Epoch 7, batch 5400, loss[loss=0.3179, ctc_loss=0.2308, cr_loss=0.4355, over 19433.00 frames. ], tot_loss[loss=0.2886, ctc_loss=0.2068, cr_loss=0.409, over 4080821.60 frames. ], batch size: 90, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:58:41,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=123944.16666666667, ans=0.125
2024-09-14 11:59:05,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=124000.83333333333, ans=10.0
2024-09-14 11:59:11,432 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 11:59:17,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=124029.16666666667, ans=0.125
2024-09-14 11:59:18,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=124029.16666666667, ans=0.5
2024-09-14 11:59:36,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=124057.5, ans=0.0
2024-09-14 11:59:41,516 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.05 vs. limit=12.0
2024-09-14 11:59:44,991 INFO [train.py:1198] (0/2) Epoch 7, batch 5450, loss[loss=0.3186, ctc_loss=0.2332, cr_loss=0.4271, over 20864.00 frames. ], tot_loss[loss=0.2895, ctc_loss=0.2074, cr_loss=0.4105, over 4084044.52 frames. ], batch size: 65, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 11:59:47,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0
2024-09-14 11:59:58,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=124114.16666666667, ans=0.0
2024-09-14 12:00:33,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=124170.83333333333, ans=0.125
2024-09-14 12:00:35,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=124170.83333333333, ans=0.09899494936611666
2024-09-14 12:00:39,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=124170.83333333333, ans=0.125
2024-09-14 12:00:57,290 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.716e+02 2.125e+02 2.257e+02 2.638e+02 4.621e+02, threshold=4.514e+02, percent-clipped=0.0
2024-09-14 12:01:01,672 INFO [train.py:1198] (0/2) Epoch 7, batch 5500, loss[loss=0.298, ctc_loss=0.2133, cr_loss=0.4234, over 20827.00 frames. ], tot_loss[loss=0.2875, ctc_loss=0.2056, cr_loss=0.4091, over 4099833.60 frames. ], batch size: 59, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 12:01:07,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=124227.5, ans=0.125
2024-09-14 12:01:12,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=124227.5, ans=10.0
2024-09-14 12:01:31,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=124284.16666666667, ans=0.125
2024-09-14 12:02:10,265 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.99 vs. limit=5.0
2024-09-14 12:02:10,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=124340.83333333333, ans=0.125
2024-09-14 12:02:16,241 INFO [train.py:1198] (0/2) Epoch 7, batch 5550, loss[loss=0.319, ctc_loss=0.2286, cr_loss=0.4522, over 17997.00 frames. ], tot_loss[loss=0.2864, ctc_loss=0.2048, cr_loss=0.4083, over 4108615.66 frames. ], batch size: 108, lr: 1.17e-02, grad_scale: 32.0
2024-09-14 12:02:31,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=124397.5, ans=0.125
2024-09-14 12:02:34,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=124397.5, ans=0.1
2024-09-14 12:03:21,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=124482.5, ans=0.0
2024-09-14 12:03:30,123 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.731e+02 2.094e+02 2.256e+02 2.546e+02 3.746e+02, threshold=4.513e+02, percent-clipped=0.0
2024-09-14 12:03:33,029 INFO [train.py:1198] (0/2) Epoch 7, batch 5600, loss[loss=0.3168, ctc_loss=0.2291, cr_loss=0.4385, over 20671.00 frames. ], tot_loss[loss=0.2851, ctc_loss=0.2038, cr_loss=0.4063, over 4101252.68 frames. ], batch size: 66, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:03:38,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.95 vs. limit=15.0
2024-09-14 12:03:58,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=124539.16666666667, ans=0.125
2024-09-14 12:04:02,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=124567.5, ans=0.0
2024-09-14 12:04:02,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=124567.5, ans=0.125
2024-09-14 12:04:20,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=124595.83333333333, ans=0.2
2024-09-14 12:04:29,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.11 vs. limit=10.0
2024-09-14 12:04:30,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=124624.16666666667, ans=0.2
2024-09-14 12:04:45,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=124652.5, ans=0.0
2024-09-14 12:04:46,691 INFO [train.py:1198] (0/2) Epoch 7, batch 5650, loss[loss=0.2634, ctc_loss=0.187, cr_loss=0.3819, over 21068.00 frames. ], tot_loss[loss=0.2838, ctc_loss=0.2031, cr_loss=0.4039, over 4101202.00 frames. ], batch size: 53, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:04:52,940 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-44000.pt
2024-09-14 12:04:58,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=12.0
2024-09-14 12:05:17,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=124709.16666666667, ans=0.025
2024-09-14 12:05:58,341 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.077e+02 2.282e+02 2.474e+02 3.865e+02, threshold=4.564e+02, percent-clipped=0.0
2024-09-14 12:06:01,323 INFO [train.py:1198] (0/2) Epoch 7, batch 5700, loss[loss=0.2557, ctc_loss=0.1789, cr_loss=0.3837, over 20776.00 frames. ], tot_loss[loss=0.2853, ctc_loss=0.204, cr_loss=0.4064, over 4103662.86 frames. ], batch size: 53, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:06:15,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=124822.5, ans=0.0
2024-09-14 12:06:25,525 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 12:07:15,887 INFO [train.py:1198] (0/2) Epoch 7, batch 5750, loss[loss=0.3675, ctc_loss=0.2777, cr_loss=0.4489, over 14654.00 frames. ], tot_loss[loss=0.286, ctc_loss=0.2046, cr_loss=0.4069, over 4105280.44 frames. ], batch size: 150, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:07:21,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.59 vs. limit=22.5
2024-09-14 12:07:33,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=124964.16666666667, ans=0.125
2024-09-14 12:08:12,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=125020.83333333333, ans=0.2
2024-09-14 12:08:12,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0
2024-09-14 12:08:27,135 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.111e+02 2.305e+02 2.520e+02 6.733e+02, threshold=4.611e+02, percent-clipped=1.0
2024-09-14 12:08:29,921 INFO [train.py:1198] (0/2) Epoch 7, batch 5800, loss[loss=0.3254, ctc_loss=0.2316, cr_loss=0.4691, over 20091.00 frames. ], tot_loss[loss=0.2851, ctc_loss=0.2038, cr_loss=0.4062, over 4112706.20 frames. ], batch size: 80, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:09:09,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=125134.16666666667, ans=0.125
2024-09-14 12:09:46,417 INFO [train.py:1198] (0/2) Epoch 7, batch 5850, loss[loss=0.2973, ctc_loss=0.2165, cr_loss=0.404, over 19486.00 frames. ], tot_loss[loss=0.2853, ctc_loss=0.204, cr_loss=0.4068, over 4114614.41 frames. ], batch size: 90, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:10:13,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=125247.5, ans=0.2
2024-09-14 12:10:25,598 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.28 vs. limit=15.0
2024-09-14 12:10:49,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0
2024-09-14 12:10:57,474 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.185e+02 2.478e+02 2.800e+02 3.985e+02, threshold=4.957e+02, percent-clipped=0.0
2024-09-14 12:11:00,604 INFO [train.py:1198] (0/2) Epoch 7, batch 5900, loss[loss=0.3281, ctc_loss=0.2348, cr_loss=0.4665, over 20669.00 frames. ], tot_loss[loss=0.2847, ctc_loss=0.2036, cr_loss=0.4056, over 4103479.46 frames. ], batch size: 68, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:11:15,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=125360.83333333333, ans=0.1
2024-09-14 12:11:46,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=125445.83333333333, ans=0.05
2024-09-14 12:12:17,237 INFO [train.py:1198] (0/2) Epoch 7, batch 5950, loss[loss=0.3042, ctc_loss=0.2163, cr_loss=0.4394, over 20645.00 frames. ], tot_loss[loss=0.2858, ctc_loss=0.2044, cr_loss=0.4072, over 4115149.58 frames. ], batch size: 66, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:12:41,951 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.33 vs. limit=15.0
2024-09-14 12:12:43,275 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=15.0
2024-09-14 12:12:51,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=125559.16666666667, ans=0.0
2024-09-14 12:12:55,071 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.46 vs. limit=10.0
2024-09-14 12:12:56,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0
2024-09-14 12:12:56,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0
2024-09-14 12:13:05,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=125587.5, ans=0.025
2024-09-14 12:13:09,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=125587.5, ans=0.125
2024-09-14 12:13:28,738 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.117e+02 2.312e+02 2.797e+02 7.411e+02, threshold=4.624e+02, percent-clipped=1.0
2024-09-14 12:13:31,598 INFO [train.py:1198] (0/2) Epoch 7, batch 6000, loss[loss=0.2654, ctc_loss=0.1934, cr_loss=0.36, over 19884.00 frames. ], tot_loss[loss=0.2849, ctc_loss=0.2037, cr_loss=0.406, over 4120951.37 frames. ], batch size: 44, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:13:31,598 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-14 12:13:53,196 INFO [train.py:1230] (0/2) Epoch 7, validation: loss=0.06086, ctc_loss=0.06086, cr_loss=9.403e-15, over 944034.00 frames.
2024-09-14 12:13:53,196 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-14 12:14:08,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=125672.5, ans=0.125
2024-09-14 12:15:08,840 INFO [train.py:1198] (0/2) Epoch 7, batch 6050, loss[loss=0.2923, ctc_loss=0.21, cr_loss=0.4117, over 20835.00 frames. ], tot_loss[loss=0.2843, ctc_loss=0.2032, cr_loss=0.4053, over 4117979.03 frames. ], batch size: 59, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:15:28,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=125814.16666666667, ans=0.125
2024-09-14 12:15:52,460 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.06 vs. limit=10.0
2024-09-14 12:16:04,015 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0
2024-09-14 12:16:19,620 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.109e+02 2.341e+02 2.718e+02 4.092e+02, threshold=4.682e+02, percent-clipped=0.0
2024-09-14 12:16:22,626 INFO [train.py:1198] (0/2) Epoch 7, batch 6100, loss[loss=0.2702, ctc_loss=0.1912, cr_loss=0.3953, over 20978.00 frames. ], tot_loss[loss=0.2842, ctc_loss=0.2031, cr_loss=0.4055, over 4118816.73 frames. ], batch size: 58, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:17:06,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=126012.5, ans=0.1
2024-09-14 12:17:32,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=126040.83333333333, ans=0.1
2024-09-14 12:17:36,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=126069.16666666667, ans=0.125
2024-09-14 12:17:38,035 INFO [train.py:1198] (0/2) Epoch 7, batch 6150, loss[loss=0.2505, ctc_loss=0.1743, cr_loss=0.3811, over 20988.00 frames. ], tot_loss[loss=0.2854, ctc_loss=0.2042, cr_loss=0.4059, over 4097341.40 frames. ], batch size: 51, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:17:57,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=126097.5, ans=0.125
2024-09-14 12:18:19,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=126125.83333333333, ans=0.025
2024-09-14 12:18:49,066 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.159e+02 2.331e+02 2.631e+02 5.856e+02, threshold=4.662e+02, percent-clipped=1.0
2024-09-14 12:18:52,014 INFO [train.py:1198] (0/2) Epoch 7, batch 6200, loss[loss=0.2912, ctc_loss=0.2072, cr_loss=0.4197, over 20967.00 frames. ], tot_loss[loss=0.2877, ctc_loss=0.206, cr_loss=0.4085, over 4100746.83 frames. ], batch size: 58, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:18:59,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=126210.83333333333, ans=0.0
2024-09-14 12:19:14,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=126239.16666666667, ans=0.125
2024-09-14 12:20:04,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=126352.5, ans=0.0
2024-09-14 12:20:06,111 INFO [train.py:1198] (0/2) Epoch 7, batch 6250, loss[loss=0.3206, ctc_loss=0.2302, cr_loss=0.452, over 20854.00 frames. ], tot_loss[loss=0.2887, ctc_loss=0.207, cr_loss=0.4088, over 4074876.79 frames. ], batch size: 65, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:20:15,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=126352.5, ans=0.2
2024-09-14 12:20:21,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=126380.83333333333, ans=0.1
2024-09-14 12:20:36,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=126409.16666666667, ans=0.125
2024-09-14 12:21:17,119 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.181e+02 2.370e+02 2.742e+02 4.115e+02, threshold=4.741e+02, percent-clipped=0.0
2024-09-14 12:21:20,027 INFO [train.py:1198] (0/2) Epoch 7, batch 6300, loss[loss=0.3543, ctc_loss=0.2694, cr_loss=0.4243, over 14701.00 frames. ], tot_loss[loss=0.2924, ctc_loss=0.2105, cr_loss=0.4095, over 4007346.48 frames. ], batch size: 149, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:21:26,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=126494.16666666667, ans=0.125
2024-09-14 12:21:30,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=126494.16666666667, ans=0.125
2024-09-14 12:21:45,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=126522.5, ans=0.025
2024-09-14 12:21:48,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=126550.83333333333, ans=0.2
2024-09-14 12:22:27,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.57 vs. limit=15.0
2024-09-14 12:22:30,906 INFO [train.py:1198] (0/2) Epoch 7, batch 6350, loss[loss=0.3483, ctc_loss=0.2607, cr_loss=0.438, over 18126.00 frames. ], tot_loss[loss=0.2983, ctc_loss=0.2157, cr_loss=0.4131, over 3912072.76 frames. ], batch size: 108, lr: 1.16e-02, grad_scale: 32.0
2024-09-14 12:22:31,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=126635.83333333333, ans=0.125
2024-09-14 12:22:36,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=126635.83333333333, ans=0.125
2024-09-14 12:23:28,797 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-7.pt
2024-09-14 12:24:19,566 INFO [train.py:1198] (0/2) Epoch 8, batch 0, loss[loss=0.3077, ctc_loss=0.2275, cr_loss=0.4007, over 21012.00 frames. ], tot_loss[loss=0.3077, ctc_loss=0.2275, cr_loss=0.4007, over 21012.00 frames. ], batch size: 63, lr: 1.09e-02, grad_scale: 32.0
2024-09-14 12:24:19,567 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-14 12:24:37,914 INFO [train.py:1230] (0/2) Epoch 8, validation: loss=0.06254, ctc_loss=0.06254, cr_loss=8.928e-15, over 944034.00 frames.
2024-09-14 12:24:37,915 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-14 12:24:47,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0
2024-09-14 12:24:48,207 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.280e+02 2.532e+02 2.915e+02 3.914e+02, threshold=5.064e+02, percent-clipped=0.0
2024-09-14 12:25:36,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.03 vs. limit=10.0
2024-09-14 12:25:39,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=126865.33333333333, ans=0.0
2024-09-14 12:25:52,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=126865.33333333333, ans=0.125
2024-09-14 12:25:55,314 INFO [train.py:1198] (0/2) Epoch 8, batch 50, loss[loss=0.3109, ctc_loss=0.2235, cr_loss=0.437, over 20694.00 frames. ], tot_loss[loss=0.295, ctc_loss=0.2122, cr_loss=0.4139, over 922214.47 frames. ], batch size: 71, lr: 1.09e-02, grad_scale: 32.0
2024-09-14 12:26:07,713 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 12:26:09,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=22.5
2024-09-14 12:26:29,816 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=15.0
2024-09-14 12:27:11,371 INFO [train.py:1198] (0/2) Epoch 8, batch 100, loss[loss=0.3168, ctc_loss=0.2276, cr_loss=0.4461, over 20980.00 frames. ], tot_loss[loss=0.2889, ctc_loss=0.207, cr_loss=0.4093, over 1628673.93 frames. ], batch size: 64, lr: 1.09e-02, grad_scale: 16.0
2024-09-14 12:27:11,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=127035.33333333333, ans=0.1
2024-09-14 12:27:23,367 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.092e+02 2.283e+02 2.878e+02 5.109e+02, threshold=4.566e+02, percent-clipped=1.0
2024-09-14 12:27:46,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=127092.0, ans=0.125
2024-09-14 12:28:06,683 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.81 vs. limit=15.0
2024-09-14 12:28:26,960 INFO [train.py:1198] (0/2) Epoch 8, batch 150, loss[loss=0.2905, ctc_loss=0.2049, cr_loss=0.4279, over 20799.00 frames. ], tot_loss[loss=0.2863, ctc_loss=0.2048, cr_loss=0.4079, over 2181605.77 frames. ], batch size: 53, lr: 1.09e-02, grad_scale: 16.0
2024-09-14 12:29:05,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=127233.66666666667, ans=0.5
2024-09-14 12:29:37,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=127290.33333333333, ans=0.2
2024-09-14 12:29:38,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=127290.33333333333, ans=0.0
2024-09-14 12:29:46,025 INFO [train.py:1198] (0/2) Epoch 8, batch 200, loss[loss=0.3716, ctc_loss=0.2772, cr_loss=0.4725, over 15019.00 frames. ], tot_loss[loss=0.2859, ctc_loss=0.2042, cr_loss=0.4086, over 2604466.75 frames. ], batch size: 149, lr: 1.09e-02, grad_scale: 16.0
2024-09-14 12:29:58,039 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.081e+02 2.253e+02 2.626e+02 5.255e+02, threshold=4.506e+02, percent-clipped=1.0
2024-09-14 12:30:09,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=127347.0, ans=0.1
2024-09-14 12:30:28,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=127375.33333333333, ans=0.0
2024-09-14 12:30:51,780 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0
2024-09-14 12:31:00,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0
2024-09-14 12:31:05,149 INFO [train.py:1198] (0/2) Epoch 8, batch 250, loss[loss=0.3154, ctc_loss=0.2301, cr_loss=0.4265, over 20657.00 frames. ], tot_loss[loss=0.2851, ctc_loss=0.2036, cr_loss=0.4076, over 2931805.05 frames. ], batch size: 66, lr: 1.08e-02, grad_scale: 16.0
2024-09-14 12:31:38,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=127517.0, ans=0.2
2024-09-14 12:32:03,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.29 vs. limit=15.0
2024-09-14 12:32:19,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=127602.0, ans=0.125
2024-09-14 12:32:20,665 INFO [train.py:1198] (0/2) Epoch 8, batch 300, loss[loss=0.2846, ctc_loss=0.2028, cr_loss=0.409, over 21004.00 frames. ], tot_loss[loss=0.283, ctc_loss=0.2018, cr_loss=0.406, over 3199870.98 frames. ], batch size: 61, lr: 1.08e-02, grad_scale: 16.0
2024-09-14 12:32:28,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=127602.0, ans=0.025
2024-09-14 12:32:32,489 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.046e+02 2.173e+02 2.391e+02 4.775e+02, threshold=4.346e+02, percent-clipped=1.0
2024-09-14 12:32:43,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=22.5
2024-09-14 12:33:13,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=127687.0, ans=0.025
2024-09-14 12:33:15,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=127687.0, ans=0.2
2024-09-14 12:33:35,994 INFO [train.py:1198] (0/2) Epoch 8, batch 350, loss[loss=0.3603, ctc_loss=0.2727, cr_loss=0.4379, over 13921.00 frames. ], tot_loss[loss=0.2837, ctc_loss=0.2024, cr_loss=0.4064, over 3396477.82 frames. ], batch size: 149, lr: 1.08e-02, grad_scale: 16.0
2024-09-14 12:34:22,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=127800.33333333333, ans=0.125
2024-09-14 12:34:25,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=127828.66666666667, ans=0.0
2024-09-14 12:34:34,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=127828.66666666667, ans=0.0
2024-09-14 12:34:55,075 INFO [train.py:1198] (0/2) Epoch 8, batch 400, loss[loss=0.3315, ctc_loss=0.2409, cr_loss=0.4526, over 18528.00 frames. ], tot_loss[loss=0.2861, ctc_loss=0.2045, cr_loss=0.4078, over 3542263.25 frames. ], batch size: 108, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:35:04,426 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 12:35:07,247 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.102e+02 2.355e+02 2.791e+02 4.628e+02, threshold=4.709e+02, percent-clipped=1.0
2024-09-14 12:35:15,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=127913.66666666667, ans=0.125
2024-09-14 12:35:19,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=127913.66666666667, ans=0.125
2024-09-14 12:35:41,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=127970.33333333333, ans=0.125
2024-09-14 12:36:11,263 INFO [train.py:1198] (0/2) Epoch 8, batch 450, loss[loss=0.3002, ctc_loss=0.2149, cr_loss=0.4266, over 19500.00 frames. ], tot_loss[loss=0.2843, ctc_loss=0.2031, cr_loss=0.4059, over 3664068.88 frames. ], batch size: 90, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:36:32,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=128055.33333333333, ans=0.5
2024-09-14 12:36:34,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=128055.33333333333, ans=0.0
2024-09-14 12:37:30,546 INFO [train.py:1198] (0/2) Epoch 8, batch 500, loss[loss=0.2765, ctc_loss=0.1998, cr_loss=0.3834, over 21025.00 frames. ], tot_loss[loss=0.2842, ctc_loss=0.203, cr_loss=0.4063, over 3765290.14 frames. ], batch size: 61, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:37:32,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=128168.66666666667, ans=0.125
2024-09-14 12:37:35,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128168.66666666667, ans=0.1
2024-09-14 12:37:42,383 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.135e+02 2.293e+02 2.573e+02 3.481e+02, threshold=4.587e+02, percent-clipped=0.0
2024-09-14 12:38:00,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=128225.33333333333, ans=0.125
2024-09-14 12:38:31,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=128282.0, ans=0.0
2024-09-14 12:38:46,163 INFO [train.py:1198] (0/2) Epoch 8, batch 550, loss[loss=0.2908, ctc_loss=0.2097, cr_loss=0.4057, over 20990.00 frames. ], tot_loss[loss=0.2835, ctc_loss=0.2023, cr_loss=0.4059, over 3837779.43 frames. ], batch size: 64, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:39:09,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=128338.66666666667, ans=0.125
2024-09-14 12:39:57,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=128423.66666666667, ans=0.0
2024-09-14 12:40:05,085 INFO [train.py:1198] (0/2) Epoch 8, batch 600, loss[loss=0.2849, ctc_loss=0.202, cr_loss=0.4145, over 21073.00 frames. ], tot_loss[loss=0.2842, ctc_loss=0.2028, cr_loss=0.4071, over 3897740.57 frames. ], batch size: 59, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:40:17,347 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.661e+02 2.068e+02 2.286e+02 2.632e+02 3.343e+02, threshold=4.573e+02, percent-clipped=0.0
2024-09-14 12:40:51,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=128537.0, ans=0.1
2024-09-14 12:40:59,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=128537.0, ans=0.125
2024-09-14 12:41:19,952 INFO [train.py:1198] (0/2) Epoch 8, batch 650, loss[loss=0.2817, ctc_loss=0.2009, cr_loss=0.4044, over 21071.00 frames. ], tot_loss[loss=0.2842, ctc_loss=0.2029, cr_loss=0.4069, over 3942709.93 frames. ], batch size: 59, lr: 1.08e-02, grad_scale: 32.0
2024-09-14 12:42:25,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=128707.0, ans=0.125
2024-09-14 12:42:38,727 INFO [train.py:1198] (0/2) Epoch 8, batch 700, loss[loss=0.2415, ctc_loss=0.1703, cr_loss=0.3558, over 21025.00 frames. ], tot_loss[loss=0.2846, ctc_loss=0.2031, cr_loss=0.4076, over 3976392.08 frames.
], batch size: 52, lr: 1.08e-02, grad_scale: 32.0 2024-09-14 12:42:43,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=128735.33333333333, ans=0.125 2024-09-14 12:42:50,662 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.090e+02 2.311e+02 2.574e+02 3.363e+02, threshold=4.623e+02, percent-clipped=0.0 2024-09-14 12:42:51,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=128735.33333333333, ans=0.0 2024-09-14 12:43:00,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=128763.66666666667, ans=0.0 2024-09-14 12:43:04,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=128763.66666666667, ans=0.025 2024-09-14 12:43:08,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.87 vs. limit=10.0 2024-09-14 12:43:21,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=128792.0, ans=0.025 2024-09-14 12:43:45,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=128848.66666666667, ans=0.2 2024-09-14 12:43:54,522 INFO [train.py:1198] (0/2) Epoch 8, batch 750, loss[loss=0.2299, ctc_loss=0.1602, cr_loss=0.3487, over 20976.00 frames. ], tot_loss[loss=0.2828, ctc_loss=0.2017, cr_loss=0.4052, over 4003540.54 frames. ], batch size: 51, lr: 1.08e-02, grad_scale: 32.0 2024-09-14 12:44:10,015 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.65 vs. 
limit=12.0 2024-09-14 12:44:25,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.65 vs. limit=22.5 2024-09-14 12:44:39,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=128962.0, ans=0.0 2024-09-14 12:45:09,422 INFO [train.py:1198] (0/2) Epoch 8, batch 800, loss[loss=0.2892, ctc_loss=0.211, cr_loss=0.3909, over 18241.00 frames. ], tot_loss[loss=0.2837, ctc_loss=0.2024, cr_loss=0.4067, over 4026219.56 frames. ], batch size: 108, lr: 1.08e-02, grad_scale: 32.0 2024-09-14 12:45:09,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=129018.66666666667, ans=0.1 2024-09-14 12:45:21,417 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.080e+02 2.241e+02 2.579e+02 3.337e+02, threshold=4.481e+02, percent-clipped=0.0 2024-09-14 12:46:06,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=129103.66666666667, ans=0.2 2024-09-14 12:46:09,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=129103.66666666667, ans=0.0 2024-09-14 12:46:27,503 INFO [train.py:1198] (0/2) Epoch 8, batch 850, loss[loss=0.2615, ctc_loss=0.1858, cr_loss=0.3785, over 20934.00 frames. ], tot_loss[loss=0.2843, ctc_loss=0.2028, cr_loss=0.4075, over 4035251.07 frames. 
], batch size: 60, lr: 1.08e-02, grad_scale: 32.0 2024-09-14 12:46:44,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=129188.66666666667, ans=0.125 2024-09-14 12:46:50,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=129188.66666666667, ans=0.125 2024-09-14 12:46:53,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=129188.66666666667, ans=0.125 2024-09-14 12:47:03,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.54 vs. limit=15.0 2024-09-14 12:47:11,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=129245.33333333333, ans=0.125 2024-09-14 12:47:11,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=129245.33333333333, ans=0.125 2024-09-14 12:47:21,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=6.0 2024-09-14 12:47:29,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=129273.66666666667, ans=0.5 2024-09-14 12:47:38,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=129273.66666666667, ans=0.125 2024-09-14 12:47:43,062 INFO [train.py:1198] (0/2) Epoch 8, batch 900, loss[loss=0.2882, ctc_loss=0.2061, cr_loss=0.4102, over 21018.00 frames. ], tot_loss[loss=0.2837, ctc_loss=0.2023, cr_loss=0.407, over 4054491.07 frames. 
], batch size: 63, lr: 1.08e-02, grad_scale: 32.0 2024-09-14 12:47:58,278 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.030e+02 2.200e+02 2.387e+02 3.357e+02, threshold=4.400e+02, percent-clipped=0.0 2024-09-14 12:48:10,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=129330.33333333333, ans=0.125 2024-09-14 12:48:16,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=129358.66666666667, ans=0.125 2024-09-14 12:48:19,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=129358.66666666667, ans=0.125 2024-09-14 12:48:32,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.58 vs. limit=22.5 2024-09-14 12:48:38,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=129387.0, ans=0.125 2024-09-14 12:48:47,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129415.33333333333, ans=0.1 2024-09-14 12:48:50,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=129415.33333333333, ans=0.125 2024-09-14 12:49:01,993 INFO [train.py:1198] (0/2) Epoch 8, batch 950, loss[loss=0.286, ctc_loss=0.2049, cr_loss=0.4054, over 20672.00 frames. ], tot_loss[loss=0.2839, ctc_loss=0.2025, cr_loss=0.4071, over 4059359.89 frames. 
], batch size: 71, lr: 1.08e-02, grad_scale: 32.0 2024-09-14 12:49:17,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=129472.0, ans=0.0 2024-09-14 12:49:27,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=129472.0, ans=0.0 2024-09-14 12:49:37,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=129500.33333333333, ans=0.025 2024-09-14 12:49:54,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0 2024-09-14 12:50:02,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=129557.0, ans=0.125 2024-09-14 12:50:04,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=129557.0, ans=0.0 2024-09-14 12:50:12,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=129557.0, ans=0.025 2024-09-14 12:50:17,675 INFO [train.py:1198] (0/2) Epoch 8, batch 1000, loss[loss=0.305, ctc_loss=0.2201, cr_loss=0.4243, over 21041.00 frames. ], tot_loss[loss=0.282, ctc_loss=0.201, cr_loss=0.4053, over 4084081.80 frames. 
], batch size: 62, lr: 1.08e-02, grad_scale: 16.0 2024-09-14 12:50:18,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=129585.33333333333, ans=0.2 2024-09-14 12:50:30,945 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.115e+02 2.257e+02 2.479e+02 3.705e+02, threshold=4.515e+02, percent-clipped=0.0 2024-09-14 12:50:35,842 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=15.0 2024-09-14 12:50:49,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=129642.0, ans=0.1 2024-09-14 12:50:58,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=129642.0, ans=0.2 2024-09-14 12:51:06,429 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 12:51:25,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129698.66666666667, ans=0.1 2024-09-14 12:51:26,458 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=22.5 2024-09-14 12:51:35,667 INFO [train.py:1198] (0/2) Epoch 8, batch 1050, loss[loss=0.294, ctc_loss=0.2106, cr_loss=0.4167, over 20976.00 frames. ], tot_loss[loss=0.2832, ctc_loss=0.202, cr_loss=0.4062, over 4073904.89 frames. 
], batch size: 58, lr: 1.08e-02, grad_scale: 16.0 2024-09-14 12:51:46,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=129727.0, ans=0.0 2024-09-14 12:52:06,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0 2024-09-14 12:52:27,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=129812.0, ans=0.125 2024-09-14 12:52:31,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=129812.0, ans=0.0 2024-09-14 12:52:41,818 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.76 vs. limit=10.0 2024-09-14 12:52:49,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=129840.33333333333, ans=0.125 2024-09-14 12:52:51,758 INFO [train.py:1198] (0/2) Epoch 8, batch 1100, loss[loss=0.309, ctc_loss=0.2216, cr_loss=0.4367, over 20665.00 frames. ], tot_loss[loss=0.2831, ctc_loss=0.202, cr_loss=0.4055, over 4068601.82 frames. ], batch size: 68, lr: 1.07e-02, grad_scale: 16.0 2024-09-14 12:52:54,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=129868.66666666667, ans=0.0 2024-09-14 12:53:01,605 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.69 vs. 
limit=22.5 2024-09-14 12:53:04,975 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.164e+02 2.395e+02 2.871e+02 6.856e+02, threshold=4.791e+02, percent-clipped=2.0 2024-09-14 12:53:14,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=129897.0, ans=0.125 2024-09-14 12:53:20,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129925.33333333333, ans=0.1 2024-09-14 12:53:46,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=129953.66666666667, ans=0.2 2024-09-14 12:53:46,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=129953.66666666667, ans=0.125 2024-09-14 12:54:00,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=129982.0, ans=0.125 2024-09-14 12:54:04,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=129982.0, ans=0.125 2024-09-14 12:54:10,440 INFO [train.py:1198] (0/2) Epoch 8, batch 1150, loss[loss=0.2645, ctc_loss=0.1878, cr_loss=0.3833, over 20979.00 frames. ], tot_loss[loss=0.2841, ctc_loss=0.2029, cr_loss=0.4062, over 4062380.38 frames. ], batch size: 51, lr: 1.07e-02, grad_scale: 16.0 2024-09-14 12:54:35,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.26 vs. limit=22.5 2024-09-14 12:55:26,046 INFO [train.py:1198] (0/2) Epoch 8, batch 1200, loss[loss=0.2751, ctc_loss=0.196, cr_loss=0.3953, over 20963.00 frames. ], tot_loss[loss=0.2834, ctc_loss=0.2023, cr_loss=0.4053, over 4067335.26 frames. 
], batch size: 55, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 12:55:39,749 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.063e+02 2.289e+02 2.639e+02 4.616e+02, threshold=4.577e+02, percent-clipped=0.0 2024-09-14 12:56:08,818 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 12:56:42,063 INFO [train.py:1198] (0/2) Epoch 8, batch 1250, loss[loss=0.2945, ctc_loss=0.2115, cr_loss=0.4152, over 20927.00 frames. ], tot_loss[loss=0.2833, ctc_loss=0.2021, cr_loss=0.4057, over 4079930.86 frames. ], batch size: 60, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 12:57:21,446 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=15.0 2024-09-14 12:57:47,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0 2024-09-14 12:57:53,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=130407.0, ans=0.125 2024-09-14 12:58:00,535 INFO [train.py:1198] (0/2) Epoch 8, batch 1300, loss[loss=0.2516, ctc_loss=0.1755, cr_loss=0.3805, over 19830.00 frames. ], tot_loss[loss=0.2825, ctc_loss=0.2015, cr_loss=0.4051, over 4092419.44 frames. ], batch size: 44, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 12:58:12,012 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.79 vs. 
limit=6.0 2024-09-14 12:58:14,259 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.044e+02 2.200e+02 2.413e+02 2.893e+02, threshold=4.399e+02, percent-clipped=0.0 2024-09-14 12:59:05,639 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2024-09-14 12:59:16,753 INFO [train.py:1198] (0/2) Epoch 8, batch 1350, loss[loss=0.2257, ctc_loss=0.1607, cr_loss=0.3249, over 20345.00 frames. ], tot_loss[loss=0.2819, ctc_loss=0.2011, cr_loss=0.4042, over 4096150.91 frames. ], batch size: 45, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 12:59:45,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=130605.33333333333, ans=0.1 2024-09-14 12:59:53,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=130633.66666666667, ans=0.125 2024-09-14 13:00:11,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=130662.0, ans=0.125 2024-09-14 13:00:31,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=130690.33333333333, ans=0.2 2024-09-14 13:00:35,645 INFO [train.py:1198] (0/2) Epoch 8, batch 1400, loss[loss=0.2794, ctc_loss=0.1996, cr_loss=0.3993, over 21088.00 frames. ], tot_loss[loss=0.2817, ctc_loss=0.2008, cr_loss=0.4043, over 4107851.03 frames. 
], batch size: 59, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:00:38,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=130718.66666666667, ans=0.07 2024-09-14 13:00:49,093 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.137e+02 2.285e+02 2.550e+02 5.423e+02, threshold=4.570e+02, percent-clipped=1.0 2024-09-14 13:00:49,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=130747.0, ans=0.125 2024-09-14 13:01:27,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=130803.66666666667, ans=0.125 2024-09-14 13:01:43,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=130832.0, ans=0.0 2024-09-14 13:01:50,834 INFO [train.py:1198] (0/2) Epoch 8, batch 1450, loss[loss=0.2351, ctc_loss=0.1594, cr_loss=0.3783, over 20957.00 frames. ], tot_loss[loss=0.2816, ctc_loss=0.2005, cr_loss=0.4054, over 4106513.83 frames. 
], batch size: 50, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:01:54,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=130860.33333333333, ans=0.0 2024-09-14 13:01:55,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=130860.33333333333, ans=0.1 2024-09-14 13:02:09,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=130888.66666666667, ans=0.1 2024-09-14 13:02:29,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=130917.0, ans=0.5 2024-09-14 13:02:53,639 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.88 vs. limit=10.0 2024-09-14 13:02:57,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=130973.66666666667, ans=0.0 2024-09-14 13:03:03,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=130973.66666666667, ans=0.125 2024-09-14 13:03:08,958 INFO [train.py:1198] (0/2) Epoch 8, batch 1500, loss[loss=0.2557, ctc_loss=0.1786, cr_loss=0.3853, over 20883.00 frames. ], tot_loss[loss=0.2826, ctc_loss=0.2013, cr_loss=0.4062, over 4086519.50 frames. 
], batch size: 54, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:03:18,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=131002.0, ans=0.2 2024-09-14 13:03:22,498 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.027e+02 2.176e+02 2.400e+02 4.808e+02, threshold=4.351e+02, percent-clipped=1.0 2024-09-14 13:04:12,030 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 13:04:13,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=131115.33333333334, ans=0.125 2024-09-14 13:04:25,287 INFO [train.py:1198] (0/2) Epoch 8, batch 1550, loss[loss=0.275, ctc_loss=0.1925, cr_loss=0.4124, over 21057.00 frames. ], tot_loss[loss=0.2818, ctc_loss=0.2008, cr_loss=0.4048, over 4084666.23 frames. ], batch size: 56, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:04:34,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2024-09-14 13:04:59,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=131200.33333333334, ans=0.125 2024-09-14 13:05:21,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=131228.66666666666, ans=0.2 2024-09-14 13:05:37,785 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 13:05:44,826 INFO [train.py:1198] (0/2) Epoch 8, batch 1600, loss[loss=0.2336, ctc_loss=0.1626, cr_loss=0.355, over 20972.00 frames. ], tot_loss[loss=0.2798, ctc_loss=0.1992, cr_loss=0.4033, over 4104224.12 frames. 
], batch size: 48, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:05:58,173 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.124e+02 2.287e+02 2.526e+02 8.912e+02, threshold=4.575e+02, percent-clipped=1.0 2024-09-14 13:06:12,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=131313.66666666666, ans=0.04949747468305833 2024-09-14 13:06:17,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2024-09-14 13:07:00,189 INFO [train.py:1198] (0/2) Epoch 8, batch 1650, loss[loss=0.3075, ctc_loss=0.2194, cr_loss=0.4407, over 20656.00 frames. ], tot_loss[loss=0.2818, ctc_loss=0.2008, cr_loss=0.4053, over 4094602.26 frames. ], batch size: 66, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:07:00,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=131427.0, ans=0.2 2024-09-14 13:07:31,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=131483.66666666666, ans=0.125 2024-09-14 13:08:16,086 INFO [train.py:1198] (0/2) Epoch 8, batch 1700, loss[loss=0.2484, ctc_loss=0.1729, cr_loss=0.3776, over 20970.00 frames. ], tot_loss[loss=0.2826, ctc_loss=0.2015, cr_loss=0.4057, over 4079780.15 frames. 
], batch size: 50, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:08:32,941 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.822e+02 2.126e+02 2.303e+02 2.588e+02 5.770e+02, threshold=4.606e+02, percent-clipped=1.0 2024-09-14 13:08:42,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=131597.0, ans=0.125 2024-09-14 13:08:48,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=131625.33333333334, ans=0.125 2024-09-14 13:09:07,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=131653.66666666666, ans=0.2 2024-09-14 13:09:29,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=131682.0, ans=0.2 2024-09-14 13:09:34,881 INFO [train.py:1198] (0/2) Epoch 8, batch 1750, loss[loss=0.2362, ctc_loss=0.1644, cr_loss=0.359, over 20974.00 frames. ], tot_loss[loss=0.2825, ctc_loss=0.2014, cr_loss=0.4054, over 4077755.27 frames. 
], batch size: 50, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:10:17,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=131767.0, ans=0.0 2024-09-14 13:10:37,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=131823.66666666666, ans=0.1 2024-09-14 13:10:45,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=131823.66666666666, ans=0.1 2024-09-14 13:10:45,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=131823.66666666666, ans=0.025 2024-09-14 13:10:50,884 INFO [train.py:1198] (0/2) Epoch 8, batch 1800, loss[loss=0.3107, ctc_loss=0.2196, cr_loss=0.4555, over 20841.00 frames. ], tot_loss[loss=0.2812, ctc_loss=0.2003, cr_loss=0.4046, over 4080675.48 frames. ], batch size: 59, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:11:05,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=131852.0, ans=0.015 2024-09-14 13:11:07,984 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.117e+02 2.382e+02 2.705e+02 4.221e+02, threshold=4.764e+02, percent-clipped=0.0 2024-09-14 13:11:32,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131908.66666666666, ans=0.1 2024-09-14 13:12:01,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=131965.33333333334, ans=0.125 2024-09-14 13:12:07,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=131965.33333333334, ans=0.2 2024-09-14 13:12:09,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, 
num_groups=1, num_channels=256, metric=10.87 vs. limit=12.0 2024-09-14 13:12:09,957 INFO [train.py:1198] (0/2) Epoch 8, batch 1850, loss[loss=0.2568, ctc_loss=0.1811, cr_loss=0.3785, over 20889.00 frames. ], tot_loss[loss=0.2815, ctc_loss=0.2005, cr_loss=0.4049, over 4082870.10 frames. ], batch size: 57, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:12:13,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=131993.66666666666, ans=0.125 2024-09-14 13:12:17,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=131993.66666666666, ans=0.2 2024-09-14 13:12:38,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=132050.33333333334, ans=0.125 2024-09-14 13:12:41,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=132050.33333333334, ans=0.1 2024-09-14 13:13:11,918 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.01 vs. limit=15.0 2024-09-14 13:13:25,036 INFO [train.py:1198] (0/2) Epoch 8, batch 1900, loss[loss=0.284, ctc_loss=0.2005, cr_loss=0.4172, over 20795.00 frames. ], tot_loss[loss=0.2812, ctc_loss=0.2002, cr_loss=0.4053, over 4084367.47 frames. 
], batch size: 53, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:13:38,736 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.078e+02 2.357e+02 2.693e+02 3.869e+02, threshold=4.715e+02, percent-clipped=0.0 2024-09-14 13:13:55,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=132192.0, ans=0.0 2024-09-14 13:14:28,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=132248.66666666666, ans=0.125 2024-09-14 13:14:29,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=132248.66666666666, ans=0.125 2024-09-14 13:14:31,586 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=15.0 2024-09-14 13:14:42,910 INFO [train.py:1198] (0/2) Epoch 8, batch 1950, loss[loss=0.3104, ctc_loss=0.2203, cr_loss=0.4505, over 20106.00 frames. ], tot_loss[loss=0.2811, ctc_loss=0.2002, cr_loss=0.4046, over 4089715.57 frames. 
], batch size: 80, lr: 1.07e-02, grad_scale: 32.0 2024-09-14 13:14:44,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=132277.0, ans=0.125 2024-09-14 13:14:46,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=132277.0, ans=0.125 2024-09-14 13:14:47,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=132277.0, ans=0.025 2024-09-14 13:15:33,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=132362.0, ans=0.125 2024-09-14 13:15:58,829 INFO [train.py:1198] (0/2) Epoch 8, batch 2000, loss[loss=0.2943, ctc_loss=0.2081, cr_loss=0.4309, over 20978.00 frames. ], tot_loss[loss=0.2815, ctc_loss=0.2006, cr_loss=0.4046, over 4087607.37 frames. ], batch size: 58, lr: 1.06e-02, grad_scale: 32.0 2024-09-14 13:16:03,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=132418.66666666666, ans=0.125 2024-09-14 13:16:12,132 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.034e+02 2.190e+02 2.424e+02 4.015e+02, threshold=4.381e+02, percent-clipped=0.0 2024-09-14 13:16:58,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=132503.66666666666, ans=0.125 2024-09-14 13:17:09,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-09-14 13:17:11,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.09 vs. 
limit=10.0
2024-09-14 13:17:17,442 INFO [train.py:1198] (0/2) Epoch 8, batch 2050, loss[loss=0.2758, ctc_loss=0.1949, cr_loss=0.4043, over 20877.00 frames. ], tot_loss[loss=0.2814, ctc_loss=0.2004, cr_loss=0.4047, over 4094400.88 frames. ], batch size: 57, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:17:38,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=132588.66666666666, ans=0.0
2024-09-14 13:17:44,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=132588.66666666666, ans=0.125
2024-09-14 13:18:08,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0
2024-09-14 13:18:25,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=132673.66666666666, ans=0.125
2024-09-14 13:18:27,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=132673.66666666666, ans=0.05
2024-09-14 13:18:33,112 INFO [train.py:1198] (0/2) Epoch 8, batch 2100, loss[loss=0.2325, ctc_loss=0.16, cr_loss=0.3626, over 20933.00 frames. ], tot_loss[loss=0.2817, ctc_loss=0.2008, cr_loss=0.4045, over 4079374.87 frames. ], batch size: 49, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:18:36,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=132702.0, ans=0.0
2024-09-14 13:18:46,680 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.167e+02 2.422e+02 2.724e+02 4.702e+02, threshold=4.844e+02, percent-clipped=1.0
2024-09-14 13:18:48,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=132730.33333333334, ans=0.025
2024-09-14 13:19:02,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.24 vs. limit=10.0
2024-09-14 13:19:11,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=132758.66666666666, ans=0.0
2024-09-14 13:19:20,725 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0
2024-09-14 13:19:21,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=132787.0, ans=0.125
2024-09-14 13:19:43,030 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 13:19:51,612 INFO [train.py:1198] (0/2) Epoch 8, batch 2150, loss[loss=0.2684, ctc_loss=0.1893, cr_loss=0.3958, over 20865.00 frames. ], tot_loss[loss=0.282, ctc_loss=0.201, cr_loss=0.4051, over 4084594.78 frames. ], batch size: 57, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:20:06,239 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.71 vs.
limit=15.0
2024-09-14 13:20:16,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=132872.0, ans=0.0
2024-09-14 13:20:50,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0
2024-09-14 13:21:07,109 INFO [train.py:1198] (0/2) Epoch 8, batch 2200, loss[loss=0.2468, ctc_loss=0.1728, cr_loss=0.3702, over 20974.00 frames. ], tot_loss[loss=0.282, ctc_loss=0.2009, cr_loss=0.4056, over 4094404.41 frames. ], batch size: 52, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:21:20,679 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.645e+02 2.092e+02 2.250e+02 2.431e+02 3.576e+02, threshold=4.499e+02, percent-clipped=0.0
2024-09-14 13:21:21,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=133013.66666666666, ans=0.125
2024-09-14 13:21:23,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=133013.66666666666, ans=0.0
2024-09-14 13:22:26,040 INFO [train.py:1198] (0/2) Epoch 8, batch 2250, loss[loss=0.2935, ctc_loss=0.2076, cr_loss=0.4292, over 20704.00 frames. ], tot_loss[loss=0.2797, ctc_loss=0.1991, cr_loss=0.4032, over 4092271.83 frames. ], batch size: 71, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:22:36,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=133127.0, ans=0.2
2024-09-14 13:22:41,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=22.5
2024-09-14 13:23:21,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=133212.0, ans=0.0
2024-09-14 13:23:41,202 INFO [train.py:1198] (0/2) Epoch 8, batch 2300, loss[loss=0.2502, ctc_loss=0.1729, cr_loss=0.3864, over 21069.00 frames. ], tot_loss[loss=0.2806, ctc_loss=0.1997, cr_loss=0.4045, over 4082990.01 frames. ], batch size: 53, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:23:41,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=133268.66666666666, ans=0.2
2024-09-14 13:23:54,888 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.101e+02 2.369e+02 2.709e+02 3.780e+02, threshold=4.739e+02, percent-clipped=0.0
2024-09-14 13:24:07,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=133297.0, ans=0.125
2024-09-14 13:24:10,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=133325.33333333334, ans=0.07
2024-09-14 13:24:22,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=133325.33333333334, ans=0.125
2024-09-14 13:24:27,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=133353.66666666666, ans=0.0
2024-09-14 13:24:33,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=133353.66666666666, ans=0.025
2024-09-14 13:24:36,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=133353.66666666666, ans=0.0
2024-09-14 13:24:56,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256,
metric=11.90 vs. limit=15.0
2024-09-14 13:24:57,191 INFO [train.py:1198] (0/2) Epoch 8, batch 2350, loss[loss=0.3075, ctc_loss=0.2263, cr_loss=0.4061, over 18347.00 frames. ], tot_loss[loss=0.2814, ctc_loss=0.2004, cr_loss=0.4049, over 4084740.70 frames. ], batch size: 108, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:24:57,958 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=22.5
2024-09-14 13:25:14,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=133438.66666666666, ans=0.125
2024-09-14 13:25:14,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=133438.66666666666, ans=0.125
2024-09-14 13:26:05,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=133523.66666666666, ans=0.2
2024-09-14 13:26:15,920 INFO [train.py:1198] (0/2) Epoch 8, batch 2400, loss[loss=0.3004, ctc_loss=0.2153, cr_loss=0.4256, over 20934.00 frames. ], tot_loss[loss=0.2805, ctc_loss=0.1998, cr_loss=0.4036, over 4091368.43 frames. ], batch size: 60, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:26:28,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.73 vs. limit=10.0
2024-09-14 13:26:29,507 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.062e+02 2.325e+02 2.560e+02 3.645e+02, threshold=4.650e+02, percent-clipped=0.0
2024-09-14 13:26:49,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=12.0
2024-09-14 13:26:52,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=133608.66666666666, ans=0.125
2024-09-14 13:27:00,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=133637.0, ans=0.125
2024-09-14 13:27:18,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=133665.33333333334, ans=0.125
2024-09-14 13:27:26,278 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0
2024-09-14 13:27:27,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=133665.33333333334, ans=0.0
2024-09-14 13:27:27,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=133665.33333333334, ans=0.0
2024-09-14 13:27:31,804 INFO [train.py:1198] (0/2) Epoch 8, batch 2450, loss[loss=0.2418, ctc_loss=0.1683, cr_loss=0.3671, over 19835.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1989, cr_loss=0.4029, over 4092775.49 frames. ], batch size: 44, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:27:35,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=133693.66666666666, ans=0.125
2024-09-14 13:27:46,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0
2024-09-14 13:28:30,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs.
limit=15.0
2024-09-14 13:28:52,450 INFO [train.py:1198] (0/2) Epoch 8, batch 2500, loss[loss=0.314, ctc_loss=0.2247, cr_loss=0.4465, over 20649.00 frames. ], tot_loss[loss=0.2801, ctc_loss=0.1994, cr_loss=0.4033, over 4085306.24 frames. ], batch size: 71, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:28:57,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.24 vs. limit=22.5
2024-09-14 13:29:00,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133835.33333333334, ans=0.1
2024-09-14 13:29:05,932 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.145e+02 2.275e+02 2.498e+02 4.011e+02, threshold=4.549e+02, percent-clipped=0.0
2024-09-14 13:29:09,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=133863.66666666666, ans=0.0
2024-09-14 13:30:08,588 INFO [train.py:1198] (0/2) Epoch 8, batch 2550, loss[loss=0.2711, ctc_loss=0.1892, cr_loss=0.4094, over 20765.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.1995, cr_loss=0.4046, over 4099843.44 frames. ], batch size: 53, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:30:19,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=133977.0, ans=0.0
2024-09-14 13:30:52,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=134062.0, ans=0.025
2024-09-14 13:31:26,741 INFO [train.py:1198] (0/2) Epoch 8, batch 2600, loss[loss=0.3325, ctc_loss=0.2435, cr_loss=0.4449, over 20274.00 frames. ], tot_loss[loss=0.2803, ctc_loss=0.1994, cr_loss=0.4046, over 4105582.07 frames. ], batch size: 74, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:31:31,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=134118.66666666666, ans=0.1
2024-09-14 13:31:40,245 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.108e+02 2.323e+02 2.656e+02 4.347e+02, threshold=4.645e+02, percent-clipped=0.0
2024-09-14 13:32:08,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=134175.33333333334, ans=0.125
2024-09-14 13:32:42,102 INFO [train.py:1198] (0/2) Epoch 8, batch 2650, loss[loss=0.3342, ctc_loss=0.2451, cr_loss=0.4458, over 18132.00 frames. ], tot_loss[loss=0.2811, ctc_loss=0.2001, cr_loss=0.4047, over 4081409.09 frames. ], batch size: 108, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:33:02,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=134288.66666666666, ans=0.2
2024-09-14 13:33:22,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=134317.0, ans=0.0
2024-09-14 13:33:55,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=134373.66666666666, ans=0.2
2024-09-14 13:34:02,474 INFO [train.py:1198] (0/2) Epoch 8, batch 2700, loss[loss=0.2452, ctc_loss=0.1748, cr_loss=0.3521, over 20959.00 frames. ], tot_loss[loss=0.28, ctc_loss=0.1993, cr_loss=0.4037, over 4088410.39 frames.
], batch size: 49, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:34:11,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=134402.0, ans=0.025
2024-09-14 13:34:16,090 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.096e+02 2.304e+02 2.611e+02 4.523e+02, threshold=4.607e+02, percent-clipped=0.0
2024-09-14 13:34:55,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=134487.0, ans=10.0
2024-09-14 13:35:07,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=134515.33333333334, ans=0.2
2024-09-14 13:35:13,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=134515.33333333334, ans=0.125
2024-09-14 13:35:17,965 INFO [train.py:1198] (0/2) Epoch 8, batch 2750, loss[loss=0.2661, ctc_loss=0.191, cr_loss=0.3755, over 20769.00 frames. ], tot_loss[loss=0.28, ctc_loss=0.1994, cr_loss=0.403, over 4077412.72 frames. ], batch size: 56, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:35:47,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=15.0
2024-09-14 13:36:04,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=134628.66666666666, ans=0.1
2024-09-14 13:36:04,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=134628.66666666666, ans=0.125
2024-09-14 13:36:34,262 INFO [train.py:1198] (0/2) Epoch 8, batch 2800, loss[loss=0.2734, ctc_loss=0.1932, cr_loss=0.4012, over 20971.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.1997, cr_loss=0.4035, over 4079305.62 frames. ], batch size: 55, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:36:48,271 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.114e+02 2.371e+02 2.689e+02 4.577e+02, threshold=4.741e+02, percent-clipped=0.0
2024-09-14 13:37:24,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=134770.33333333334, ans=0.0
2024-09-14 13:37:35,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=134770.33333333334, ans=0.125
2024-09-14 13:37:53,364 INFO [train.py:1198] (0/2) Epoch 8, batch 2850, loss[loss=0.2672, ctc_loss=0.1901, cr_loss=0.3854, over 20962.00 frames. ], tot_loss[loss=0.2803, ctc_loss=0.1995, cr_loss=0.404, over 4090733.86 frames. ], batch size: 50, lr: 1.06e-02, grad_scale: 32.0
2024-09-14 13:38:24,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=134883.66666666666, ans=0.125
2024-09-14 13:38:39,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=134912.0, ans=0.0
2024-09-14 13:38:53,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=134940.33333333334, ans=0.025
2024-09-14 13:38:57,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=134940.33333333334, ans=0.2
2024-09-14 13:39:01,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=134940.33333333334, ans=0.2
2024-09-14 13:39:02,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=134940.33333333334, ans=0.2
2024-09-14 13:39:09,890 INFO [train.py:1198] (0/2) Epoch 8, batch 2900, loss[loss=0.3221,
ctc_loss=0.2355, cr_loss=0.433, over 18211.00 frames. ], tot_loss[loss=0.2811, ctc_loss=0.2, cr_loss=0.4051, over 4088083.32 frames. ], batch size: 108, lr: 1.05e-02, grad_scale: 32.0
2024-09-14 13:39:13,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=134968.66666666666, ans=0.125
2024-09-14 13:39:23,489 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.121e+02 2.307e+02 2.594e+02 4.069e+02, threshold=4.614e+02, percent-clipped=0.0
2024-09-14 13:39:29,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=134997.0, ans=0.0
2024-09-14 13:40:00,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=135053.66666666666, ans=0.2
2024-09-14 13:40:29,827 INFO [train.py:1198] (0/2) Epoch 8, batch 2950, loss[loss=0.2647, ctc_loss=0.1878, cr_loss=0.3845, over 20791.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.1994, cr_loss=0.4048, over 4096733.87 frames. ], batch size: 56, lr: 1.05e-02, grad_scale: 32.0
2024-09-14 13:40:41,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=12.0
2024-09-14 13:40:43,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=135138.66666666666, ans=0.0
2024-09-14 13:41:21,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=135195.33333333334, ans=0.0
2024-09-14 13:41:37,007 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=15.0
2024-09-14 13:41:45,800 INFO [train.py:1198] (0/2) Epoch 8, batch 3000, loss[loss=0.2918, ctc_loss=0.2103, cr_loss=0.4075, over 20754.00 frames. ], tot_loss[loss=0.2826, ctc_loss=0.2011, cr_loss=0.4074, over 4095248.78 frames. ], batch size: 56, lr: 1.05e-02, grad_scale: 64.0
2024-09-14 13:41:45,801 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-14 13:42:16,210 INFO [train.py:1230] (0/2) Epoch 8, validation: loss=0.05931, ctc_loss=0.05931, cr_loss=9.029e-15, over 944034.00 frames.
2024-09-14 13:42:16,210 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-14 13:42:29,789 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.154e+02 2.318e+02 2.651e+02 3.722e+02, threshold=4.636e+02, percent-clipped=0.0
2024-09-14 13:42:40,962 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=22.5
2024-09-14 13:42:51,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=135308.66666666666, ans=0.2
2024-09-14 13:43:20,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=135365.33333333334, ans=0.0
2024-09-14 13:43:26,705 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=22.5
2024-09-14 13:43:33,910 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=22.5
2024-09-14 13:43:34,679 INFO [train.py:1198] (0/2) Epoch 8, batch 3050, loss[loss=0.2648, ctc_loss=0.1874, cr_loss=0.3868, over 21001.00 frames. ], tot_loss[loss=0.2819, ctc_loss=0.2006, cr_loss=0.4066, over 4097698.76 frames. ], batch size: 63, lr: 1.05e-02, grad_scale: 64.0
2024-09-14 13:43:50,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs.
limit=15.0
2024-09-14 13:43:55,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=135422.0, ans=0.125
2024-09-14 13:44:38,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=135507.0, ans=0.02
2024-09-14 13:44:50,751 INFO [train.py:1198] (0/2) Epoch 8, batch 3100, loss[loss=0.2994, ctc_loss=0.2141, cr_loss=0.4264, over 20633.00 frames. ], tot_loss[loss=0.2815, ctc_loss=0.2002, cr_loss=0.4064, over 4102925.03 frames. ], batch size: 71, lr: 1.05e-02, grad_scale: 32.0
2024-09-14 13:45:01,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=135535.33333333334, ans=0.0
2024-09-14 13:45:05,713 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.084e+02 2.281e+02 2.519e+02 3.592e+02, threshold=4.562e+02, percent-clipped=0.0
2024-09-14 13:45:41,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0
2024-09-14 13:45:42,723 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.28 vs. limit=10.0
2024-09-14 13:45:53,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.97 vs. limit=15.0
2024-09-14 13:46:08,445 INFO [train.py:1198] (0/2) Epoch 8, batch 3150, loss[loss=0.3069, ctc_loss=0.2232, cr_loss=0.4185, over 20064.00 frames. ], tot_loss[loss=0.2812, ctc_loss=0.2, cr_loss=0.4062, over 4107281.32 frames. ], batch size: 80, lr: 1.05e-02, grad_scale: 32.0
2024-09-14 13:46:10,722 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.17 vs. limit=22.5
2024-09-14 13:47:15,617 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.41 vs. limit=15.0
2024-09-14 13:47:23,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=135818.66666666666, ans=0.125
2024-09-14 13:47:24,223 INFO [train.py:1198] (0/2) Epoch 8, batch 3200, loss[loss=0.2283, ctc_loss=0.1601, cr_loss=0.3409, over 21042.00 frames. ], tot_loss[loss=0.2809, ctc_loss=0.1998, cr_loss=0.4052, over 4104981.82 frames. ], batch size: 53, lr: 1.05e-02, grad_scale: 32.0
2024-09-14 13:47:24,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=135818.66666666666, ans=0.125
2024-09-14 13:47:39,148 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.091e+02 2.295e+02 2.588e+02 5.097e+02, threshold=4.590e+02, percent-clipped=1.0
2024-09-14 13:48:42,724 INFO [train.py:1198] (0/2) Epoch 8, batch 3250, loss[loss=0.2811, ctc_loss=0.201, cr_loss=0.4003, over 21023.00 frames. ], tot_loss[loss=0.2797, ctc_loss=0.199, cr_loss=0.4038, over 4103105.98 frames. ], batch size: 63, lr: 1.05e-02, grad_scale: 32.0
2024-09-14 13:49:02,232 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-48000.pt
2024-09-14 13:49:23,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=136017.0, ans=0.0
2024-09-14 13:49:26,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136017.0, ans=0.125
2024-09-14 13:49:51,890 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.10 vs.
limit=15.0
2024-09-14 13:49:58,957 INFO [train.py:1198] (0/2) Epoch 8, batch 3300, loss[loss=0.2596, ctc_loss=0.1824, cr_loss=0.3859, over 20770.00 frames. ], tot_loss[loss=0.2792, ctc_loss=0.1985, cr_loss=0.4035, over 4105442.56 frames. ], batch size: 53, lr: 1.05e-02, grad_scale: 32.0
2024-09-14 13:50:14,086 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.100e+02 2.272e+02 2.491e+02 3.171e+02, threshold=4.544e+02, percent-clipped=0.0
2024-09-14 13:50:20,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=136130.33333333334, ans=0.0
2024-09-14 13:50:36,142 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.31 vs. limit=22.5
2024-09-14 13:50:55,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=136187.0, ans=0.1
2024-09-14 13:50:58,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=136215.33333333334, ans=0.0
2024-09-14 13:51:15,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0
2024-09-14 13:51:17,239 INFO [train.py:1198] (0/2) Epoch 8, batch 3350, loss[loss=0.3056, ctc_loss=0.2219, cr_loss=0.4185, over 20668.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1987, cr_loss=0.4039, over 4104453.69 frames. ], batch size: 68, lr: 1.05e-02, grad_scale: 32.0
2024-09-14 13:51:22,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=136243.66666666666, ans=0.07
2024-09-14 13:51:43,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136272.0, ans=0.1
2024-09-14 13:51:53,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=136300.33333333334, ans=0.125
2024-09-14 13:52:05,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=136328.66666666666, ans=0.125
2024-09-14 13:52:32,978 INFO [train.py:1198] (0/2) Epoch 8, batch 3400, loss[loss=0.305, ctc_loss=0.2141, cr_loss=0.4548, over 20945.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1988, cr_loss=0.4036, over 4112184.16 frames. ], batch size: 60, lr: 1.05e-02, grad_scale: 32.0
2024-09-14 13:52:42,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136385.33333333334, ans=0.125
2024-09-14 13:52:48,276 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.158e+02 2.412e+02 2.883e+02 4.591e+02, threshold=4.824e+02, percent-clipped=1.0
2024-09-14 13:53:32,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=136498.66666666666, ans=0.125
2024-09-14 13:53:36,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=136498.66666666666, ans=0.2
2024-09-14 13:53:41,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.91 vs.
limit=15.0
2024-09-14 13:53:48,360 INFO [train.py:1198] (0/2) Epoch 8, batch 3450, loss[loss=0.2334, ctc_loss=0.1631, cr_loss=0.3515, over 20961.00 frames. ], tot_loss[loss=0.2757, ctc_loss=0.1958, cr_loss=0.3996, over 4122794.61 frames. ], batch size: 49, lr: 1.05e-02, grad_scale: 32.0
2024-09-14 13:54:09,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=136555.33333333334, ans=0.05
2024-09-14 13:54:32,051 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 13:54:35,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=136612.0, ans=0.07
2024-09-14 13:55:06,224 INFO [train.py:1198] (0/2) Epoch 8, batch 3500, loss[loss=0.2556, ctc_loss=0.1803, cr_loss=0.3761, over 21051.00 frames. ], tot_loss[loss=0.2782, ctc_loss=0.1976, cr_loss=0.403, over 4115708.21 frames. ], batch size: 53, lr: 1.05e-02, grad_scale: 32.0
2024-09-14 13:55:21,592 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.050e+02 2.209e+02 2.507e+02 3.525e+02, threshold=4.418e+02, percent-clipped=0.0
2024-09-14 13:55:28,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=136697.0, ans=0.0
2024-09-14 13:55:29,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=136697.0, ans=0.125
2024-09-14 13:55:29,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=136697.0, ans=0.125
2024-09-14 13:56:02,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=136753.66666666666, ans=0.0
2024-09-14 13:56:22,290 INFO [train.py:1198] (0/2) Epoch 8, batch 3550, loss[loss=0.2764, ctc_loss=0.1966, cr_loss=0.399, over 20938.00 frames. ], tot_loss[loss=0.2771, ctc_loss=0.1968, cr_loss=0.4013, over 4114631.93 frames. ], batch size: 60, lr: 1.05e-02, grad_scale: 32.0
2024-09-14 13:56:24,338 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 13:57:38,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0
2024-09-14 13:57:40,245 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.87 vs. limit=10.0
2024-09-14 13:57:40,699 INFO [train.py:1198] (0/2) Epoch 8, batch 3600, loss[loss=0.2469, ctc_loss=0.1757, cr_loss=0.3556, over 20963.00 frames. ], tot_loss[loss=0.2764, ctc_loss=0.1963, cr_loss=0.4005, over 4105576.60 frames. ], batch size: 51, lr: 1.05e-02, grad_scale: 32.0
2024-09-14 13:57:55,639 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.061e+02 2.168e+02 2.358e+02 4.249e+02, threshold=4.337e+02, percent-clipped=0.0
2024-09-14 13:57:55,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=136980.33333333334, ans=0.125
2024-09-14 13:58:07,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=136980.33333333334, ans=0.0
2024-09-14 13:58:15,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=137008.66666666666, ans=0.125
2024-09-14 13:58:55,979 INFO [train.py:1198] (0/2) Epoch 8, batch 3650, loss[loss=0.2344, ctc_loss=0.1683, cr_loss=0.3304, over 21013.00 frames. ], tot_loss[loss=0.2773, ctc_loss=0.197, cr_loss=0.4016, over 4097684.01 frames.
], batch size: 52, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 13:59:09,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=137122.0, ans=0.0 2024-09-14 13:59:37,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=137150.33333333334, ans=0.1 2024-09-14 13:59:58,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=137207.0, ans=0.125 2024-09-14 14:00:07,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=137207.0, ans=0.04949747468305833 2024-09-14 14:00:14,605 INFO [train.py:1198] (0/2) Epoch 8, batch 3700, loss[loss=0.2748, ctc_loss=0.1913, cr_loss=0.4179, over 21020.00 frames. ], tot_loss[loss=0.2775, ctc_loss=0.1972, cr_loss=0.4014, over 4097144.25 frames. ], batch size: 63, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 14:00:29,412 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.163e+02 2.326e+02 2.877e+02 5.372e+02, threshold=4.652e+02, percent-clipped=3.0 2024-09-14 14:01:04,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=137320.33333333334, ans=0.0 2024-09-14 14:01:09,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-14 14:01:26,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=137348.66666666666, ans=0.5 2024-09-14 14:01:29,718 INFO [train.py:1198] (0/2) Epoch 8, batch 3750, loss[loss=0.3203, ctc_loss=0.2322, cr_loss=0.4407, over 19380.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.1996, cr_loss=0.4042, over 4078890.67 frames. 
], batch size: 90, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 14:01:54,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=137405.33333333334, ans=0.125 2024-09-14 14:02:32,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=137490.33333333334, ans=0.025 2024-09-14 14:02:48,066 INFO [train.py:1198] (0/2) Epoch 8, batch 3800, loss[loss=0.2564, ctc_loss=0.1782, cr_loss=0.3912, over 21062.00 frames. ], tot_loss[loss=0.282, ctc_loss=0.2009, cr_loss=0.4053, over 4074389.51 frames. ], batch size: 53, lr: 1.05e-02, grad_scale: 32.0 2024-09-14 14:03:03,266 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.171e+02 2.365e+02 2.666e+02 4.854e+02, threshold=4.729e+02, percent-clipped=1.0 2024-09-14 14:03:06,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=137547.0, ans=0.0 2024-09-14 14:04:04,221 INFO [train.py:1198] (0/2) Epoch 8, batch 3850, loss[loss=0.3102, ctc_loss=0.2196, cr_loss=0.4528, over 20952.00 frames. ], tot_loss[loss=0.2834, ctc_loss=0.2019, cr_loss=0.4073, over 4078653.09 frames. 
], batch size: 58, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:04:24,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=137688.66666666666, ans=0.1 2024-09-14 14:04:50,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=137745.33333333334, ans=0.125 2024-09-14 14:04:51,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=137745.33333333334, ans=0.1 2024-09-14 14:05:02,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=137745.33333333334, ans=0.05 2024-09-14 14:05:20,415 INFO [train.py:1198] (0/2) Epoch 8, batch 3900, loss[loss=0.283, ctc_loss=0.2014, cr_loss=0.4081, over 21057.00 frames. ], tot_loss[loss=0.2815, ctc_loss=0.2004, cr_loss=0.4055, over 4084052.66 frames. ], batch size: 59, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:05:35,565 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.096e+02 2.322e+02 2.626e+02 3.932e+02, threshold=4.644e+02, percent-clipped=0.0 2024-09-14 14:05:38,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=137830.33333333334, ans=0.04949747468305833 2024-09-14 14:06:03,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=137858.66666666666, ans=0.125 2024-09-14 14:06:07,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=137887.0, ans=0.125 2024-09-14 14:06:11,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=137887.0, ans=0.1 2024-09-14 14:06:22,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, 
batch_count=137915.33333333334, ans=0.0 2024-09-14 14:06:25,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-09-14 14:06:38,420 INFO [train.py:1198] (0/2) Epoch 8, batch 3950, loss[loss=0.2523, ctc_loss=0.1778, cr_loss=0.3728, over 20956.00 frames. ], tot_loss[loss=0.282, ctc_loss=0.2008, cr_loss=0.4059, over 4083691.47 frames. ], batch size: 49, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:06:51,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=137943.66666666666, ans=0.125 2024-09-14 14:07:54,288 INFO [train.py:1198] (0/2) Epoch 8, batch 4000, loss[loss=0.2277, ctc_loss=0.1603, cr_loss=0.3371, over 19920.00 frames. ], tot_loss[loss=0.2805, ctc_loss=0.1997, cr_loss=0.4041, over 4092625.14 frames. ], batch size: 44, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:08:12,492 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.071e+02 2.262e+02 2.472e+02 3.920e+02, threshold=4.525e+02, percent-clipped=0.0 2024-09-14 14:08:27,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=138142.0, ans=0.125 2024-09-14 14:08:39,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=138142.0, ans=15.0 2024-09-14 14:09:08,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.28 vs. limit=10.0 2024-09-14 14:09:13,522 INFO [train.py:1198] (0/2) Epoch 8, batch 4050, loss[loss=0.2916, ctc_loss=0.2073, cr_loss=0.4215, over 20951.00 frames. ], tot_loss[loss=0.2812, ctc_loss=0.2003, cr_loss=0.4045, over 4087601.05 frames. 
], batch size: 58, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:09:18,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=138227.0, ans=0.2 2024-09-14 14:09:18,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=138227.0, ans=0.125 2024-09-14 14:09:41,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=138255.33333333334, ans=0.0 2024-09-14 14:09:52,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=138283.66666666666, ans=0.1 2024-09-14 14:10:29,537 INFO [train.py:1198] (0/2) Epoch 8, batch 4100, loss[loss=0.2885, ctc_loss=0.2065, cr_loss=0.41, over 20630.00 frames. ], tot_loss[loss=0.281, ctc_loss=0.2, cr_loss=0.4046, over 4080237.77 frames. ], batch size: 68, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:10:44,900 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.218e+02 2.557e+02 2.938e+02 5.683e+02, threshold=5.114e+02, percent-clipped=2.0 2024-09-14 14:11:09,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=138425.33333333334, ans=0.2 2024-09-14 14:11:18,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=138453.66666666666, ans=0.0 2024-09-14 14:11:42,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=138482.0, ans=0.125 2024-09-14 14:11:48,044 INFO [train.py:1198] (0/2) Epoch 8, batch 4150, loss[loss=0.2609, ctc_loss=0.1855, cr_loss=0.3773, over 20784.00 frames. ], tot_loss[loss=0.2808, ctc_loss=0.1998, cr_loss=0.4049, over 4092646.32 frames. 
], batch size: 53, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:11:49,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.74 vs. limit=15.0 2024-09-14 14:11:58,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=138510.33333333334, ans=0.5 2024-09-14 14:12:06,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=138538.66666666666, ans=0.025 2024-09-14 14:13:03,917 INFO [train.py:1198] (0/2) Epoch 8, batch 4200, loss[loss=0.2813, ctc_loss=0.2007, cr_loss=0.4031, over 20941.00 frames. ], tot_loss[loss=0.2794, ctc_loss=0.1986, cr_loss=0.4039, over 4106094.98 frames. ], batch size: 60, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:13:20,561 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.160e+02 2.378e+02 2.746e+02 4.623e+02, threshold=4.757e+02, percent-clipped=0.0 2024-09-14 14:14:04,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=138737.0, ans=0.125 2024-09-14 14:14:23,007 INFO [train.py:1198] (0/2) Epoch 8, batch 4250, loss[loss=0.2398, ctc_loss=0.1691, cr_loss=0.3535, over 20962.00 frames. ], tot_loss[loss=0.2776, ctc_loss=0.1971, cr_loss=0.4025, over 4113790.43 frames. ], batch size: 49, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:14:29,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=138793.66666666666, ans=0.025 2024-09-14 14:15:05,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=138850.33333333334, ans=0.0 2024-09-14 14:15:38,155 INFO [train.py:1198] (0/2) Epoch 8, batch 4300, loss[loss=0.3078, ctc_loss=0.2171, cr_loss=0.4537, over 20846.00 frames. 
], tot_loss[loss=0.2786, ctc_loss=0.198, cr_loss=0.4033, over 4109016.93 frames. ], batch size: 59, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:15:41,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=138935.33333333334, ans=0.0 2024-09-14 14:15:54,789 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.101e+02 2.235e+02 2.576e+02 3.487e+02, threshold=4.469e+02, percent-clipped=0.0 2024-09-14 14:16:02,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=138963.66666666666, ans=0.2 2024-09-14 14:16:18,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.64 vs. limit=15.0 2024-09-14 14:16:39,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=139048.66666666666, ans=0.0 2024-09-14 14:16:40,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=139048.66666666666, ans=0.125 2024-09-14 14:16:53,803 INFO [train.py:1198] (0/2) Epoch 8, batch 4350, loss[loss=0.2681, ctc_loss=0.1828, cr_loss=0.4265, over 20989.00 frames. ], tot_loss[loss=0.2792, ctc_loss=0.1984, cr_loss=0.4038, over 4102551.41 frames. ], batch size: 55, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:16:56,044 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0 2024-09-14 14:16:56,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.22 vs. 
limit=22.5 2024-09-14 14:17:04,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139077.0, ans=0.1 2024-09-14 14:17:27,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0 2024-09-14 14:17:34,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=139133.66666666666, ans=0.0 2024-09-14 14:17:48,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=12.0 2024-09-14 14:18:12,243 INFO [train.py:1198] (0/2) Epoch 8, batch 4400, loss[loss=0.2852, ctc_loss=0.2037, cr_loss=0.4076, over 21077.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1986, cr_loss=0.4044, over 4112274.20 frames. ], batch size: 59, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:18:29,132 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.208e+02 2.365e+02 2.855e+02 4.961e+02, threshold=4.731e+02, percent-clipped=3.0 2024-09-14 14:18:43,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=139275.33333333334, ans=0.125 2024-09-14 14:19:02,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=139303.66666666666, ans=0.025 2024-09-14 14:19:05,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=139303.66666666666, ans=0.125 2024-09-14 14:19:31,121 INFO [train.py:1198] (0/2) Epoch 8, batch 4450, loss[loss=0.3646, ctc_loss=0.2726, cr_loss=0.4599, over 14576.00 frames. ], tot_loss[loss=0.2789, ctc_loss=0.1982, cr_loss=0.4034, over 4104787.32 frames. 
], batch size: 149, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:19:55,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0 2024-09-14 14:20:35,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=139473.66666666666, ans=10.0 2024-09-14 14:20:39,230 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.49 vs. limit=22.5 2024-09-14 14:20:45,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139502.0, ans=0.1 2024-09-14 14:20:46,231 INFO [train.py:1198] (0/2) Epoch 8, batch 4500, loss[loss=0.2266, ctc_loss=0.1582, cr_loss=0.3418, over 19991.00 frames. ], tot_loss[loss=0.2794, ctc_loss=0.1986, cr_loss=0.4039, over 4099706.10 frames. ], batch size: 44, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:21:02,856 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 2.138e+02 2.348e+02 2.604e+02 6.128e+02, threshold=4.697e+02, percent-clipped=1.0 2024-09-14 14:21:10,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=139530.33333333334, ans=0.125 2024-09-14 14:21:22,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=139558.66666666666, ans=0.125 2024-09-14 14:21:35,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=139587.0, ans=0.125 2024-09-14 14:22:01,675 INFO [train.py:1198] (0/2) Epoch 8, batch 4550, loss[loss=0.282, ctc_loss=0.2025, cr_loss=0.3978, over 20965.00 frames. 
], tot_loss[loss=0.2775, ctc_loss=0.1971, cr_loss=0.402, over 4098169.33 frames. ], batch size: 58, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:22:05,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=139643.66666666666, ans=0.0 2024-09-14 14:22:25,430 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.65 vs. limit=15.0 2024-09-14 14:22:41,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=139700.33333333334, ans=0.0 2024-09-14 14:22:56,902 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=22.5 2024-09-14 14:23:03,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=139757.0, ans=22.5 2024-09-14 14:23:20,278 INFO [train.py:1198] (0/2) Epoch 8, batch 4600, loss[loss=0.3112, ctc_loss=0.2246, cr_loss=0.4329, over 19430.00 frames. ], tot_loss[loss=0.2771, ctc_loss=0.1968, cr_loss=0.4016, over 4092067.40 frames. 
], batch size: 90, lr: 1.04e-02, grad_scale: 32.0 2024-09-14 14:23:32,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=139785.33333333334, ans=0.125 2024-09-14 14:23:37,059 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.082e+02 2.376e+02 2.603e+02 3.739e+02, threshold=4.753e+02, percent-clipped=0.0 2024-09-14 14:23:43,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=139813.66666666666, ans=0.0 2024-09-14 14:24:00,330 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 14:24:16,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=139870.33333333334, ans=0.125 2024-09-14 14:24:21,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=139898.66666666666, ans=0.125 2024-09-14 14:24:28,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=139898.66666666666, ans=0.125 2024-09-14 14:24:33,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=139898.66666666666, ans=0.04949747468305833 2024-09-14 14:24:36,057 INFO [train.py:1198] (0/2) Epoch 8, batch 4650, loss[loss=0.3046, ctc_loss=0.2177, cr_loss=0.4348, over 20657.00 frames. ], tot_loss[loss=0.2779, ctc_loss=0.1973, cr_loss=0.403, over 4099707.82 frames. 
], batch size: 68, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:24:48,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=139927.0, ans=0.125 2024-09-14 14:24:50,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=139955.33333333334, ans=0.125 2024-09-14 14:25:02,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2024-09-14 14:25:54,671 INFO [train.py:1198] (0/2) Epoch 8, batch 4700, loss[loss=0.3013, ctc_loss=0.2186, cr_loss=0.4137, over 19843.00 frames. ], tot_loss[loss=0.2785, ctc_loss=0.1977, cr_loss=0.4038, over 4093000.30 frames. ], batch size: 80, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:26:12,622 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.675e+02 2.076e+02 2.230e+02 2.548e+02 4.187e+02, threshold=4.461e+02, percent-clipped=0.0 2024-09-14 14:26:43,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.34 vs. limit=15.0 2024-09-14 14:26:47,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140153.66666666666, ans=0.1 2024-09-14 14:27:08,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=140210.33333333334, ans=0.125 2024-09-14 14:27:09,566 INFO [train.py:1198] (0/2) Epoch 8, batch 4750, loss[loss=0.2978, ctc_loss=0.2099, cr_loss=0.4393, over 20924.00 frames. ], tot_loss[loss=0.2796, ctc_loss=0.1988, cr_loss=0.404, over 4079638.98 frames. 
], batch size: 60, lr: 1.04e-02, grad_scale: 16.0 2024-09-14 14:27:21,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=140210.33333333334, ans=0.125 2024-09-14 14:27:57,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-09-14 14:27:58,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=140295.33333333334, ans=0.125 2024-09-14 14:28:05,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=140295.33333333334, ans=0.125 2024-09-14 14:28:10,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=140323.66666666666, ans=0.2 2024-09-14 14:28:19,485 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.65 vs. limit=22.5 2024-09-14 14:28:24,535 INFO [train.py:1198] (0/2) Epoch 8, batch 4800, loss[loss=0.2597, ctc_loss=0.1794, cr_loss=0.4017, over 21081.00 frames. ], tot_loss[loss=0.2798, ctc_loss=0.1989, cr_loss=0.4045, over 4089396.69 frames. ], batch size: 59, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:28:28,061 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.85 vs. 
limit=22.5 2024-09-14 14:28:42,536 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.125e+02 2.335e+02 2.659e+02 3.866e+02, threshold=4.671e+02, percent-clipped=0.0 2024-09-14 14:29:09,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=140408.66666666666, ans=0.125 2024-09-14 14:29:14,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=140437.0, ans=0.125 2024-09-14 14:29:30,289 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0 2024-09-14 14:29:43,054 INFO [train.py:1198] (0/2) Epoch 8, batch 4850, loss[loss=0.2848, ctc_loss=0.1989, cr_loss=0.4297, over 20623.00 frames. ], tot_loss[loss=0.2789, ctc_loss=0.1981, cr_loss=0.4036, over 4089240.22 frames. ], batch size: 68, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:29:45,206 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-14 14:29:46,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=140493.66666666666, ans=0.95 2024-09-14 14:30:19,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=140550.33333333334, ans=10.0 2024-09-14 14:30:58,700 INFO [train.py:1198] (0/2) Epoch 8, batch 4900, loss[loss=0.2589, ctc_loss=0.1812, cr_loss=0.3886, over 20992.00 frames. ], tot_loss[loss=0.278, ctc_loss=0.1973, cr_loss=0.4033, over 4092901.85 frames. 
], batch size: 51, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:31:19,767 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.074e+02 2.271e+02 2.512e+02 3.637e+02, threshold=4.542e+02, percent-clipped=0.0 2024-09-14 14:31:57,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=140720.33333333334, ans=0.125 2024-09-14 14:32:16,006 INFO [train.py:1198] (0/2) Epoch 8, batch 4950, loss[loss=0.3038, ctc_loss=0.2188, cr_loss=0.4251, over 20847.00 frames. ], tot_loss[loss=0.2781, ctc_loss=0.1974, cr_loss=0.4037, over 4103195.29 frames. ], batch size: 65, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:33:06,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=140862.0, ans=0.125 2024-09-14 14:33:30,065 INFO [train.py:1198] (0/2) Epoch 8, batch 5000, loss[loss=0.2855, ctc_loss=0.202, cr_loss=0.4173, over 21029.00 frames. ], tot_loss[loss=0.2789, ctc_loss=0.198, cr_loss=0.4044, over 4105667.73 frames. 
], batch size: 62, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:33:31,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=140918.66666666666, ans=0.0 2024-09-14 14:33:34,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=140918.66666666666, ans=0.2 2024-09-14 14:33:39,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=140918.66666666666, ans=0.2 2024-09-14 14:33:39,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=140918.66666666666, ans=15.0 2024-09-14 14:33:47,721 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.139e+02 2.387e+02 2.772e+02 5.760e+02, threshold=4.774e+02, percent-clipped=2.0 2024-09-14 14:34:02,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=140975.33333333334, ans=0.125 2024-09-14 14:34:05,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=140975.33333333334, ans=0.0 2024-09-14 14:34:12,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=140975.33333333334, ans=15.0 2024-09-14 14:34:44,100 INFO [train.py:1198] (0/2) Epoch 8, batch 5050, loss[loss=0.3251, ctc_loss=0.2361, cr_loss=0.445, over 20688.00 frames. ], tot_loss[loss=0.279, ctc_loss=0.1981, cr_loss=0.4042, over 4110817.13 frames. 
], batch size: 71, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:34:50,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=141060.33333333334, ans=0.0 2024-09-14 14:35:01,105 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.01 vs. limit=10.0 2024-09-14 14:35:58,530 INFO [train.py:1198] (0/2) Epoch 8, batch 5100, loss[loss=0.2918, ctc_loss=0.2071, cr_loss=0.4239, over 20969.00 frames. ], tot_loss[loss=0.2792, ctc_loss=0.1983, cr_loss=0.4044, over 4096448.93 frames. ], batch size: 64, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:36:16,362 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.119e+02 2.302e+02 2.585e+02 6.006e+02, threshold=4.603e+02, percent-clipped=1.0 2024-09-14 14:36:16,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=141230.33333333334, ans=0.125 2024-09-14 14:36:18,366 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 14:36:48,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=141287.0, ans=0.0 2024-09-14 14:37:12,909 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.22 vs. limit=22.5 2024-09-14 14:37:13,429 INFO [train.py:1198] (0/2) Epoch 8, batch 5150, loss[loss=0.2916, ctc_loss=0.21, cr_loss=0.408, over 20318.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1985, cr_loss=0.405, over 4106354.47 frames. 
], batch size: 80, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:37:56,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141400.33333333334, ans=0.1 2024-09-14 14:38:30,594 INFO [train.py:1198] (0/2) Epoch 8, batch 5200, loss[loss=0.2564, ctc_loss=0.1784, cr_loss=0.3896, over 20928.00 frames. ], tot_loss[loss=0.2782, ctc_loss=0.1973, cr_loss=0.4048, over 4116656.20 frames. ], batch size: 60, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:38:31,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.73 vs. limit=15.0 2024-09-14 14:38:36,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=141485.33333333334, ans=0.0 2024-09-14 14:38:48,194 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.148e+02 2.359e+02 2.888e+02 3.578e+02, threshold=4.717e+02, percent-clipped=0.0 2024-09-14 14:38:52,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141513.66666666666, ans=0.1 2024-09-14 14:39:12,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=141542.0, ans=22.5 2024-09-14 14:39:22,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141570.33333333334, ans=0.1 2024-09-14 14:39:29,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.14 vs. 
limit=22.5 2024-09-14 14:39:29,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141598.66666666666, ans=0.1 2024-09-14 14:39:33,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=141598.66666666666, ans=0.0 2024-09-14 14:39:44,523 INFO [train.py:1198] (0/2) Epoch 8, batch 5250, loss[loss=0.2985, ctc_loss=0.2145, cr_loss=0.4198, over 20879.00 frames. ], tot_loss[loss=0.2781, ctc_loss=0.1972, cr_loss=0.4042, over 4116185.43 frames. ], batch size: 65, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:39:47,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=141627.0, ans=0.95 2024-09-14 14:40:04,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=141655.33333333334, ans=0.07 2024-09-14 14:40:18,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=141683.66666666666, ans=0.125 2024-09-14 14:40:47,215 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-09-14 14:41:00,966 INFO [train.py:1198] (0/2) Epoch 8, batch 5300, loss[loss=0.2473, ctc_loss=0.1706, cr_loss=0.3838, over 20969.00 frames. ], tot_loss[loss=0.2773, ctc_loss=0.1968, cr_loss=0.4028, over 4108996.46 frames. 
], batch size: 55, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:41:18,650 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.129e+02 2.379e+02 2.832e+02 5.826e+02, threshold=4.758e+02, percent-clipped=1.0 2024-09-14 14:41:28,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=141797.0, ans=0.2 2024-09-14 14:41:32,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=141825.33333333334, ans=0.125 2024-09-14 14:41:44,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=141853.66666666666, ans=0.2 2024-09-14 14:41:53,027 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 14:42:14,514 INFO [train.py:1198] (0/2) Epoch 8, batch 5350, loss[loss=0.2118, ctc_loss=0.1425, cr_loss=0.3466, over 20927.00 frames. ], tot_loss[loss=0.2781, ctc_loss=0.1973, cr_loss=0.4039, over 4107837.26 frames. ], batch size: 48, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:42:56,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141967.0, ans=0.1 2024-09-14 14:43:28,601 INFO [train.py:1198] (0/2) Epoch 8, batch 5400, loss[loss=0.268, ctc_loss=0.1901, cr_loss=0.3898, over 20895.00 frames. ], tot_loss[loss=0.2793, ctc_loss=0.1983, cr_loss=0.4047, over 4107721.54 frames. ], batch size: 54, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:43:29,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.35 vs. 
limit=22.5 2024-09-14 14:43:45,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=142080.33333333334, ans=0.125 2024-09-14 14:43:46,502 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.842e+02 2.263e+02 2.496e+02 2.892e+02 4.621e+02, threshold=4.993e+02, percent-clipped=0.0 2024-09-14 14:44:00,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=142108.66666666666, ans=0.125 2024-09-14 14:44:39,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=142165.33333333334, ans=0.0 2024-09-14 14:44:42,440 INFO [train.py:1198] (0/2) Epoch 8, batch 5450, loss[loss=0.2463, ctc_loss=0.1716, cr_loss=0.3734, over 21056.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.1991, cr_loss=0.4062, over 4104843.74 frames. ], batch size: 53, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:44:54,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=142193.66666666666, ans=0.025 2024-09-14 14:45:04,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142222.0, ans=0.1 2024-09-14 14:45:17,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=142250.33333333334, ans=0.0 2024-09-14 14:45:45,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=142307.0, ans=0.0 2024-09-14 14:45:51,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=142307.0, ans=15.0 2024-09-14 14:45:56,727 INFO [train.py:1198] (0/2) Epoch 8, batch 5500, loss[loss=0.2756, ctc_loss=0.1974, cr_loss=0.3913, over 20888.00 frames. 
], tot_loss[loss=0.2801, ctc_loss=0.1989, cr_loss=0.4062, over 4103939.00 frames. ], batch size: 54, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:46:14,265 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.715e+02 2.071e+02 2.368e+02 2.692e+02 7.477e+02, threshold=4.736e+02, percent-clipped=3.0 2024-09-14 14:46:16,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=142363.66666666666, ans=0.0 2024-09-14 14:46:32,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=142392.0, ans=0.025 2024-09-14 14:46:36,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=142392.0, ans=0.07 2024-09-14 14:47:03,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=142448.66666666666, ans=0.125 2024-09-14 14:47:07,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=142448.66666666666, ans=0.125 2024-09-14 14:47:09,295 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0 2024-09-14 14:47:11,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=142477.0, ans=0.0 2024-09-14 14:47:12,895 INFO [train.py:1198] (0/2) Epoch 8, batch 5550, loss[loss=0.307, ctc_loss=0.2153, cr_loss=0.4585, over 20877.00 frames. ], tot_loss[loss=0.2791, ctc_loss=0.1982, cr_loss=0.4046, over 4098848.75 frames. 
], batch size: 57, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:47:16,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=142477.0, ans=0.125 2024-09-14 14:47:32,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=142505.33333333334, ans=0.0 2024-09-14 14:47:35,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=142505.33333333334, ans=0.125 2024-09-14 14:47:47,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=142533.66666666666, ans=0.025 2024-09-14 14:48:27,609 INFO [train.py:1198] (0/2) Epoch 8, batch 5600, loss[loss=0.2915, ctc_loss=0.2074, cr_loss=0.4204, over 21001.00 frames. ], tot_loss[loss=0.279, ctc_loss=0.1983, cr_loss=0.4032, over 4091362.43 frames. ], batch size: 61, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:48:41,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. 
limit=15.0 2024-09-14 14:48:45,240 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.120e+02 2.314e+02 2.615e+02 3.772e+02, threshold=4.627e+02, percent-clipped=0.0 2024-09-14 14:48:49,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=142647.0, ans=0.0 2024-09-14 14:49:12,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=142703.66666666666, ans=0.125 2024-09-14 14:49:23,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=142703.66666666666, ans=0.125 2024-09-14 14:49:44,220 INFO [train.py:1198] (0/2) Epoch 8, batch 5650, loss[loss=0.308, ctc_loss=0.2208, cr_loss=0.436, over 20623.00 frames. ], tot_loss[loss=0.2774, ctc_loss=0.1971, cr_loss=0.4016, over 4083722.66 frames. ], batch size: 71, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:50:58,378 INFO [train.py:1198] (0/2) Epoch 8, batch 5700, loss[loss=0.2609, ctc_loss=0.1819, cr_loss=0.3948, over 21053.00 frames. ], tot_loss[loss=0.2781, ctc_loss=0.1975, cr_loss=0.4028, over 4080020.04 frames. ], batch size: 53, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:51:16,121 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.109e+02 2.284e+02 2.557e+02 3.936e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-14 14:51:16,786 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.09 vs. 
limit=15.0 2024-09-14 14:51:38,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142958.66666666666, ans=0.1 2024-09-14 14:51:39,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=142958.66666666666, ans=0.125 2024-09-14 14:52:00,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=143015.33333333334, ans=0.2 2024-09-14 14:52:12,178 INFO [train.py:1198] (0/2) Epoch 8, batch 5750, loss[loss=0.2502, ctc_loss=0.1798, cr_loss=0.3523, over 20968.00 frames. ], tot_loss[loss=0.2792, ctc_loss=0.1984, cr_loss=0.4039, over 4090357.06 frames. ], batch size: 55, lr: 1.03e-02, grad_scale: 32.0 2024-09-14 14:52:37,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=12.0 2024-09-14 14:52:44,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=143100.33333333334, ans=0.07 2024-09-14 14:52:45,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=143100.33333333334, ans=0.1 2024-09-14 14:53:15,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=143157.0, ans=0.0 2024-09-14 14:53:25,572 INFO [train.py:1198] (0/2) Epoch 8, batch 5800, loss[loss=0.2461, ctc_loss=0.17, cr_loss=0.3806, over 21051.00 frames. ], tot_loss[loss=0.2782, ctc_loss=0.1976, cr_loss=0.403, over 4095305.16 frames. 
], batch size: 53, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 14:53:27,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=143185.33333333334, ans=0.125 2024-09-14 14:53:43,141 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.048e+02 2.178e+02 2.439e+02 3.789e+02, threshold=4.356e+02, percent-clipped=0.0 2024-09-14 14:53:55,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=143242.0, ans=0.125 2024-09-14 14:54:39,507 INFO [train.py:1198] (0/2) Epoch 8, batch 5850, loss[loss=0.2729, ctc_loss=0.1894, cr_loss=0.4176, over 20963.00 frames. ], tot_loss[loss=0.2799, ctc_loss=0.1989, cr_loss=0.4051, over 4099532.51 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 14:55:23,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=143412.0, ans=6.0 2024-09-14 14:55:55,940 INFO [train.py:1198] (0/2) Epoch 8, batch 5900, loss[loss=0.2752, ctc_loss=0.1932, cr_loss=0.4099, over 21053.00 frames. ], tot_loss[loss=0.2798, ctc_loss=0.1986, cr_loss=0.4059, over 4115014.20 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 14:56:13,620 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.113e+02 2.342e+02 2.739e+02 4.298e+02, threshold=4.684e+02, percent-clipped=0.0 2024-09-14 14:56:31,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=143525.33333333334, ans=0.09899494936611666 2024-09-14 14:56:39,277 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 14:57:10,304 INFO [train.py:1198] (0/2) Epoch 8, batch 5950, loss[loss=0.2333, ctc_loss=0.1616, cr_loss=0.3582, over 20960.00 frames. 
], tot_loss[loss=0.2783, ctc_loss=0.1975, cr_loss=0.4042, over 4111762.59 frames. ], batch size: 49, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 14:57:18,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=143610.33333333334, ans=0.2 2024-09-14 14:57:33,352 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=12.0 2024-09-14 14:58:04,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=143695.33333333334, ans=0.125 2024-09-14 14:58:26,697 INFO [train.py:1198] (0/2) Epoch 8, batch 6000, loss[loss=0.2321, ctc_loss=0.1592, cr_loss=0.3644, over 20938.00 frames. ], tot_loss[loss=0.2789, ctc_loss=0.1979, cr_loss=0.4049, over 4107986.03 frames. ], batch size: 49, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 14:58:26,698 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 14:58:52,714 INFO [train.py:1230] (0/2) Epoch 8, validation: loss=0.05749, ctc_loss=0.05749, cr_loss=9.447e-15, over 944034.00 frames. 2024-09-14 14:58:52,715 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 14:59:10,405 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.175e+02 2.381e+02 2.748e+02 4.326e+02, threshold=4.762e+02, percent-clipped=0.0 2024-09-14 14:59:41,880 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.59 vs. 
limit=15.0 2024-09-14 14:59:47,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=143837.0, ans=0.2 2024-09-14 15:00:01,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=143865.33333333334, ans=0.2 2024-09-14 15:00:07,238 INFO [train.py:1198] (0/2) Epoch 8, batch 6050, loss[loss=0.2386, ctc_loss=0.1663, cr_loss=0.3618, over 19510.00 frames. ], tot_loss[loss=0.2786, ctc_loss=0.1978, cr_loss=0.4042, over 4093318.03 frames. ], batch size: 43, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:00:28,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=143922.0, ans=0.025 2024-09-14 15:01:17,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=144007.0, ans=0.2 2024-09-14 15:01:21,782 INFO [train.py:1198] (0/2) Epoch 8, batch 6100, loss[loss=0.2797, ctc_loss=0.1978, cr_loss=0.4094, over 20850.00 frames. ], tot_loss[loss=0.2778, ctc_loss=0.1971, cr_loss=0.404, over 4098414.94 frames. 
], batch size: 57, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:01:39,508 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.110e+02 2.250e+02 2.560e+02 3.709e+02, threshold=4.501e+02, percent-clipped=0.0 2024-09-14 15:01:55,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=144092.0, ans=0.2 2024-09-14 15:02:26,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=144148.66666666666, ans=0.025 2024-09-14 15:02:28,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=144148.66666666666, ans=0.0 2024-09-14 15:02:35,233 INFO [train.py:1198] (0/2) Epoch 8, batch 6150, loss[loss=0.2391, ctc_loss=0.1668, cr_loss=0.3612, over 20950.00 frames. ], tot_loss[loss=0.2788, ctc_loss=0.1979, cr_loss=0.4045, over 4084586.96 frames. ], batch size: 48, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:02:45,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=144177.0, ans=0.125 2024-09-14 15:03:14,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.08 vs. limit=15.0 2024-09-14 15:03:48,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0 2024-09-14 15:03:50,244 INFO [train.py:1198] (0/2) Epoch 8, batch 6200, loss[loss=0.2527, ctc_loss=0.1741, cr_loss=0.3929, over 21053.00 frames. ], tot_loss[loss=0.2779, ctc_loss=0.1972, cr_loss=0.4034, over 4074174.19 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:03:50,951 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.87 vs. 
limit=15.0 2024-09-14 15:04:08,071 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.046e+02 2.184e+02 2.414e+02 3.936e+02, threshold=4.368e+02, percent-clipped=0.0 2024-09-14 15:05:03,862 INFO [train.py:1198] (0/2) Epoch 8, batch 6250, loss[loss=0.2833, ctc_loss=0.1994, cr_loss=0.4194, over 21037.00 frames. ], tot_loss[loss=0.2758, ctc_loss=0.1957, cr_loss=0.4005, over 4063674.83 frames. ], batch size: 63, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:05:11,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=144460.33333333334, ans=0.125 2024-09-14 15:05:41,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=144517.0, ans=0.0 2024-09-14 15:05:57,304 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=22.5 2024-09-14 15:06:17,045 INFO [train.py:1198] (0/2) Epoch 8, batch 6300, loss[loss=0.274, ctc_loss=0.2003, cr_loss=0.3686, over 21049.00 frames. ], tot_loss[loss=0.2779, ctc_loss=0.1975, cr_loss=0.4018, over 4048558.98 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:06:32,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.39 vs. 
limit=15.0 2024-09-14 15:06:34,363 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.136e+02 2.374e+02 2.738e+02 4.102e+02, threshold=4.749e+02, percent-clipped=0.0 2024-09-14 15:07:02,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=144687.0, ans=0.0 2024-09-14 15:07:06,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=144687.0, ans=10.0 2024-09-14 15:07:12,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.94 vs. limit=15.0 2024-09-14 15:07:17,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=144715.33333333334, ans=0.0 2024-09-14 15:07:18,145 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0 2024-09-14 15:07:21,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=22.5 2024-09-14 15:07:28,678 INFO [train.py:1198] (0/2) Epoch 8, batch 6350, loss[loss=0.3448, ctc_loss=0.2637, cr_loss=0.4052, over 14378.00 frames. ], tot_loss[loss=0.2862, ctc_loss=0.2052, cr_loss=0.4049, over 3860240.00 frames. ], batch size: 149, lr: 1.02e-02, grad_scale: 32.0 2024-09-14 15:07:59,325 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. 
limit=6.0 2024-09-14 15:08:26,945 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-8.pt 2024-09-14 15:09:14,796 INFO [train.py:1198] (0/2) Epoch 9, batch 0, loss[loss=0.3086, ctc_loss=0.2222, cr_loss=0.432, over 20672.00 frames. ], tot_loss[loss=0.3086, ctc_loss=0.2222, cr_loss=0.432, over 20672.00 frames. ], batch size: 66, lr: 9.65e-03, grad_scale: 32.0 2024-09-14 15:09:14,797 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 15:09:33,292 INFO [train.py:1230] (0/2) Epoch 9, validation: loss=0.05921, ctc_loss=0.05921, cr_loss=9.606e-15, over 944034.00 frames. 2024-09-14 15:09:33,292 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 15:09:54,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=144888.16666666666, ans=0.125 2024-09-14 15:10:00,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=144888.16666666666, ans=0.07 2024-09-14 15:10:04,771 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.383e+02 2.585e+02 2.854e+02 4.318e+02, threshold=5.171e+02, percent-clipped=0.0 2024-09-14 15:10:14,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=144916.5, ans=0.125 2024-09-14 15:10:17,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=144944.83333333334, ans=0.125 2024-09-14 15:10:36,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=144973.16666666666, ans=0.035 2024-09-14 15:10:51,206 INFO [train.py:1198] (0/2) Epoch 9, batch 50, loss[loss=0.2653, ctc_loss=0.1851, cr_loss=0.401, over 20979.00 frames. 
], tot_loss[loss=0.2805, ctc_loss=0.1993, cr_loss=0.4061, over 928294.94 frames. ], batch size: 58, lr: 9.64e-03, grad_scale: 32.0 2024-09-14 15:11:08,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=145029.83333333334, ans=0.1 2024-09-14 15:11:35,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145086.5, ans=0.1 2024-09-14 15:11:40,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=145086.5, ans=0.125 2024-09-14 15:11:52,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=145114.83333333334, ans=0.125 2024-09-14 15:12:06,013 INFO [train.py:1198] (0/2) Epoch 9, batch 100, loss[loss=0.3126, ctc_loss=0.226, cr_loss=0.4332, over 20975.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.1992, cr_loss=0.4061, over 1625633.84 frames. ], batch size: 64, lr: 9.64e-03, grad_scale: 32.0 2024-09-14 15:12:08,167 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. 
limit=15.0 2024-09-14 15:12:36,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=145199.83333333334, ans=0.5 2024-09-14 15:12:37,459 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.709e+02 2.003e+02 2.113e+02 2.321e+02 3.266e+02, threshold=4.227e+02, percent-clipped=0.0 2024-09-14 15:12:51,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=145199.83333333334, ans=0.0 2024-09-14 15:12:57,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145228.16666666666, ans=0.1 2024-09-14 15:13:20,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=145256.5, ans=0.2 2024-09-14 15:13:24,330 INFO [train.py:1198] (0/2) Epoch 9, batch 150, loss[loss=0.2878, ctc_loss=0.2061, cr_loss=0.4087, over 21008.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1985, cr_loss=0.4051, over 2168325.40 frames. ], batch size: 63, lr: 9.63e-03, grad_scale: 32.0 2024-09-14 15:13:35,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145284.83333333334, ans=0.1 2024-09-14 15:13:45,012 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.62 vs. limit=15.0 2024-09-14 15:14:17,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0 2024-09-14 15:14:39,564 INFO [train.py:1198] (0/2) Epoch 9, batch 200, loss[loss=0.2509, ctc_loss=0.1776, cr_loss=0.3666, over 20960.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1951, cr_loss=0.4012, over 2613502.00 frames. 
], batch size: 50, lr: 9.63e-03, grad_scale: 32.0 2024-09-14 15:14:53,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=145454.83333333334, ans=0.2 2024-09-14 15:15:11,410 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.054e+02 2.202e+02 2.486e+02 5.208e+02, threshold=4.404e+02, percent-clipped=1.0 2024-09-14 15:15:34,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=145511.5, ans=0.0 2024-09-14 15:15:39,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=145539.83333333334, ans=0.0 2024-09-14 15:15:55,448 INFO [train.py:1198] (0/2) Epoch 9, batch 250, loss[loss=0.2549, ctc_loss=0.1797, cr_loss=0.3763, over 20934.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.195, cr_loss=0.4012, over 2940968.57 frames. ], batch size: 51, lr: 9.62e-03, grad_scale: 32.0 2024-09-14 15:16:07,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=145568.16666666666, ans=0.125 2024-09-14 15:16:18,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=145596.5, ans=0.1 2024-09-14 15:16:24,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=145596.5, ans=0.125 2024-09-14 15:16:39,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=145624.83333333334, ans=0.0 2024-09-14 15:16:42,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=145653.16666666666, ans=0.125 2024-09-14 15:16:47,095 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, 
num_channels=256, metric=10.79 vs. limit=15.0 2024-09-14 15:17:03,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=145681.5, ans=0.0 2024-09-14 15:17:06,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=145681.5, ans=0.125 2024-09-14 15:17:08,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=145681.5, ans=0.2 2024-09-14 15:17:14,112 INFO [train.py:1198] (0/2) Epoch 9, batch 300, loss[loss=0.2708, ctc_loss=0.1901, cr_loss=0.4035, over 20765.00 frames. ], tot_loss[loss=0.2748, ctc_loss=0.1947, cr_loss=0.4004, over 3205807.00 frames. ], batch size: 53, lr: 9.62e-03, grad_scale: 64.0 2024-09-14 15:17:22,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2024-09-14 15:17:23,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=145709.83333333334, ans=0.125 2024-09-14 15:17:35,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=145738.16666666666, ans=0.125 2024-09-14 15:17:44,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.42 vs. limit=15.0 2024-09-14 15:17:45,565 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.047e+02 2.187e+02 2.461e+02 3.338e+02, threshold=4.374e+02, percent-clipped=0.0 2024-09-14 15:17:49,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. 
limit=6.0
2024-09-14 15:18:03,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145794.83333333334, ans=0.1
2024-09-14 15:18:16,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145823.16666666666, ans=0.1
2024-09-14 15:18:17,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=145823.16666666666, ans=0.0
2024-09-14 15:18:22,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.72 vs. limit=12.0
2024-09-14 15:18:26,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=145823.16666666666, ans=0.125
2024-09-14 15:18:26,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=145823.16666666666, ans=0.125
2024-09-14 15:18:28,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=145851.5, ans=0.2
2024-09-14 15:18:29,311 INFO [train.py:1198] (0/2) Epoch 9, batch 350, loss[loss=0.2763, ctc_loss=0.1963, cr_loss=0.4002, over 20768.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1945, cr_loss=0.401, over 3412150.70 frames. ], batch size: 56, lr: 9.62e-03, grad_scale: 64.0
2024-09-14 15:18:47,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145879.83333333334, ans=0.1
2024-09-14 15:19:10,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=145908.16666666666, ans=0.125
2024-09-14 15:19:30,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=145964.83333333334, ans=0.125
2024-09-14 15:19:46,961 INFO [train.py:1198] (0/2) Epoch 9, batch 400, loss[loss=0.2918, ctc_loss=0.2079, cr_loss=0.4195, over 20869.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.195, cr_loss=0.4014, over 3552592.53 frames. ], batch size: 65, lr: 9.61e-03, grad_scale: 64.0
2024-09-14 15:20:02,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=146021.5, ans=0.0
2024-09-14 15:20:20,200 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.122e+02 2.280e+02 2.528e+02 3.565e+02, threshold=4.559e+02, percent-clipped=0.0
2024-09-14 15:20:35,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=146078.16666666666, ans=0.125
2024-09-14 15:20:56,822 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=22.5
2024-09-14 15:20:57,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=146106.5, ans=0.125
2024-09-14 15:21:01,978 INFO [train.py:1198] (0/2) Epoch 9, batch 450, loss[loss=0.2748, ctc_loss=0.194, cr_loss=0.4039, over 21048.00 frames. ], tot_loss[loss=0.2748, ctc_loss=0.1945, cr_loss=0.4017, over 3682201.39 frames. ], batch size: 61, lr: 9.61e-03, grad_scale: 32.0
2024-09-14 15:21:35,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=146191.5, ans=0.0
2024-09-14 15:21:36,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=146191.5, ans=0.125
2024-09-14 15:22:20,018 INFO [train.py:1198] (0/2) Epoch 9, batch 500, loss[loss=0.2521, ctc_loss=0.1768, cr_loss=0.3768, over 20772.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1947, cr_loss=0.4016, over 3773620.49 frames. ], batch size: 56, lr: 9.60e-03, grad_scale: 16.0
2024-09-14 15:22:29,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=146276.5, ans=0.2
2024-09-14 15:22:42,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=146304.83333333334, ans=0.125
2024-09-14 15:22:45,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=146304.83333333334, ans=0.0
2024-09-14 15:22:47,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=146304.83333333334, ans=0.0
2024-09-14 15:22:54,600 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.700e+02 2.000e+02 2.260e+02 2.487e+02 3.645e+02, threshold=4.520e+02, percent-clipped=0.0
2024-09-14 15:23:35,113 INFO [train.py:1198] (0/2) Epoch 9, batch 550, loss[loss=0.2569, ctc_loss=0.1763, cr_loss=0.4034, over 20947.00 frames. ], tot_loss[loss=0.276, ctc_loss=0.1955, cr_loss=0.4026, over 3850419.04 frames. ], batch size: 49, lr: 9.60e-03, grad_scale: 16.0
2024-09-14 15:23:51,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=146446.5, ans=0.2
2024-09-14 15:23:53,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=146446.5, ans=0.0
2024-09-14 15:24:37,182 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=15.0
2024-09-14 15:24:52,696 INFO [train.py:1198] (0/2) Epoch 9, batch 600, loss[loss=0.2707, ctc_loss=0.1888, cr_loss=0.4093, over 20799.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1947, cr_loss=0.4017, over 3904536.33 frames. ], batch size: 56, lr: 9.59e-03, grad_scale: 16.0
2024-09-14 15:24:59,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=146559.83333333334, ans=0.125
2024-09-14 15:25:17,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0
2024-09-14 15:25:19,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146588.16666666666, ans=0.1
2024-09-14 15:25:26,992 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.084e+02 2.253e+02 2.578e+02 7.002e+02, threshold=4.507e+02, percent-clipped=1.0
2024-09-14 15:25:33,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=146616.5, ans=0.125
2024-09-14 15:26:04,092 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0
2024-09-14 15:26:07,747 INFO [train.py:1198] (0/2) Epoch 9, batch 650, loss[loss=0.2619, ctc_loss=0.1809, cr_loss=0.4046, over 21037.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1949, cr_loss=0.4019, over 3941478.73 frames. ], batch size: 56, lr: 9.59e-03, grad_scale: 16.0
2024-09-14 15:26:50,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=146758.16666666666, ans=0.1
2024-09-14 15:27:23,150 INFO [train.py:1198] (0/2) Epoch 9, batch 700, loss[loss=0.3091, ctc_loss=0.2264, cr_loss=0.4135, over 20994.00 frames. ], tot_loss[loss=0.2759, ctc_loss=0.1955, cr_loss=0.4023, over 3978594.31 frames. ], batch size: 61, lr: 9.58e-03, grad_scale: 16.0
2024-09-14 15:27:49,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=146871.5, ans=0.125
2024-09-14 15:28:00,573 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.100e+02 2.307e+02 2.630e+02 5.153e+02, threshold=4.614e+02, percent-clipped=1.0
2024-09-14 15:28:35,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=146956.5, ans=0.5
2024-09-14 15:28:40,995 INFO [train.py:1198] (0/2) Epoch 9, batch 750, loss[loss=0.27, ctc_loss=0.1906, cr_loss=0.3974, over 20983.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1948, cr_loss=0.4008, over 4011453.90 frames. ], batch size: 58, lr: 9.58e-03, grad_scale: 16.0
2024-09-14 15:29:07,479 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=15.0
2024-09-14 15:29:39,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=147098.16666666666, ans=0.0
2024-09-14 15:29:43,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=147098.16666666666, ans=0.07
2024-09-14 15:29:59,099 INFO [train.py:1198] (0/2) Epoch 9, batch 800, loss[loss=0.2358, ctc_loss=0.166, cr_loss=0.3489, over 20772.00 frames. ], tot_loss[loss=0.2748, ctc_loss=0.1946, cr_loss=0.4008, over 4031030.60 frames. ], batch size: 53, lr: 9.57e-03, grad_scale: 32.0
2024-09-14 15:30:07,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=147126.5, ans=0.1
2024-09-14 15:30:29,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=147183.16666666666, ans=0.125
2024-09-14 15:30:33,722 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.091e+02 2.348e+02 2.533e+02 4.670e+02, threshold=4.696e+02, percent-clipped=1.0
2024-09-14 15:30:46,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=147211.5, ans=0.125
2024-09-14 15:31:07,458 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0
2024-09-14 15:31:13,982 INFO [train.py:1198] (0/2) Epoch 9, batch 850, loss[loss=0.2937, ctc_loss=0.2063, cr_loss=0.4372, over 21077.00 frames. ], tot_loss[loss=0.2765, ctc_loss=0.1959, cr_loss=0.4027, over 4037296.17 frames. ], batch size: 59, lr: 9.57e-03, grad_scale: 32.0
2024-09-14 15:31:46,949 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-52000.pt
2024-09-14 15:31:54,497 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=15.0
2024-09-14 15:32:22,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=147381.5, ans=10.0
2024-09-14 15:32:29,684 INFO [train.py:1198] (0/2) Epoch 9, batch 900, loss[loss=0.2772, ctc_loss=0.1937, cr_loss=0.4173, over 20955.00 frames. ], tot_loss[loss=0.2766, ctc_loss=0.196, cr_loss=0.403, over 4039246.71 frames. ], batch size: 58, lr: 9.57e-03, grad_scale: 32.0
2024-09-14 15:32:37,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=147409.83333333334, ans=0.0
2024-09-14 15:32:40,980 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 15:33:04,508 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 2.072e+02 2.211e+02 2.432e+02 3.229e+02, threshold=4.421e+02, percent-clipped=0.0
2024-09-14 15:33:49,179 INFO [train.py:1198] (0/2) Epoch 9, batch 950, loss[loss=0.2563, ctc_loss=0.1806, cr_loss=0.3783, over 21046.00 frames. ], tot_loss[loss=0.2769, ctc_loss=0.1962, cr_loss=0.4038, over 4052856.51 frames. ], batch size: 56, lr: 9.56e-03, grad_scale: 32.0
2024-09-14 15:33:58,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=147551.5, ans=0.1
2024-09-14 15:34:01,880 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=22.5
2024-09-14 15:35:03,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=147693.16666666666, ans=0.2
2024-09-14 15:35:04,778 INFO [train.py:1198] (0/2) Epoch 9, batch 1000, loss[loss=0.293, ctc_loss=0.2113, cr_loss=0.4087, over 20660.00 frames. ], tot_loss[loss=0.2762, ctc_loss=0.1956, cr_loss=0.4026, over 4058914.42 frames. ], batch size: 68, lr: 9.56e-03, grad_scale: 32.0
2024-09-14 15:35:10,015 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.21 vs. limit=22.5
2024-09-14 15:35:27,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=147721.5, ans=0.025
2024-09-14 15:35:42,582 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.137e+02 2.357e+02 2.706e+02 4.826e+02, threshold=4.713e+02, percent-clipped=1.0
2024-09-14 15:36:23,183 INFO [train.py:1198] (0/2) Epoch 9, batch 1050, loss[loss=0.2474, ctc_loss=0.1722, cr_loss=0.3764, over 20978.00 frames. ], tot_loss[loss=0.2764, ctc_loss=0.1957, cr_loss=0.4032, over 4073146.80 frames. ], batch size: 55, lr: 9.55e-03, grad_scale: 32.0
2024-09-14 15:36:27,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=147834.83333333334, ans=0.05
2024-09-14 15:37:38,773 INFO [train.py:1198] (0/2) Epoch 9, batch 1100, loss[loss=0.2364, ctc_loss=0.1633, cr_loss=0.3657, over 20950.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1949, cr_loss=0.4021, over 4078980.78 frames. ], batch size: 49, lr: 9.55e-03, grad_scale: 16.0
2024-09-14 15:37:54,735 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=22.5
2024-09-14 15:38:03,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=148004.83333333334, ans=0.125
2024-09-14 15:38:05,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=148004.83333333334, ans=0.125
2024-09-14 15:38:12,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=148033.16666666666, ans=0.04949747468305833
2024-09-14 15:38:15,093 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.049e+02 2.233e+02 2.569e+02 3.716e+02, threshold=4.467e+02, percent-clipped=0.0
2024-09-14 15:38:18,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=148033.16666666666, ans=0.125
2024-09-14 15:38:32,510 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.36 vs. limit=12.0
2024-09-14 15:38:33,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=148061.5, ans=0.2
2024-09-14 15:38:37,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=148089.83333333334, ans=0.125
2024-09-14 15:38:45,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.55 vs. limit=10.0
2024-09-14 15:38:53,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=148118.16666666666, ans=0.1
2024-09-14 15:38:54,262 INFO [train.py:1198] (0/2) Epoch 9, batch 1150, loss[loss=0.2768, ctc_loss=0.1935, cr_loss=0.4166, over 21055.00 frames. ], tot_loss[loss=0.2744, ctc_loss=0.1943, cr_loss=0.4009, over 4082703.83 frames. ], batch size: 59, lr: 9.54e-03, grad_scale: 16.0
2024-09-14 15:39:41,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=148203.16666666666, ans=0.0
2024-09-14 15:39:53,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=148203.16666666666, ans=0.015
2024-09-14 15:39:59,720 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.62 vs. limit=10.0
2024-09-14 15:40:09,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=148231.5, ans=0.125
2024-09-14 15:40:12,488 INFO [train.py:1198] (0/2) Epoch 9, batch 1200, loss[loss=0.2615, ctc_loss=0.1827, cr_loss=0.3938, over 20976.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.195, cr_loss=0.4019, over 4076522.46 frames. ], batch size: 52, lr: 9.54e-03, grad_scale: 32.0
2024-09-14 15:40:26,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=148288.16666666666, ans=0.2
2024-09-14 15:40:37,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=148288.16666666666, ans=0.125
2024-09-14 15:40:47,762 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.084e+02 2.241e+02 2.597e+02 3.993e+02, threshold=4.481e+02, percent-clipped=0.0
2024-09-14 15:40:48,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=148316.5, ans=0.125
2024-09-14 15:40:54,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=15.0
2024-09-14 15:41:03,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=148344.83333333334, ans=0.0
2024-09-14 15:41:26,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=148373.16666666666, ans=0.2
2024-09-14 15:41:29,692 INFO [train.py:1198] (0/2) Epoch 9, batch 1250, loss[loss=0.2966, ctc_loss=0.2115, cr_loss=0.4256, over 20999.00 frames. ], tot_loss[loss=0.2766, ctc_loss=0.196, cr_loss=0.4031, over 4075105.28 frames. ], batch size: 64, lr: 9.53e-03, grad_scale: 32.0
2024-09-14 15:41:34,982 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0
2024-09-14 15:41:39,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=148401.5, ans=0.125
2024-09-14 15:41:42,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=148401.5, ans=0.0
2024-09-14 15:41:50,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=148429.83333333334, ans=0.025
2024-09-14 15:41:57,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=148429.83333333334, ans=0.0
2024-09-14 15:42:20,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=148486.5, ans=0.125
2024-09-14 15:42:22,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=148486.5, ans=0.0
2024-09-14 15:42:45,669 INFO [train.py:1198] (0/2) Epoch 9, batch 1300, loss[loss=0.2771, ctc_loss=0.195, cr_loss=0.4103, over 20844.00 frames. ], tot_loss[loss=0.2769, ctc_loss=0.1965, cr_loss=0.4022, over 4063878.92 frames. ], batch size: 65, lr: 9.53e-03, grad_scale: 32.0
2024-09-14 15:42:59,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=148571.5, ans=0.125
2024-09-14 15:43:21,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.89 vs. limit=15.0
2024-09-14 15:43:21,735 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.097e+02 2.224e+02 2.451e+02 7.655e+02, threshold=4.447e+02, percent-clipped=2.0
2024-09-14 15:43:25,730 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=22.5
2024-09-14 15:43:35,962 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.84 vs. limit=12.0
2024-09-14 15:43:38,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=148628.16666666666, ans=0.125
2024-09-14 15:43:41,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=148628.16666666666, ans=0.125
2024-09-14 15:43:48,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=148656.5, ans=0.2
2024-09-14 15:44:00,915 INFO [train.py:1198] (0/2) Epoch 9, batch 1350, loss[loss=0.2935, ctc_loss=0.2077, cr_loss=0.4289, over 21007.00 frames. ], tot_loss[loss=0.2782, ctc_loss=0.1975, cr_loss=0.4035, over 4063904.52 frames. ], batch size: 63, lr: 9.53e-03, grad_scale: 32.0
2024-09-14 15:44:18,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0
2024-09-14 15:44:23,141 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.76 vs. limit=15.0
2024-09-14 15:44:29,098 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0
2024-09-14 15:44:52,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0
2024-09-14 15:45:00,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=148769.83333333334, ans=0.0
2024-09-14 15:45:19,248 INFO [train.py:1198] (0/2) Epoch 9, batch 1400, loss[loss=0.2765, ctc_loss=0.199, cr_loss=0.3872, over 20955.00 frames. ], tot_loss[loss=0.2775, ctc_loss=0.1968, cr_loss=0.4035, over 4082215.59 frames. ], batch size: 64, lr: 9.52e-03, grad_scale: 32.0
2024-09-14 15:45:25,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=148826.5, ans=0.125
2024-09-14 15:45:44,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=148854.83333333334, ans=0.04949747468305833
2024-09-14 15:45:56,662 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.919e+02 2.104e+02 2.225e+02 2.434e+02 3.450e+02, threshold=4.450e+02, percent-clipped=0.0
2024-09-14 15:45:57,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=148883.16666666666, ans=0.09899494936611666
2024-09-14 15:46:01,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=148883.16666666666, ans=0.04949747468305833
2024-09-14 15:46:16,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=148911.5, ans=0.125
2024-09-14 15:46:34,300 INFO [train.py:1198] (0/2) Epoch 9, batch 1450, loss[loss=0.2779, ctc_loss=0.1965, cr_loss=0.4069, over 20939.00 frames. ], tot_loss[loss=0.2783, ctc_loss=0.1972, cr_loss=0.4051, over 4086650.30 frames. ], batch size: 60, lr: 9.52e-03, grad_scale: 16.0
2024-09-14 15:46:54,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=148996.5, ans=0.0
2024-09-14 15:46:59,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.97 vs. limit=15.0
2024-09-14 15:47:10,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=149024.83333333334, ans=0.0
2024-09-14 15:47:40,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=149081.5, ans=0.0
2024-09-14 15:47:53,995 INFO [train.py:1198] (0/2) Epoch 9, batch 1500, loss[loss=0.303, ctc_loss=0.2159, cr_loss=0.4354, over 20684.00 frames. ], tot_loss[loss=0.2775, ctc_loss=0.1966, cr_loss=0.4043, over 4077124.97 frames. ], batch size: 68, lr: 9.51e-03, grad_scale: 16.0
2024-09-14 15:48:06,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=149109.83333333334, ans=0.2
2024-09-14 15:48:13,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=149138.16666666666, ans=0.2
2024-09-14 15:48:19,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0
2024-09-14 15:48:31,299 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.038e+02 2.322e+02 2.723e+02 3.987e+02, threshold=4.644e+02, percent-clipped=0.0
2024-09-14 15:48:47,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=149194.83333333334, ans=0.0
2024-09-14 15:49:08,972 INFO [train.py:1198] (0/2) Epoch 9, batch 1550, loss[loss=0.3017, ctc_loss=0.2166, cr_loss=0.4257, over 19529.00 frames. ], tot_loss[loss=0.2776, ctc_loss=0.1967, cr_loss=0.4045, over 4077033.66 frames. ], batch size: 90, lr: 9.51e-03, grad_scale: 16.0
2024-09-14 15:49:18,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=149251.5, ans=0.2
2024-09-14 15:49:40,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0
2024-09-14 15:50:25,014 INFO [train.py:1198] (0/2) Epoch 9, batch 1600, loss[loss=0.2801, ctc_loss=0.1996, cr_loss=0.4027, over 20987.00 frames. ], tot_loss[loss=0.2775, ctc_loss=0.1966, cr_loss=0.4048, over 4086411.42 frames. ], batch size: 58, lr: 9.50e-03, grad_scale: 32.0
2024-09-14 15:50:51,283 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.88 vs. limit=22.5
2024-09-14 15:51:05,463 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.705e+02 2.144e+02 2.333e+02 2.616e+02 4.738e+02, threshold=4.667e+02, percent-clipped=1.0
2024-09-14 15:51:20,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=149478.16666666666, ans=0.0
2024-09-14 15:51:22,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=149478.16666666666, ans=0.125
2024-09-14 15:51:28,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=149506.5, ans=0.125
2024-09-14 15:51:29,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=149506.5, ans=0.125
2024-09-14 15:51:42,850 INFO [train.py:1198] (0/2) Epoch 9, batch 1650, loss[loss=0.2999, ctc_loss=0.2186, cr_loss=0.4065, over 19341.00 frames. ], tot_loss[loss=0.2765, ctc_loss=0.1958, cr_loss=0.4034, over 4087092.96 frames. ], batch size: 90, lr: 9.50e-03, grad_scale: 32.0
2024-09-14 15:51:51,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.04 vs. limit=15.0
2024-09-14 15:52:45,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=149648.16666666666, ans=0.125
2024-09-14 15:53:00,650 INFO [train.py:1198] (0/2) Epoch 9, batch 1700, loss[loss=0.2376, ctc_loss=0.161, cr_loss=0.3826, over 20973.00 frames. ], tot_loss[loss=0.276, ctc_loss=0.1954, cr_loss=0.4032, over 4102424.36 frames. ], batch size: 49, lr: 9.49e-03, grad_scale: 32.0
2024-09-14 15:53:22,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=149704.83333333334, ans=0.0
2024-09-14 15:53:38,813 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.049e+02 2.258e+02 2.531e+02 7.471e+02, threshold=4.516e+02, percent-clipped=2.0
2024-09-14 15:54:16,871 INFO [train.py:1198] (0/2) Epoch 9, batch 1750, loss[loss=0.3063, ctc_loss=0.2116, cr_loss=0.4737, over 20649.00 frames. ], tot_loss[loss=0.2757, ctc_loss=0.1953, cr_loss=0.4021, over 4095182.47 frames. ], batch size: 66, lr: 9.49e-03, grad_scale: 32.0
2024-09-14 15:54:48,838 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 15:54:56,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=149874.83333333334, ans=0.0
2024-09-14 15:55:09,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=149903.16666666666, ans=0.125
2024-09-14 15:55:18,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=149931.5, ans=0.025
2024-09-14 15:55:27,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=149931.5, ans=0.0
2024-09-14 15:55:31,814 INFO [train.py:1198] (0/2) Epoch 9, batch 1800, loss[loss=0.2583, ctc_loss=0.1815, cr_loss=0.3839, over 21067.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1949, cr_loss=0.4019, over 4098754.09 frames. ], batch size: 56, lr: 9.49e-03, grad_scale: 32.0
2024-09-14 15:56:00,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=150016.5, ans=0.125
2024-09-14 15:56:00,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150016.5, ans=0.1
2024-09-14 15:56:05,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0
2024-09-14 15:56:09,624 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.064e+02 2.286e+02 2.541e+02 4.423e+02, threshold=4.572e+02, percent-clipped=0.0
2024-09-14 15:56:37,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0
2024-09-14 15:56:50,319 INFO [train.py:1198] (0/2) Epoch 9, batch 1850, loss[loss=0.2075, ctc_loss=0.1419, cr_loss=0.3279, over 20970.00 frames. ], tot_loss[loss=0.2742, ctc_loss=0.194, cr_loss=0.401, over 4106426.78 frames. ], batch size: 50, lr: 9.48e-03, grad_scale: 32.0
2024-09-14 15:57:12,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.64 vs. limit=10.0
2024-09-14 15:57:13,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=150129.83333333334, ans=0.125
2024-09-14 15:57:23,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=150158.16666666666, ans=0.125
2024-09-14 15:58:05,575 INFO [train.py:1198] (0/2) Epoch 9, batch 1900, loss[loss=0.2954, ctc_loss=0.2113, cr_loss=0.4205, over 21023.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1948, cr_loss=0.4023, over 4106953.28 frames. ], batch size: 63, lr: 9.48e-03, grad_scale: 32.0
2024-09-14 15:58:26,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=150271.5, ans=0.125
2024-09-14 15:58:45,912 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.091e+02 2.239e+02 2.372e+02 4.186e+02, threshold=4.479e+02, percent-clipped=0.0
2024-09-14 15:58:54,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0
2024-09-14 15:59:23,356 INFO [train.py:1198] (0/2) Epoch 9, batch 1950, loss[loss=0.2981, ctc_loss=0.2123, cr_loss=0.4287, over 21004.00 frames. ], tot_loss[loss=0.2756, ctc_loss=0.1951, cr_loss=0.4024, over 4106576.93 frames. ], batch size: 61, lr: 9.47e-03, grad_scale: 32.0
2024-09-14 15:59:23,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=150384.83333333334, ans=0.0
2024-09-14 16:00:02,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=150441.5, ans=0.125
2024-09-14 16:00:22,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=150498.16666666666, ans=0.125
2024-09-14 16:00:38,335 INFO [train.py:1198] (0/2) Epoch 9, batch 2000, loss[loss=0.2635, ctc_loss=0.1836, cr_loss=0.3998, over 20875.00 frames. ], tot_loss[loss=0.2751, ctc_loss=0.1945, cr_loss=0.4028, over 4108632.00 frames. ], batch size: 57, lr: 9.47e-03, grad_scale: 32.0
2024-09-14 16:00:50,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=150526.5, ans=0.05
2024-09-14 16:01:07,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=150583.16666666666, ans=0.0
2024-09-14 16:01:15,917 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.064e+02 2.222e+02 2.486e+02 5.486e+02, threshold=4.444e+02, percent-clipped=1.0
2024-09-14 16:01:28,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=150611.5, ans=0.05
2024-09-14 16:01:53,337 INFO [train.py:1198] (0/2) Epoch 9, batch 2050, loss[loss=0.2583, ctc_loss=0.1875, cr_loss=0.3538, over 21017.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1948, cr_loss=0.4027, over 4108628.81 frames. ], batch size: 52, lr: 9.46e-03, grad_scale: 32.0
2024-09-14 16:02:02,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=150668.16666666666, ans=0.0
2024-09-14 16:02:12,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=150696.5, ans=0.0
2024-09-14 16:02:48,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=150753.16666666666, ans=0.125
2024-09-14 16:03:11,365 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.41 vs. limit=6.0
2024-09-14 16:03:12,146 INFO [train.py:1198] (0/2) Epoch 9, batch 2100, loss[loss=0.2404, ctc_loss=0.1675, cr_loss=0.3643, over 21058.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.1942, cr_loss=0.4017, over 4106403.60 frames. ], batch size: 53, lr: 9.46e-03, grad_scale: 32.0
2024-09-14 16:03:15,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=150809.83333333334, ans=0.125
2024-09-14 16:03:18,661 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 16:03:43,204 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.20 vs. limit=15.0
2024-09-14 16:03:49,708 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.728e+02 2.042e+02 2.207e+02 2.407e+02 4.895e+02, threshold=4.414e+02, percent-clipped=1.0
2024-09-14 16:03:50,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.82 vs. limit=15.0
2024-09-14 16:04:05,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=150894.83333333334, ans=0.04949747468305833
2024-09-14 16:04:12,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=150923.16666666666, ans=0.04949747468305833
2024-09-14 16:04:14,093 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 16:04:30,053 INFO [train.py:1198] (0/2) Epoch 9, batch 2150, loss[loss=0.2434, ctc_loss=0.1716, cr_loss=0.3593, over 20963.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1943, cr_loss=0.402, over 4100830.14 frames. ], batch size: 50, lr: 9.45e-03, grad_scale: 32.0
2024-09-14 16:05:45,663 INFO [train.py:1198] (0/2) Epoch 9, batch 2200, loss[loss=0.3, ctc_loss=0.2153, cr_loss=0.4232, over 19986.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1948, cr_loss=0.4025, over 4093483.30 frames. ], batch size: 80, lr: 9.45e-03, grad_scale: 32.0
2024-09-14 16:06:05,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=151121.5, ans=0.0
2024-09-14 16:06:23,437 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.119e+02 2.414e+02 2.810e+02 4.816e+02, threshold=4.829e+02, percent-clipped=1.0
2024-09-14 16:06:39,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=151178.16666666666, ans=0.125
2024-09-14 16:06:42,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=151178.16666666666, ans=0.125
2024-09-14 16:06:45,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=151206.5, ans=0.125
2024-09-14 16:07:01,233 INFO [train.py:1198] (0/2) Epoch 9, batch 2250, loss[loss=0.2532, ctc_loss=0.1768, cr_loss=0.3823, over 21057.00 frames. ], tot_loss[loss=0.2752, ctc_loss=0.1948, cr_loss=0.4021, over 4096623.49 frames. ], batch size: 56, lr: 9.45e-03, grad_scale: 32.0
2024-09-14 16:07:12,371 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.31 vs. limit=15.0
2024-09-14 16:07:13,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=151234.83333333334, ans=0.0
2024-09-14 16:07:39,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=151291.5, ans=0.0
2024-09-14 16:07:45,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.21 vs.
limit=15.0 2024-09-14 16:07:48,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=151319.83333333334, ans=0.0 2024-09-14 16:07:54,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=151319.83333333334, ans=0.2 2024-09-14 16:08:04,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=151348.16666666666, ans=0.1 2024-09-14 16:08:16,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=151348.16666666666, ans=0.0 2024-09-14 16:08:18,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=12.0 2024-09-14 16:08:19,544 INFO [train.py:1198] (0/2) Epoch 9, batch 2300, loss[loss=0.3012, ctc_loss=0.2152, cr_loss=0.4298, over 20861.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1944, cr_loss=0.4012, over 4104780.38 frames. ], batch size: 57, lr: 9.44e-03, grad_scale: 32.0 2024-09-14 16:08:57,203 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.084e+02 2.261e+02 2.474e+02 4.816e+02, threshold=4.521e+02, percent-clipped=0.0 2024-09-14 16:08:57,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=151433.16666666666, ans=0.0 2024-09-14 16:09:35,200 INFO [train.py:1198] (0/2) Epoch 9, batch 2350, loss[loss=0.2554, ctc_loss=0.1782, cr_loss=0.386, over 20951.00 frames. ], tot_loss[loss=0.2746, ctc_loss=0.1943, cr_loss=0.4016, over 4093866.24 frames. 
], batch size: 50, lr: 9.44e-03, grad_scale: 32.0 2024-09-14 16:09:50,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=151546.5, ans=0.2 2024-09-14 16:10:24,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=151603.16666666666, ans=0.125 2024-09-14 16:10:53,042 INFO [train.py:1198] (0/2) Epoch 9, batch 2400, loss[loss=0.2551, ctc_loss=0.1783, cr_loss=0.3838, over 20975.00 frames. ], tot_loss[loss=0.274, ctc_loss=0.1939, cr_loss=0.4005, over 4085646.41 frames. ], batch size: 48, lr: 9.43e-03, grad_scale: 32.0 2024-09-14 16:10:56,767 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-09-14 16:11:20,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=151688.16666666666, ans=0.0 2024-09-14 16:11:31,197 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.051e+02 2.214e+02 2.513e+02 3.869e+02, threshold=4.429e+02, percent-clipped=0.0 2024-09-14 16:11:33,829 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.45 vs. 
limit=15.0 2024-09-14 16:11:51,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=151744.83333333334, ans=0.125 2024-09-14 16:11:52,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=151773.16666666666, ans=0.125 2024-09-14 16:11:57,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=151773.16666666666, ans=0.125 2024-09-14 16:12:09,101 INFO [train.py:1198] (0/2) Epoch 9, batch 2450, loss[loss=0.2905, ctc_loss=0.2097, cr_loss=0.404, over 20947.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.1931, cr_loss=0.4001, over 4097829.84 frames. ], batch size: 64, lr: 9.43e-03, grad_scale: 16.0 2024-09-14 16:12:12,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=151801.5, ans=0.0 2024-09-14 16:12:44,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=151858.16666666666, ans=0.0 2024-09-14 16:12:46,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=151858.16666666666, ans=0.2 2024-09-14 16:12:53,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=151886.5, ans=0.5 2024-09-14 16:12:59,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=151886.5, ans=0.125 2024-09-14 16:13:21,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=151914.83333333334, ans=0.125 2024-09-14 16:13:27,328 INFO [train.py:1198] (0/2) Epoch 9, batch 2500, loss[loss=0.2739, ctc_loss=0.1945, cr_loss=0.3968, over 20871.00 frames. 
], tot_loss[loss=0.2732, ctc_loss=0.1932, cr_loss=0.4, over 4084750.40 frames. ], batch size: 54, lr: 9.42e-03, grad_scale: 16.0 2024-09-14 16:13:29,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=151943.16666666666, ans=15.0 2024-09-14 16:14:06,078 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.100e+02 2.274e+02 2.459e+02 4.462e+02, threshold=4.548e+02, percent-clipped=1.0 2024-09-14 16:14:42,354 INFO [train.py:1198] (0/2) Epoch 9, batch 2550, loss[loss=0.2929, ctc_loss=0.2093, cr_loss=0.4181, over 20691.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1938, cr_loss=0.4008, over 4074594.43 frames. ], batch size: 71, lr: 9.42e-03, grad_scale: 16.0 2024-09-14 16:15:01,007 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-09-14 16:15:17,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=152141.5, ans=0.2 2024-09-14 16:15:26,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=152169.83333333334, ans=0.0 2024-09-14 16:15:56,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=152198.16666666666, ans=0.0 2024-09-14 16:16:01,115 INFO [train.py:1198] (0/2) Epoch 9, batch 2600, loss[loss=0.2684, ctc_loss=0.1883, cr_loss=0.4003, over 20877.00 frames. ], tot_loss[loss=0.2734, ctc_loss=0.1934, cr_loss=0.4, over 4083135.00 frames. 
], batch size: 57, lr: 9.42e-03, grad_scale: 16.0 2024-09-14 16:16:39,171 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 16:16:40,243 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.052e+02 2.227e+02 2.453e+02 4.284e+02, threshold=4.454e+02, percent-clipped=0.0 2024-09-14 16:16:40,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=152283.16666666666, ans=0.125 2024-09-14 16:16:40,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=152283.16666666666, ans=0.05 2024-09-14 16:16:51,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=152311.5, ans=0.0 2024-09-14 16:17:16,058 INFO [train.py:1198] (0/2) Epoch 9, batch 2650, loss[loss=0.2876, ctc_loss=0.1988, cr_loss=0.4439, over 20769.00 frames. ], tot_loss[loss=0.2742, ctc_loss=0.1941, cr_loss=0.4008, over 4083464.52 frames. ], batch size: 56, lr: 9.41e-03, grad_scale: 16.0 2024-09-14 16:17:24,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=152368.16666666666, ans=0.1 2024-09-14 16:17:48,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=152424.83333333334, ans=0.2 2024-09-14 16:17:51,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.40 vs. 
limit=15.0 2024-09-14 16:18:00,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=152453.16666666666, ans=0.2 2024-09-14 16:18:10,923 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 16:18:31,587 INFO [train.py:1198] (0/2) Epoch 9, batch 2700, loss[loss=0.2896, ctc_loss=0.2049, cr_loss=0.4235, over 21021.00 frames. ], tot_loss[loss=0.2748, ctc_loss=0.1945, cr_loss=0.4013, over 4073671.36 frames. ], batch size: 63, lr: 9.41e-03, grad_scale: 16.0 2024-09-14 16:18:39,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=152509.83333333334, ans=0.025 2024-09-14 16:18:45,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=152538.16666666666, ans=0.2 2024-09-14 16:19:13,356 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.173e+02 2.396e+02 2.767e+02 4.710e+02, threshold=4.792e+02, percent-clipped=1.0 2024-09-14 16:19:25,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=152594.83333333334, ans=0.125 2024-09-14 16:19:41,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=152623.16666666666, ans=0.125 2024-09-14 16:19:49,212 INFO [train.py:1198] (0/2) Epoch 9, batch 2750, loss[loss=0.3024, ctc_loss=0.2123, cr_loss=0.4505, over 21065.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1937, cr_loss=0.401, over 4089753.62 frames. 
], batch size: 59, lr: 9.40e-03, grad_scale: 16.0 2024-09-14 16:20:20,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=152708.16666666666, ans=0.125 2024-09-14 16:20:34,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=152736.5, ans=0.125 2024-09-14 16:20:54,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=152764.83333333334, ans=0.1 2024-09-14 16:21:04,336 INFO [train.py:1198] (0/2) Epoch 9, batch 2800, loss[loss=0.2785, ctc_loss=0.1978, cr_loss=0.4037, over 21073.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1933, cr_loss=0.401, over 4090995.56 frames. ], batch size: 56, lr: 9.40e-03, grad_scale: 32.0 2024-09-14 16:21:46,824 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.030e+02 2.197e+02 2.409e+02 5.376e+02, threshold=4.393e+02, percent-clipped=1.0 2024-09-14 16:22:17,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=152906.5, ans=0.0 2024-09-14 16:22:23,049 INFO [train.py:1198] (0/2) Epoch 9, batch 2850, loss[loss=0.2733, ctc_loss=0.193, cr_loss=0.4013, over 20837.00 frames. ], tot_loss[loss=0.2734, ctc_loss=0.1934, cr_loss=0.4003, over 4081692.06 frames. 
], batch size: 65, lr: 9.39e-03, grad_scale: 32.0 2024-09-14 16:22:31,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=152934.83333333334, ans=0.1 2024-09-14 16:22:50,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=152963.16666666666, ans=0.125 2024-09-14 16:22:53,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=152991.5, ans=0.125 2024-09-14 16:22:56,112 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2024-09-14 16:23:35,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.65 vs. limit=10.0 2024-09-14 16:23:39,069 INFO [train.py:1198] (0/2) Epoch 9, batch 2900, loss[loss=0.3003, ctc_loss=0.2155, cr_loss=0.4238, over 20653.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1934, cr_loss=0.4003, over 4078323.80 frames. 
], batch size: 66, lr: 9.39e-03, grad_scale: 32.0 2024-09-14 16:23:49,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=153076.5, ans=0.125 2024-09-14 16:24:16,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=153133.16666666666, ans=0.0 2024-09-14 16:24:18,248 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.031e+02 2.179e+02 2.346e+02 3.418e+02, threshold=4.358e+02, percent-clipped=0.0 2024-09-14 16:24:20,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=153133.16666666666, ans=0.0 2024-09-14 16:24:43,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=153189.83333333334, ans=0.125 2024-09-14 16:24:56,652 INFO [train.py:1198] (0/2) Epoch 9, batch 2950, loss[loss=0.2455, ctc_loss=0.1714, cr_loss=0.3708, over 20953.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.1931, cr_loss=0.4, over 4076370.14 frames. ], batch size: 48, lr: 9.39e-03, grad_scale: 32.0 2024-09-14 16:25:01,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.94 vs. 
limit=12.0 2024-09-14 16:25:34,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=153274.83333333334, ans=0.125 2024-09-14 16:25:36,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=153274.83333333334, ans=0.0 2024-09-14 16:25:43,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=153303.16666666666, ans=0.125 2024-09-14 16:25:52,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=153303.16666666666, ans=0.0 2024-09-14 16:26:08,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=22.5 2024-09-14 16:26:11,951 INFO [train.py:1198] (0/2) Epoch 9, batch 3000, loss[loss=0.2747, ctc_loss=0.1933, cr_loss=0.4069, over 20878.00 frames. ], tot_loss[loss=0.2724, ctc_loss=0.1926, cr_loss=0.3993, over 4076695.58 frames. ], batch size: 54, lr: 9.38e-03, grad_scale: 32.0 2024-09-14 16:26:11,952 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 16:26:31,417 INFO [train.py:1230] (0/2) Epoch 9, validation: loss=0.0567, ctc_loss=0.0567, cr_loss=9.377e-15, over 944034.00 frames. 2024-09-14 16:26:31,418 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 16:26:35,672 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.33 vs. limit=5.0 2024-09-14 16:26:41,718 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.81 vs. 
limit=15.0 2024-09-14 16:26:42,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=153359.83333333334, ans=0.025 2024-09-14 16:27:03,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=153416.5, ans=0.04949747468305833 2024-09-14 16:27:13,388 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.098e+02 2.324e+02 2.738e+02 4.478e+02, threshold=4.647e+02, percent-clipped=1.0 2024-09-14 16:27:26,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=153444.83333333334, ans=0.0 2024-09-14 16:27:32,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=153444.83333333334, ans=0.0 2024-09-14 16:27:50,194 INFO [train.py:1198] (0/2) Epoch 9, batch 3050, loss[loss=0.3091, ctc_loss=0.2226, cr_loss=0.4323, over 18287.00 frames. ], tot_loss[loss=0.2725, ctc_loss=0.1926, cr_loss=0.3998, over 4079306.34 frames. ], batch size: 108, lr: 9.38e-03, grad_scale: 32.0 2024-09-14 16:28:13,156 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-09-14 16:28:17,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=153529.83333333334, ans=0.0 2024-09-14 16:28:43,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=153586.5, ans=0.0 2024-09-14 16:28:44,197 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. 
limit=6.0 2024-09-14 16:28:55,115 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.93 vs. limit=15.0 2024-09-14 16:29:01,975 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 16:29:05,993 INFO [train.py:1198] (0/2) Epoch 9, batch 3100, loss[loss=0.261, ctc_loss=0.1836, cr_loss=0.387, over 21043.00 frames. ], tot_loss[loss=0.2725, ctc_loss=0.1926, cr_loss=0.3996, over 4085506.40 frames. ], batch size: 56, lr: 9.37e-03, grad_scale: 32.0 2024-09-14 16:29:09,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=153643.16666666666, ans=0.0 2024-09-14 16:29:09,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=153643.16666666666, ans=0.125 2024-09-14 16:29:40,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=153699.83333333334, ans=0.125 2024-09-14 16:29:40,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=153699.83333333334, ans=0.04949747468305833 2024-09-14 16:29:45,023 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.675e+02 2.029e+02 2.142e+02 2.385e+02 4.083e+02, threshold=4.283e+02, percent-clipped=0.0 2024-09-14 16:29:48,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=153699.83333333334, ans=0.2 2024-09-14 16:29:54,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=153728.16666666666, ans=0.025 2024-09-14 16:30:23,333 INFO [train.py:1198] (0/2) Epoch 9, batch 3150, loss[loss=0.2323, ctc_loss=0.159, cr_loss=0.3668, over 
20956.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1934, cr_loss=0.4016, over 4066551.72 frames. ], batch size: 48, lr: 9.37e-03, grad_scale: 32.0 2024-09-14 16:30:25,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=153784.83333333334, ans=0.0 2024-09-14 16:30:26,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=153784.83333333334, ans=0.125 2024-09-14 16:30:43,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=153813.16666666666, ans=0.0 2024-09-14 16:30:44,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=153813.16666666666, ans=0.025 2024-09-14 16:30:51,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=22.5 2024-09-14 16:31:04,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=153841.5, ans=0.125 2024-09-14 16:31:07,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=153869.83333333334, ans=0.0 2024-09-14 16:31:11,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=153869.83333333334, ans=0.125 2024-09-14 16:31:15,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=153869.83333333334, ans=0.0 2024-09-14 16:31:19,621 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 16:31:38,558 INFO [train.py:1198] (0/2) Epoch 9, batch 3200, loss[loss=0.2817, ctc_loss=0.1981, cr_loss=0.4182, over 21012.00 frames. 
], tot_loss[loss=0.2721, ctc_loss=0.192, cr_loss=0.4003, over 4077409.98 frames. ], batch size: 63, lr: 9.36e-03, grad_scale: 32.0 2024-09-14 16:31:45,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=153926.5, ans=15.0 2024-09-14 16:32:01,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=153954.83333333334, ans=0.0 2024-09-14 16:32:17,885 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.682e+02 2.060e+02 2.182e+02 2.345e+02 3.448e+02, threshold=4.365e+02, percent-clipped=0.0 2024-09-14 16:32:52,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2024-09-14 16:32:57,400 INFO [train.py:1198] (0/2) Epoch 9, batch 3250, loss[loss=0.2737, ctc_loss=0.1938, cr_loss=0.3991, over 21002.00 frames. ], tot_loss[loss=0.2718, ctc_loss=0.1919, cr_loss=0.3995, over 4087588.80 frames. ], batch size: 63, lr: 9.36e-03, grad_scale: 16.0 2024-09-14 16:33:29,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154124.83333333334, ans=0.1 2024-09-14 16:33:36,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=154124.83333333334, ans=0.125 2024-09-14 16:34:08,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=154181.5, ans=0.125 2024-09-14 16:34:10,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=154181.5, ans=0.0 2024-09-14 16:34:13,056 INFO [train.py:1198] (0/2) Epoch 9, batch 3300, loss[loss=0.281, ctc_loss=0.1966, cr_loss=0.4219, over 21071.00 frames. 
], tot_loss[loss=0.272, ctc_loss=0.192, cr_loss=0.4, over 4095599.22 frames. ], batch size: 59, lr: 9.36e-03, grad_scale: 16.0 2024-09-14 16:34:31,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=154238.16666666666, ans=0.125 2024-09-14 16:34:53,526 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.132e+02 2.385e+02 2.654e+02 6.296e+02, threshold=4.769e+02, percent-clipped=1.0 2024-09-14 16:35:07,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=154294.83333333334, ans=0.125 2024-09-14 16:35:28,294 INFO [train.py:1198] (0/2) Epoch 9, batch 3350, loss[loss=0.2943, ctc_loss=0.208, cr_loss=0.4316, over 21049.00 frames. ], tot_loss[loss=0.2736, ctc_loss=0.1933, cr_loss=0.4015, over 4093387.12 frames. ], batch size: 62, lr: 9.35e-03, grad_scale: 16.0 2024-09-14 16:35:31,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=154351.5, ans=0.0 2024-09-14 16:36:02,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154408.16666666666, ans=0.1 2024-09-14 16:36:41,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=154464.83333333334, ans=0.125 2024-09-14 16:36:43,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2024-09-14 16:36:47,141 INFO [train.py:1198] (0/2) Epoch 9, batch 3400, loss[loss=0.2446, ctc_loss=0.1713, cr_loss=0.3665, over 20769.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1936, cr_loss=0.4019, over 4094391.89 frames. 
], batch size: 56, lr: 9.35e-03, grad_scale: 16.0 2024-09-14 16:37:03,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=154521.5, ans=0.125 2024-09-14 16:37:04,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=154521.5, ans=0.125 2024-09-14 16:37:19,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=154549.83333333334, ans=0.125 2024-09-14 16:37:20,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154549.83333333334, ans=0.1 2024-09-14 16:37:22,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=154549.83333333334, ans=0.125 2024-09-14 16:37:27,819 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.104e+02 2.223e+02 2.375e+02 5.219e+02, threshold=4.447e+02, percent-clipped=1.0 2024-09-14 16:37:35,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=154578.16666666666, ans=0.0 2024-09-14 16:37:38,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=154578.16666666666, ans=0.025 2024-09-14 16:37:52,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154606.5, ans=0.1 2024-09-14 16:38:02,528 INFO [train.py:1198] (0/2) Epoch 9, batch 3450, loss[loss=0.286, ctc_loss=0.2029, cr_loss=0.4156, over 20827.00 frames. ], tot_loss[loss=0.2728, ctc_loss=0.1926, cr_loss=0.401, over 4096593.70 frames. 
], batch size: 59, lr: 9.34e-03, grad_scale: 16.0 2024-09-14 16:38:28,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154663.16666666666, ans=0.1 2024-09-14 16:38:35,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=154691.5, ans=0.125 2024-09-14 16:38:55,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=154719.83333333334, ans=0.125 2024-09-14 16:38:57,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=154719.83333333334, ans=0.125 2024-09-14 16:39:07,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=154748.16666666666, ans=0.125 2024-09-14 16:39:15,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=154748.16666666666, ans=0.1 2024-09-14 16:39:15,886 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 16:39:21,560 INFO [train.py:1198] (0/2) Epoch 9, batch 3500, loss[loss=0.2677, ctc_loss=0.1873, cr_loss=0.4018, over 20794.00 frames. ], tot_loss[loss=0.2724, ctc_loss=0.1921, cr_loss=0.4013, over 4106080.72 frames. 
], batch size: 56, lr: 9.34e-03, grad_scale: 16.0 2024-09-14 16:39:53,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=154833.16666666666, ans=0.95 2024-09-14 16:40:02,142 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.092e+02 2.306e+02 2.543e+02 5.355e+02, threshold=4.612e+02, percent-clipped=1.0 2024-09-14 16:40:37,020 INFO [train.py:1198] (0/2) Epoch 9, batch 3550, loss[loss=0.292, ctc_loss=0.2065, cr_loss=0.4275, over 20868.00 frames. ], tot_loss[loss=0.2728, ctc_loss=0.1926, cr_loss=0.4011, over 4092774.78 frames. ], batch size: 65, lr: 9.34e-03, grad_scale: 16.0 2024-09-14 16:40:37,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=15.0 2024-09-14 16:41:03,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=154946.5, ans=0.0 2024-09-14 16:41:12,462 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 16:41:20,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=12.0 2024-09-14 16:41:55,696 INFO [train.py:1198] (0/2) Epoch 9, batch 3600, loss[loss=0.3083, ctc_loss=0.2178, cr_loss=0.4527, over 20640.00 frames. ], tot_loss[loss=0.272, ctc_loss=0.1919, cr_loss=0.4007, over 4105035.25 frames. 
], batch size: 66, lr: 9.33e-03, grad_scale: 32.0 2024-09-14 16:42:17,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=155088.16666666666, ans=0.2 2024-09-14 16:42:17,183 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 16:42:32,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=155116.5, ans=0.025 2024-09-14 16:42:32,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=155116.5, ans=0.125 2024-09-14 16:42:35,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.58 vs. limit=6.0 2024-09-14 16:42:38,202 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.137e+02 2.269e+02 2.630e+02 5.003e+02, threshold=4.537e+02, percent-clipped=1.0 2024-09-14 16:42:52,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=155144.83333333334, ans=0.125 2024-09-14 16:42:57,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=155173.16666666666, ans=0.125 2024-09-14 16:42:59,170 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2024-09-14 16:43:06,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=155173.16666666666, ans=0.125 2024-09-14 16:43:12,304 INFO [train.py:1198] (0/2) Epoch 9, batch 3650, loss[loss=0.3175, ctc_loss=0.2242, cr_loss=0.4667, over 20678.00 frames. 
], tot_loss[loss=0.2716, ctc_loss=0.1914, cr_loss=0.401, over 4119725.87 frames. ], batch size: 66, lr: 9.33e-03, grad_scale: 16.0 2024-09-14 16:43:38,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=155229.83333333334, ans=0.025 2024-09-14 16:44:16,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=155314.83333333334, ans=0.125 2024-09-14 16:44:23,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=155314.83333333334, ans=0.125 2024-09-14 16:44:25,663 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=15.0 2024-09-14 16:44:30,831 INFO [train.py:1198] (0/2) Epoch 9, batch 3700, loss[loss=0.3464, ctc_loss=0.2573, cr_loss=0.4456, over 15130.00 frames. ], tot_loss[loss=0.2725, ctc_loss=0.1922, cr_loss=0.4012, over 4100663.49 frames. ], batch size: 149, lr: 9.32e-03, grad_scale: 16.0 2024-09-14 16:45:13,433 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.066e+02 2.239e+02 2.559e+02 3.668e+02, threshold=4.478e+02, percent-clipped=0.0 2024-09-14 16:45:18,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.26 vs. limit=8.0 2024-09-14 16:45:22,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=155428.16666666666, ans=0.0 2024-09-14 16:45:35,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.83 vs. 
limit=22.5 2024-09-14 16:45:37,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=155456.5, ans=0.125 2024-09-14 16:45:43,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=155456.5, ans=0.025 2024-09-14 16:45:46,256 INFO [train.py:1198] (0/2) Epoch 9, batch 3750, loss[loss=0.2403, ctc_loss=0.1692, cr_loss=0.3559, over 20785.00 frames. ], tot_loss[loss=0.2724, ctc_loss=0.1922, cr_loss=0.401, over 4098987.16 frames. ], batch size: 56, lr: 9.32e-03, grad_scale: 16.0 2024-09-14 16:46:36,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=155569.83333333334, ans=0.025 2024-09-14 16:47:01,812 INFO [train.py:1198] (0/2) Epoch 9, batch 3800, loss[loss=0.2497, ctc_loss=0.1745, cr_loss=0.3757, over 20975.00 frames. ], tot_loss[loss=0.2712, ctc_loss=0.1912, cr_loss=0.3999, over 4102463.77 frames. ], batch size: 55, lr: 9.31e-03, grad_scale: 16.0 2024-09-14 16:47:26,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=155654.83333333334, ans=0.125 2024-09-14 16:47:30,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=155654.83333333334, ans=0.1 2024-09-14 16:47:47,281 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.740e+02 2.153e+02 2.340e+02 2.738e+02 4.041e+02, threshold=4.679e+02, percent-clipped=0.0 2024-09-14 16:48:20,333 INFO [train.py:1198] (0/2) Epoch 9, batch 3850, loss[loss=0.2808, ctc_loss=0.1979, cr_loss=0.4143, over 20372.00 frames. ], tot_loss[loss=0.2712, ctc_loss=0.1914, cr_loss=0.3994, over 4099565.78 frames. 
], batch size: 74, lr: 9.31e-03, grad_scale: 16.0 2024-09-14 16:49:22,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=155881.5, ans=0.125 2024-09-14 16:49:39,406 INFO [train.py:1198] (0/2) Epoch 9, batch 3900, loss[loss=0.2461, ctc_loss=0.1721, cr_loss=0.3697, over 20955.00 frames. ], tot_loss[loss=0.2708, ctc_loss=0.191, cr_loss=0.3993, over 4111808.31 frames. ], batch size: 49, lr: 9.31e-03, grad_scale: 16.0 2024-09-14 16:50:17,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=155966.5, ans=0.2 2024-09-14 16:50:21,563 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.086e+02 2.271e+02 2.677e+02 3.517e+02, threshold=4.542e+02, percent-clipped=0.0 2024-09-14 16:50:41,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=156023.16666666666, ans=0.125 2024-09-14 16:50:50,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=156023.16666666666, ans=0.0 2024-09-14 16:50:54,607 INFO [train.py:1198] (0/2) Epoch 9, batch 3950, loss[loss=0.2546, ctc_loss=0.1804, cr_loss=0.3708, over 21052.00 frames. ], tot_loss[loss=0.2714, ctc_loss=0.1914, cr_loss=0.3997, over 4106122.40 frames. 
], batch size: 56, lr: 9.30e-03, grad_scale: 16.0 2024-09-14 16:50:56,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156051.5, ans=0.1 2024-09-14 16:51:03,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=156051.5, ans=0.0 2024-09-14 16:51:37,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=156136.5, ans=0.125 2024-09-14 16:51:38,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0 2024-09-14 16:51:51,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.33 vs. limit=15.0 2024-09-14 16:51:52,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=156164.83333333334, ans=0.2 2024-09-14 16:52:09,075 INFO [train.py:1198] (0/2) Epoch 9, batch 4000, loss[loss=0.2348, ctc_loss=0.1645, cr_loss=0.3516, over 20950.00 frames. ], tot_loss[loss=0.2724, ctc_loss=0.1922, cr_loss=0.401, over 4107403.72 frames. 
], batch size: 52, lr: 9.30e-03, grad_scale: 32.0 2024-09-14 16:52:11,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=156193.16666666666, ans=0.125 2024-09-14 16:52:26,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=156221.5, ans=0.2 2024-09-14 16:52:32,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=156221.5, ans=0.125 2024-09-14 16:52:51,643 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.125e+02 2.308e+02 2.600e+02 3.947e+02, threshold=4.615e+02, percent-clipped=0.0 2024-09-14 16:52:52,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=156249.83333333334, ans=0.0 2024-09-14 16:53:04,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=156278.16666666666, ans=0.5 2024-09-14 16:53:28,038 INFO [train.py:1198] (0/2) Epoch 9, batch 4050, loss[loss=0.2983, ctc_loss=0.2116, cr_loss=0.4336, over 20671.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1931, cr_loss=0.4024, over 4094916.15 frames. ], batch size: 71, lr: 9.29e-03, grad_scale: 32.0 2024-09-14 16:53:28,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=156334.83333333334, ans=0.125 2024-09-14 16:53:36,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.44 vs. 
limit=15.0 2024-09-14 16:54:04,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=156391.5, ans=0.0 2024-09-14 16:54:15,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=156419.83333333334, ans=0.0 2024-09-14 16:54:27,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=156448.16666666666, ans=0.0 2024-09-14 16:54:34,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=22.5 2024-09-14 16:54:38,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0 2024-09-14 16:54:40,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0 2024-09-14 16:54:44,155 INFO [train.py:1198] (0/2) Epoch 9, batch 4100, loss[loss=0.281, ctc_loss=0.1979, cr_loss=0.4156, over 21015.00 frames. ], tot_loss[loss=0.2741, ctc_loss=0.1938, cr_loss=0.4015, over 4073655.89 frames. 
], batch size: 61, lr: 9.29e-03, grad_scale: 32.0 2024-09-14 16:55:07,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156504.83333333334, ans=0.1 2024-09-14 16:55:13,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=156504.83333333334, ans=0.1 2024-09-14 16:55:29,724 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.077e+02 2.209e+02 2.480e+02 3.413e+02, threshold=4.419e+02, percent-clipped=0.0 2024-09-14 16:55:56,387 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0 2024-09-14 16:56:03,183 INFO [train.py:1198] (0/2) Epoch 9, batch 4150, loss[loss=0.2831, ctc_loss=0.2054, cr_loss=0.3885, over 21036.00 frames. ], tot_loss[loss=0.2733, ctc_loss=0.1931, cr_loss=0.4011, over 4087962.21 frames. ], batch size: 62, lr: 9.29e-03, grad_scale: 32.0 2024-09-14 16:56:14,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=156618.16666666666, ans=0.125 2024-09-14 16:56:52,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=156703.16666666666, ans=0.035 2024-09-14 16:57:07,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=156731.5, ans=0.0 2024-09-14 16:57:16,715 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.12 vs. limit=10.0 2024-09-14 16:57:18,880 INFO [train.py:1198] (0/2) Epoch 9, batch 4200, loss[loss=0.2927, ctc_loss=0.2062, cr_loss=0.4323, over 21063.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.194, cr_loss=0.4024, over 4091361.75 frames. 
], batch size: 59, lr: 9.28e-03, grad_scale: 32.0 2024-09-14 16:57:55,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156816.5, ans=0.1 2024-09-14 16:58:01,196 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.751e+02 2.160e+02 2.352e+02 2.615e+02 4.061e+02, threshold=4.705e+02, percent-clipped=0.0 2024-09-14 16:58:33,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=156901.5, ans=0.125 2024-09-14 16:58:34,392 INFO [train.py:1198] (0/2) Epoch 9, batch 4250, loss[loss=0.2515, ctc_loss=0.1775, cr_loss=0.3699, over 20880.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1936, cr_loss=0.4017, over 4087625.67 frames. ], batch size: 57, lr: 9.28e-03, grad_scale: 32.0 2024-09-14 16:59:16,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=156958.16666666666, ans=0.2 2024-09-14 16:59:47,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=157014.83333333334, ans=0.025 2024-09-14 16:59:52,932 INFO [train.py:1198] (0/2) Epoch 9, batch 4300, loss[loss=0.2316, ctc_loss=0.1598, cr_loss=0.3588, over 20960.00 frames. ], tot_loss[loss=0.2746, ctc_loss=0.1942, cr_loss=0.402, over 4083206.47 frames. ], batch size: 49, lr: 9.27e-03, grad_scale: 32.0 2024-09-14 16:59:53,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=157043.16666666666, ans=0.125 2024-09-14 16:59:55,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.37 vs. 
limit=22.5 2024-09-14 17:00:05,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2024-09-14 17:00:11,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=157071.5, ans=0.04949747468305833 2024-09-14 17:00:14,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2024-09-14 17:00:35,551 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.080e+02 2.212e+02 2.497e+02 3.875e+02, threshold=4.425e+02, percent-clipped=0.0 2024-09-14 17:01:00,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-09-14 17:01:12,039 INFO [train.py:1198] (0/2) Epoch 9, batch 4350, loss[loss=0.3092, ctc_loss=0.2245, cr_loss=0.4237, over 19522.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.193, cr_loss=0.4002, over 4085381.68 frames. ], batch size: 90, lr: 9.27e-03, grad_scale: 32.0 2024-09-14 17:01:42,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=157241.5, ans=0.0 2024-09-14 17:02:10,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-09-14 17:02:11,826 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:02:25,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. 
limit=15.0 2024-09-14 17:02:28,190 INFO [train.py:1198] (0/2) Epoch 9, batch 4400, loss[loss=0.3011, ctc_loss=0.2133, cr_loss=0.4394, over 20110.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1933, cr_loss=0.4017, over 4095005.24 frames. ], batch size: 80, lr: 9.27e-03, grad_scale: 32.0 2024-09-14 17:02:31,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=157326.5, ans=0.0 2024-09-14 17:02:32,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-09-14 17:02:37,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=157326.5, ans=0.125 2024-09-14 17:02:37,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=157326.5, ans=0.04949747468305833 2024-09-14 17:02:52,718 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2024-09-14 17:03:10,134 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.662e+02 2.086e+02 2.306e+02 2.587e+02 4.199e+02, threshold=4.612e+02, percent-clipped=0.0 2024-09-14 17:03:42,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=157468.16666666666, ans=0.0 2024-09-14 17:03:43,263 INFO [train.py:1198] (0/2) Epoch 9, batch 4450, loss[loss=0.2327, ctc_loss=0.1569, cr_loss=0.3792, over 20983.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1934, cr_loss=0.4015, over 4072956.29 frames. 
], batch size: 52, lr: 9.26e-03, grad_scale: 32.0 2024-09-14 17:04:34,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=157553.16666666666, ans=0.125 2024-09-14 17:05:00,928 INFO [train.py:1198] (0/2) Epoch 9, batch 4500, loss[loss=0.2171, ctc_loss=0.1524, cr_loss=0.3233, over 20994.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1928, cr_loss=0.4003, over 4066244.96 frames. ], batch size: 52, lr: 9.26e-03, grad_scale: 32.0 2024-09-14 17:05:31,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=157666.5, ans=0.125 2024-09-14 17:05:36,190 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:05:40,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=157666.5, ans=0.125 2024-09-14 17:05:43,276 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.115e+02 2.292e+02 2.578e+02 7.609e+02, threshold=4.584e+02, percent-clipped=1.0 2024-09-14 17:05:48,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=157694.83333333334, ans=0.0 2024-09-14 17:06:12,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=157723.16666666666, ans=0.125 2024-09-14 17:06:16,644 INFO [train.py:1198] (0/2) Epoch 9, batch 4550, loss[loss=0.2839, ctc_loss=0.201, cr_loss=0.4141, over 20665.00 frames. ], tot_loss[loss=0.2714, ctc_loss=0.1916, cr_loss=0.3987, over 4077179.66 frames. 
], batch size: 71, lr: 9.25e-03, grad_scale: 32.0 2024-09-14 17:06:22,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=157751.5, ans=0.1 2024-09-14 17:06:24,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=157751.5, ans=0.125 2024-09-14 17:07:15,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=157836.5, ans=0.0 2024-09-14 17:07:34,354 INFO [train.py:1198] (0/2) Epoch 9, batch 4600, loss[loss=0.2724, ctc_loss=0.1924, cr_loss=0.3999, over 20233.00 frames. ], tot_loss[loss=0.2721, ctc_loss=0.1921, cr_loss=0.3997, over 4086950.34 frames. ], batch size: 74, lr: 9.25e-03, grad_scale: 32.0 2024-09-14 17:07:53,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2024-09-14 17:08:16,503 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.028e+02 2.220e+02 2.385e+02 3.296e+02, threshold=4.439e+02, percent-clipped=0.0 2024-09-14 17:08:23,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=157978.16666666666, ans=0.0 2024-09-14 17:08:33,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=158006.5, ans=0.05 2024-09-14 17:08:49,504 INFO [train.py:1198] (0/2) Epoch 9, batch 4650, loss[loss=0.2639, ctc_loss=0.1882, cr_loss=0.3784, over 20922.00 frames. ], tot_loss[loss=0.2723, ctc_loss=0.1923, cr_loss=0.3999, over 4082490.35 frames. 
], batch size: 60, lr: 9.24e-03, grad_scale: 32.0 2024-09-14 17:08:58,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=158034.83333333334, ans=0.2 2024-09-14 17:09:00,881 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=12.0 2024-09-14 17:09:05,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=158063.16666666666, ans=0.125 2024-09-14 17:09:14,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=158063.16666666666, ans=0.0 2024-09-14 17:09:40,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=158119.83333333334, ans=0.0 2024-09-14 17:10:08,666 INFO [train.py:1198] (0/2) Epoch 9, batch 4700, loss[loss=0.2283, ctc_loss=0.1582, cr_loss=0.3504, over 21054.00 frames. ], tot_loss[loss=0.2725, ctc_loss=0.1924, cr_loss=0.4001, over 4080102.44 frames. ], batch size: 53, lr: 9.24e-03, grad_scale: 32.0 2024-09-14 17:10:33,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=158204.83333333334, ans=0.2 2024-09-14 17:10:50,626 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.080e+02 2.241e+02 2.591e+02 3.565e+02, threshold=4.483e+02, percent-clipped=0.0 2024-09-14 17:10:58,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=158261.5, ans=0.2 2024-09-14 17:11:16,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=158289.83333333334, ans=12.0 2024-09-14 17:11:23,596 INFO [train.py:1198] (0/2) Epoch 9, batch 4750, loss[loss=0.2359, ctc_loss=0.1658, cr_loss=0.3503, over 20877.00 frames. 
], tot_loss[loss=0.2724, ctc_loss=0.1924, cr_loss=0.4002, over 4087133.46 frames. ], batch size: 54, lr: 9.24e-03, grad_scale: 16.0 2024-09-14 17:12:17,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=158403.16666666666, ans=0.0 2024-09-14 17:12:19,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=158403.16666666666, ans=0.125 2024-09-14 17:12:25,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=158431.5, ans=0.0 2024-09-14 17:12:41,530 INFO [train.py:1198] (0/2) Epoch 9, batch 4800, loss[loss=0.2686, ctc_loss=0.1901, cr_loss=0.3926, over 21018.00 frames. ], tot_loss[loss=0.2725, ctc_loss=0.1926, cr_loss=0.3995, over 4076565.73 frames. ], batch size: 63, lr: 9.23e-03, grad_scale: 32.0 2024-09-14 17:12:41,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=158459.83333333334, ans=0.035 2024-09-14 17:13:25,545 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.078e+02 2.292e+02 2.536e+02 3.600e+02, threshold=4.584e+02, percent-clipped=0.0 2024-09-14 17:13:46,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=158573.16666666666, ans=0.1 2024-09-14 17:13:57,248 INFO [train.py:1198] (0/2) Epoch 9, batch 4850, loss[loss=0.2866, ctc_loss=0.206, cr_loss=0.4032, over 20978.00 frames. ], tot_loss[loss=0.2718, ctc_loss=0.192, cr_loss=0.3991, over 4082543.46 frames. 
], batch size: 64, lr: 9.23e-03, grad_scale: 32.0 2024-09-14 17:14:20,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=158629.83333333334, ans=0.0 2024-09-14 17:14:30,398 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-56000.pt 2024-09-14 17:14:58,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=158714.83333333334, ans=0.0 2024-09-14 17:15:07,261 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:15:12,766 INFO [train.py:1198] (0/2) Epoch 9, batch 4900, loss[loss=0.2762, ctc_loss=0.1963, cr_loss=0.3993, over 20093.00 frames. ], tot_loss[loss=0.272, ctc_loss=0.1921, cr_loss=0.3996, over 4080593.51 frames. ], batch size: 80, lr: 9.22e-03, grad_scale: 32.0 2024-09-14 17:15:17,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=158743.16666666666, ans=10.0 2024-09-14 17:15:23,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=158743.16666666666, ans=0.0 2024-09-14 17:15:44,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=158799.83333333334, ans=0.125 2024-09-14 17:15:55,441 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.129e+02 2.311e+02 2.603e+02 4.263e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-14 17:15:57,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=158828.16666666666, ans=0.125 2024-09-14 17:16:29,624 INFO [train.py:1198] (0/2) Epoch 9, batch 4950, loss[loss=0.294, ctc_loss=0.204, cr_loss=0.4503, over 20859.00 
frames. ], tot_loss[loss=0.2721, ctc_loss=0.1921, cr_loss=0.3998, over 4093830.79 frames. ], batch size: 57, lr: 9.22e-03, grad_scale: 32.0 2024-09-14 17:16:47,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=158913.16666666666, ans=0.0 2024-09-14 17:17:05,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=158941.5, ans=0.0 2024-09-14 17:17:10,563 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.58 vs. limit=15.0 2024-09-14 17:17:23,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=158969.83333333334, ans=0.0 2024-09-14 17:17:44,327 INFO [train.py:1198] (0/2) Epoch 9, batch 5000, loss[loss=0.2854, ctc_loss=0.2011, cr_loss=0.4217, over 20969.00 frames. ], tot_loss[loss=0.2724, ctc_loss=0.1923, cr_loss=0.4004, over 4091614.55 frames. ], batch size: 58, lr: 9.22e-03, grad_scale: 16.0 2024-09-14 17:17:50,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=159026.5, ans=0.125 2024-09-14 17:18:27,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=159111.5, ans=0.125 2024-09-14 17:18:28,845 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.125e+02 2.345e+02 2.595e+02 9.253e+02, threshold=4.690e+02, percent-clipped=1.0 2024-09-14 17:18:39,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=159111.5, ans=0.125 2024-09-14 17:18:58,618 INFO [train.py:1198] (0/2) Epoch 9, batch 5050, loss[loss=0.3161, ctc_loss=0.2215, cr_loss=0.4731, over 20673.00 frames. 
], tot_loss[loss=0.2737, ctc_loss=0.1932, cr_loss=0.4026, over 4101233.99 frames. ], batch size: 66, lr: 9.21e-03, grad_scale: 16.0 2024-09-14 17:19:20,550 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=22.5 2024-09-14 17:20:15,727 INFO [train.py:1198] (0/2) Epoch 9, batch 5100, loss[loss=0.3133, ctc_loss=0.2177, cr_loss=0.4782, over 20679.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1933, cr_loss=0.403, over 4103184.63 frames. ], batch size: 71, lr: 9.21e-03, grad_scale: 16.0 2024-09-14 17:20:39,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=159338.16666666666, ans=0.1 2024-09-14 17:20:45,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=159366.5, ans=0.0 2024-09-14 17:20:53,682 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2024-09-14 17:20:57,872 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:20:57,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=159366.5, ans=0.1 2024-09-14 17:21:00,475 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.099e+02 2.285e+02 2.574e+02 3.012e+02, threshold=4.570e+02, percent-clipped=0.0 2024-09-14 17:21:29,873 INFO [train.py:1198] (0/2) Epoch 9, batch 5150, loss[loss=0.3013, ctc_loss=0.2154, cr_loss=0.4295, over 20246.00 frames. ], tot_loss[loss=0.2736, ctc_loss=0.1932, cr_loss=0.4019, over 4097194.22 frames. 
], batch size: 80, lr: 9.20e-03, grad_scale: 16.0 2024-09-14 17:21:37,522 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:21:40,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=159451.5, ans=0.0 2024-09-14 17:21:52,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=159479.83333333334, ans=0.2 2024-09-14 17:22:34,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=159564.83333333334, ans=0.0 2024-09-14 17:22:43,743 INFO [train.py:1198] (0/2) Epoch 9, batch 5200, loss[loss=0.2739, ctc_loss=0.1946, cr_loss=0.396, over 21052.00 frames. ], tot_loss[loss=0.2732, ctc_loss=0.193, cr_loss=0.4012, over 4088672.93 frames. ], batch size: 62, lr: 9.20e-03, grad_scale: 32.0 2024-09-14 17:22:44,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=159593.16666666666, ans=0.2 2024-09-14 17:22:48,785 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. 
limit=6.0 2024-09-14 17:22:54,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=159593.16666666666, ans=0.125 2024-09-14 17:22:55,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=159593.16666666666, ans=0.125 2024-09-14 17:23:27,823 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.078e+02 2.268e+02 2.552e+02 4.758e+02, threshold=4.536e+02, percent-clipped=1.0 2024-09-14 17:23:35,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=159678.16666666666, ans=0.1 2024-09-14 17:23:41,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=159706.5, ans=0.125 2024-09-14 17:23:50,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=159706.5, ans=0.125 2024-09-14 17:23:53,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=159706.5, ans=0.1 2024-09-14 17:23:57,268 INFO [train.py:1198] (0/2) Epoch 9, batch 5250, loss[loss=0.2885, ctc_loss=0.2045, cr_loss=0.4198, over 19362.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1931, cr_loss=0.4022, over 4096759.89 frames. 
], batch size: 90, lr: 9.20e-03, grad_scale: 32.0 2024-09-14 17:24:28,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=159791.5, ans=0.125 2024-09-14 17:24:30,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=159791.5, ans=0.07 2024-09-14 17:24:38,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=159791.5, ans=0.2 2024-09-14 17:25:03,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.64 vs. limit=15.0 2024-09-14 17:25:11,227 INFO [train.py:1198] (0/2) Epoch 9, batch 5300, loss[loss=0.2598, ctc_loss=0.1795, cr_loss=0.4016, over 20774.00 frames. ], tot_loss[loss=0.2724, ctc_loss=0.1921, cr_loss=0.4012, over 4106967.96 frames. ], batch size: 56, lr: 9.19e-03, grad_scale: 32.0 2024-09-14 17:25:16,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=159876.5, ans=0.125 2024-09-14 17:25:55,490 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.100e+02 2.340e+02 2.644e+02 4.163e+02, threshold=4.681e+02, percent-clipped=0.0 2024-09-14 17:26:15,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=159989.83333333334, ans=0.125 2024-09-14 17:26:27,136 INFO [train.py:1198] (0/2) Epoch 9, batch 5350, loss[loss=0.2863, ctc_loss=0.2019, cr_loss=0.4224, over 21013.00 frames. ], tot_loss[loss=0.272, ctc_loss=0.1918, cr_loss=0.4009, over 4110392.66 frames. ], batch size: 61, lr: 9.19e-03, grad_scale: 32.0 2024-09-14 17:27:42,038 INFO [train.py:1198] (0/2) Epoch 9, batch 5400, loss[loss=0.2615, ctc_loss=0.1841, cr_loss=0.3875, over 20681.00 frames. ], tot_loss[loss=0.2722, ctc_loss=0.192, cr_loss=0.4012, over 4101496.00 frames. 
], batch size: 71, lr: 9.18e-03, grad_scale: 32.0 2024-09-14 17:28:20,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=160216.5, ans=0.125 2024-09-14 17:28:26,567 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.050e+02 2.236e+02 2.556e+02 3.213e+02, threshold=4.472e+02, percent-clipped=0.0 2024-09-14 17:28:28,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=160244.83333333334, ans=0.04949747468305833 2024-09-14 17:28:37,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=160244.83333333334, ans=0.125 2024-09-14 17:28:40,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160273.16666666666, ans=0.1 2024-09-14 17:28:49,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=160273.16666666666, ans=0.0 2024-09-14 17:28:56,246 INFO [train.py:1198] (0/2) Epoch 9, batch 5450, loss[loss=0.2872, ctc_loss=0.2043, cr_loss=0.4146, over 20822.00 frames. ], tot_loss[loss=0.2718, ctc_loss=0.1917, cr_loss=0.4006, over 4097245.71 frames. ], batch size: 59, lr: 9.18e-03, grad_scale: 32.0 2024-09-14 17:29:05,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=6.0 2024-09-14 17:29:13,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=160329.83333333334, ans=0.0 2024-09-14 17:30:12,597 INFO [train.py:1198] (0/2) Epoch 9, batch 5500, loss[loss=0.2888, ctc_loss=0.2066, cr_loss=0.4108, over 21014.00 frames. ], tot_loss[loss=0.2715, ctc_loss=0.1915, cr_loss=0.3999, over 4087365.61 frames. 
], batch size: 61, lr: 9.18e-03, grad_scale: 32.0 2024-09-14 17:30:12,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=160443.16666666666, ans=0.0 2024-09-14 17:30:29,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=160471.5, ans=0.2 2024-09-14 17:30:38,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=160471.5, ans=0.0 2024-09-14 17:30:44,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=160499.83333333334, ans=0.025 2024-09-14 17:30:57,763 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.094e+02 2.232e+02 2.492e+02 3.807e+02, threshold=4.465e+02, percent-clipped=0.0 2024-09-14 17:31:08,596 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:31:18,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=160556.5, ans=0.125 2024-09-14 17:31:27,368 INFO [train.py:1198] (0/2) Epoch 9, batch 5550, loss[loss=0.2596, ctc_loss=0.1832, cr_loss=0.3818, over 21055.00 frames. ], tot_loss[loss=0.2701, ctc_loss=0.1904, cr_loss=0.3982, over 4094290.01 frames. 
], batch size: 56, lr: 9.17e-03, grad_scale: 32.0 2024-09-14 17:31:43,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=160613.16666666666, ans=0.125 2024-09-14 17:31:46,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=160613.16666666666, ans=0.125 2024-09-14 17:32:36,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=160698.16666666666, ans=0.0 2024-09-14 17:32:41,018 INFO [train.py:1198] (0/2) Epoch 9, batch 5600, loss[loss=0.2531, ctc_loss=0.1774, cr_loss=0.3785, over 20971.00 frames. ], tot_loss[loss=0.2717, ctc_loss=0.1917, cr_loss=0.3998, over 4099292.50 frames. ], batch size: 48, lr: 9.17e-03, grad_scale: 32.0 2024-09-14 17:32:44,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=160726.5, ans=0.125 2024-09-14 17:33:01,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.91 vs. limit=15.0 2024-09-14 17:33:18,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=160783.16666666666, ans=0.125 2024-09-14 17:33:25,332 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.093e+02 2.245e+02 2.529e+02 3.867e+02, threshold=4.491e+02, percent-clipped=0.0 2024-09-14 17:33:26,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.11 vs. 
limit=12.0 2024-09-14 17:33:46,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=160839.83333333334, ans=0.125 2024-09-14 17:33:54,932 INFO [train.py:1198] (0/2) Epoch 9, batch 5650, loss[loss=0.3018, ctc_loss=0.2144, cr_loss=0.4373, over 20981.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1935, cr_loss=0.402, over 4085421.09 frames. ], batch size: 64, lr: 9.16e-03, grad_scale: 32.0 2024-09-14 17:34:16,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=160896.5, ans=0.0 2024-09-14 17:34:44,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=160953.16666666666, ans=0.125 2024-09-14 17:35:02,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=160981.5, ans=0.1 2024-09-14 17:35:03,316 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-09-14 17:35:11,470 INFO [train.py:1198] (0/2) Epoch 9, batch 5700, loss[loss=0.2522, ctc_loss=0.1737, cr_loss=0.3923, over 20954.00 frames. ], tot_loss[loss=0.2727, ctc_loss=0.1925, cr_loss=0.401, over 4091550.37 frames. ], batch size: 50, lr: 9.16e-03, grad_scale: 32.0 2024-09-14 17:35:31,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.91 vs. limit=22.5 2024-09-14 17:35:35,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.50 vs. limit=15.0 2024-09-14 17:35:38,632 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.91 vs. 
limit=15.0 2024-09-14 17:35:55,979 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.044e+02 2.225e+02 2.584e+02 3.392e+02, threshold=4.450e+02, percent-clipped=0.0 2024-09-14 17:36:25,756 INFO [train.py:1198] (0/2) Epoch 9, batch 5750, loss[loss=0.2683, ctc_loss=0.1928, cr_loss=0.3775, over 20931.00 frames. ], tot_loss[loss=0.271, ctc_loss=0.1911, cr_loss=0.3994, over 4096766.98 frames. ], batch size: 60, lr: 9.16e-03, grad_scale: 32.0 2024-09-14 17:36:38,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=161151.5, ans=0.0 2024-09-14 17:36:50,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0 2024-09-14 17:36:52,011 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2024-09-14 17:36:58,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=161208.16666666666, ans=0.0 2024-09-14 17:37:23,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161264.83333333334, ans=0.1 2024-09-14 17:37:41,854 INFO [train.py:1198] (0/2) Epoch 9, batch 5800, loss[loss=0.2851, ctc_loss=0.2003, cr_loss=0.4238, over 20849.00 frames. ], tot_loss[loss=0.273, ctc_loss=0.1927, cr_loss=0.4011, over 4085970.29 frames. ], batch size: 65, lr: 9.15e-03, grad_scale: 32.0 2024-09-14 17:38:07,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.11 vs. 
limit=22.5 2024-09-14 17:38:14,955 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-09-14 17:38:26,535 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.716e+02 2.164e+02 2.328e+02 2.619e+02 4.158e+02, threshold=4.656e+02, percent-clipped=0.0 2024-09-14 17:38:53,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161406.5, ans=0.1 2024-09-14 17:38:56,078 INFO [train.py:1198] (0/2) Epoch 9, batch 5850, loss[loss=0.2932, ctc_loss=0.2065, cr_loss=0.4336, over 20651.00 frames. ], tot_loss[loss=0.2732, ctc_loss=0.1929, cr_loss=0.4014, over 4079157.75 frames. ], batch size: 66, lr: 9.15e-03, grad_scale: 32.0 2024-09-14 17:39:10,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=161463.16666666666, ans=0.125 2024-09-14 17:39:11,617 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=22.5 2024-09-14 17:39:27,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.86 vs. limit=6.0 2024-09-14 17:39:36,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=161491.5, ans=0.04949747468305833 2024-09-14 17:39:47,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=161519.83333333334, ans=0.0 2024-09-14 17:39:51,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.50 vs. 
limit=22.5 2024-09-14 17:40:09,860 INFO [train.py:1198] (0/2) Epoch 9, batch 5900, loss[loss=0.2775, ctc_loss=0.1954, cr_loss=0.4106, over 20703.00 frames. ], tot_loss[loss=0.273, ctc_loss=0.1927, cr_loss=0.4013, over 4081404.47 frames. ], batch size: 71, lr: 9.14e-03, grad_scale: 32.0 2024-09-14 17:40:54,574 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.098e+02 2.330e+02 2.680e+02 3.765e+02, threshold=4.661e+02, percent-clipped=0.0 2024-09-14 17:41:08,351 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=22.5 2024-09-14 17:41:23,529 INFO [train.py:1198] (0/2) Epoch 9, batch 5950, loss[loss=0.2525, ctc_loss=0.1751, cr_loss=0.3869, over 21048.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.1927, cr_loss=0.4021, over 4092266.90 frames. ], batch size: 56, lr: 9.14e-03, grad_scale: 32.0 2024-09-14 17:42:14,572 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2024-09-14 17:42:33,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161831.5, ans=0.1 2024-09-14 17:42:37,648 INFO [train.py:1198] (0/2) Epoch 9, batch 6000, loss[loss=0.307, ctc_loss=0.2186, cr_loss=0.4417, over 21039.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1931, cr_loss=0.4023, over 4087383.48 frames. ], batch size: 62, lr: 9.14e-03, grad_scale: 32.0 2024-09-14 17:42:37,649 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 17:42:58,465 INFO [train.py:1230] (0/2) Epoch 9, validation: loss=0.05442, ctc_loss=0.05442, cr_loss=9.512e-15, over 944034.00 frames. 
2024-09-14 17:42:58,465 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 17:43:32,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=161916.5, ans=0.125 2024-09-14 17:43:32,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161916.5, ans=0.1 2024-09-14 17:43:42,574 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.065e+02 2.261e+02 2.727e+02 3.866e+02, threshold=4.521e+02, percent-clipped=0.0 2024-09-14 17:43:56,289 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.74 vs. limit=10.0 2024-09-14 17:44:13,201 INFO [train.py:1198] (0/2) Epoch 9, batch 6050, loss[loss=0.3096, ctc_loss=0.2214, cr_loss=0.4413, over 20669.00 frames. ], tot_loss[loss=0.2738, ctc_loss=0.1932, cr_loss=0.4032, over 4086943.61 frames. ], batch size: 68, lr: 9.13e-03, grad_scale: 32.0 2024-09-14 17:44:23,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=162001.5, ans=0.125 2024-09-14 17:44:34,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=162029.83333333334, ans=0.125 2024-09-14 17:45:28,296 INFO [train.py:1198] (0/2) Epoch 9, batch 6100, loss[loss=0.3169, ctc_loss=0.2251, cr_loss=0.4591, over 19395.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1931, cr_loss=0.4028, over 4086229.40 frames. 
], batch size: 90, lr: 9.13e-03, grad_scale: 32.0 2024-09-14 17:45:46,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162171.5, ans=0.1 2024-09-14 17:45:55,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=162171.5, ans=0.0 2024-09-14 17:46:13,214 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.735e+02 2.083e+02 2.381e+02 2.787e+02 4.151e+02, threshold=4.762e+02, percent-clipped=0.0 2024-09-14 17:46:42,817 INFO [train.py:1198] (0/2) Epoch 9, batch 6150, loss[loss=0.2605, ctc_loss=0.1776, cr_loss=0.4144, over 19919.00 frames. ], tot_loss[loss=0.2717, ctc_loss=0.1915, cr_loss=0.4009, over 4088542.33 frames. ], batch size: 44, lr: 9.12e-03, grad_scale: 32.0 2024-09-14 17:47:14,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=162341.5, ans=0.0 2024-09-14 17:47:56,905 INFO [train.py:1198] (0/2) Epoch 9, batch 6200, loss[loss=0.2768, ctc_loss=0.1962, cr_loss=0.4031, over 21035.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.1928, cr_loss=0.4013, over 4068781.28 frames. 
], batch size: 63, lr: 9.12e-03, grad_scale: 32.0 2024-09-14 17:48:07,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=162426.5, ans=0.09899494936611666 2024-09-14 17:48:28,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=162483.16666666666, ans=0.0 2024-09-14 17:48:41,740 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.049e+02 2.171e+02 2.420e+02 4.623e+02, threshold=4.342e+02, percent-clipped=0.0 2024-09-14 17:48:57,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=162539.83333333334, ans=22.5 2024-09-14 17:49:11,300 INFO [train.py:1198] (0/2) Epoch 9, batch 6250, loss[loss=0.2861, ctc_loss=0.205, cr_loss=0.4054, over 20818.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.1928, cr_loss=0.4016, over 4074262.66 frames. ], batch size: 65, lr: 9.12e-03, grad_scale: 32.0 2024-09-14 17:49:12,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=15.0 2024-09-14 17:49:40,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=162624.83333333334, ans=0.125 2024-09-14 17:49:47,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=162624.83333333334, ans=0.0 2024-09-14 17:49:51,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2024-09-14 17:49:54,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.82 vs. 
limit=15.0 2024-09-14 17:50:21,226 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.79 vs. limit=15.0 2024-09-14 17:50:26,172 INFO [train.py:1198] (0/2) Epoch 9, batch 6300, loss[loss=0.2833, ctc_loss=0.1993, cr_loss=0.4203, over 20669.00 frames. ], tot_loss[loss=0.2727, ctc_loss=0.1926, cr_loss=0.4009, over 4057381.75 frames. ], batch size: 71, lr: 9.11e-03, grad_scale: 32.0 2024-09-14 17:50:32,356 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:50:33,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=162709.83333333334, ans=0.125 2024-09-14 17:50:45,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=162738.16666666666, ans=0.0 2024-09-14 17:51:02,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. limit=6.0 2024-09-14 17:51:08,792 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.742e+02 2.115e+02 2.309e+02 2.708e+02 4.561e+02, threshold=4.619e+02, percent-clipped=1.0 2024-09-14 17:51:24,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=162823.16666666666, ans=0.0 2024-09-14 17:51:36,043 INFO [train.py:1198] (0/2) Epoch 9, batch 6350, loss[loss=0.38, ctc_loss=0.2876, cr_loss=0.4623, over 14018.00 frames. ], tot_loss[loss=0.2801, ctc_loss=0.1992, cr_loss=0.4043, over 3898170.35 frames. 
], batch size: 149, lr: 9.11e-03, grad_scale: 32.0 2024-09-14 17:52:02,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=162879.83333333334, ans=0.125 2024-09-14 17:52:32,816 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-9.pt 2024-09-14 17:53:20,908 INFO [train.py:1198] (0/2) Epoch 10, batch 0, loss[loss=0.2588, ctc_loss=0.1785, cr_loss=0.4014, over 20840.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.1785, cr_loss=0.4014, over 20840.00 frames. ], batch size: 59, lr: 8.66e-03, grad_scale: 32.0 2024-09-14 17:53:20,909 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 17:53:39,077 INFO [train.py:1230] (0/2) Epoch 10, validation: loss=0.05531, ctc_loss=0.05531, cr_loss=9.031e-15, over 944034.00 frames. 2024-09-14 17:53:39,077 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 17:53:56,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=162993.16666666666, ans=0.1 2024-09-14 17:54:07,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=162993.16666666666, ans=0.0 2024-09-14 17:54:19,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=163021.5, ans=0.125 2024-09-14 17:54:25,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=163049.83333333334, ans=0.2 2024-09-14 17:54:27,424 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 17:54:41,650 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.186e+02 2.441e+02 2.681e+02 3.994e+02, threshold=4.882e+02, percent-clipped=0.0 
2024-09-14 17:54:56,164 INFO [train.py:1198] (0/2) Epoch 10, batch 50, loss[loss=0.3441, ctc_loss=0.2569, cr_loss=0.436, over 14209.00 frames. ], tot_loss[loss=0.2707, ctc_loss=0.1911, cr_loss=0.3978, over 904296.10 frames. ], batch size: 150, lr: 8.66e-03, grad_scale: 32.0 2024-09-14 17:55:11,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=163134.83333333334, ans=0.07 2024-09-14 17:55:18,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=163134.83333333334, ans=0.125 2024-09-14 17:55:24,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=163163.16666666666, ans=0.125 2024-09-14 17:55:29,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163163.16666666666, ans=0.1 2024-09-14 17:55:44,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=163191.5, ans=0.025 2024-09-14 17:55:48,122 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=22.5 2024-09-14 17:56:05,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=163219.83333333334, ans=0.125 2024-09-14 17:56:11,372 INFO [train.py:1198] (0/2) Epoch 10, batch 100, loss[loss=0.2647, ctc_loss=0.1851, cr_loss=0.3978, over 20777.00 frames. ], tot_loss[loss=0.2697, ctc_loss=0.1902, cr_loss=0.3975, over 1614131.24 frames. 
], batch size: 56, lr: 8.65e-03, grad_scale: 32.0 2024-09-14 17:57:11,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=163333.16666666666, ans=0.0 2024-09-14 17:57:14,143 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.029e+02 2.182e+02 2.362e+02 3.664e+02, threshold=4.363e+02, percent-clipped=0.0 2024-09-14 17:57:29,374 INFO [train.py:1198] (0/2) Epoch 10, batch 150, loss[loss=0.2798, ctc_loss=0.1945, cr_loss=0.4261, over 20855.00 frames. ], tot_loss[loss=0.2704, ctc_loss=0.1906, cr_loss=0.3991, over 2162578.06 frames. ], batch size: 57, lr: 8.65e-03, grad_scale: 32.0 2024-09-14 17:57:29,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=163389.83333333334, ans=0.125 2024-09-14 17:57:39,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=163389.83333333334, ans=0.125 2024-09-14 17:57:45,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=163418.16666666666, ans=0.125 2024-09-14 17:58:05,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=163446.5, ans=0.125 2024-09-14 17:58:19,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=22.5 2024-09-14 17:58:43,343 INFO [train.py:1198] (0/2) Epoch 10, batch 200, loss[loss=0.2808, ctc_loss=0.2016, cr_loss=0.3959, over 20349.00 frames. ], tot_loss[loss=0.271, ctc_loss=0.1912, cr_loss=0.3988, over 2578851.14 frames. 
], batch size: 74, lr: 8.64e-03, grad_scale: 32.0 2024-09-14 17:58:43,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=163531.5, ans=0.2 2024-09-14 17:59:20,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=163588.16666666666, ans=0.0 2024-09-14 17:59:46,413 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.046e+02 2.222e+02 2.425e+02 6.390e+02, threshold=4.445e+02, percent-clipped=1.0 2024-09-14 18:00:01,417 INFO [train.py:1198] (0/2) Epoch 10, batch 250, loss[loss=0.2633, ctc_loss=0.1845, cr_loss=0.3938, over 21055.00 frames. ], tot_loss[loss=0.2717, ctc_loss=0.1915, cr_loss=0.4006, over 2925109.60 frames. ], batch size: 56, lr: 8.64e-03, grad_scale: 32.0 2024-09-14 18:00:10,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=163673.16666666666, ans=0.125 2024-09-14 18:00:27,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=163701.5, ans=0.2 2024-09-14 18:01:06,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=163786.5, ans=0.125 2024-09-14 18:01:06,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=163786.5, ans=0.0 2024-09-14 18:01:17,057 INFO [train.py:1198] (0/2) Epoch 10, batch 300, loss[loss=0.2443, ctc_loss=0.1681, cr_loss=0.3808, over 21001.00 frames. ], tot_loss[loss=0.2709, ctc_loss=0.1909, cr_loss=0.3999, over 3185465.53 frames. ], batch size: 52, lr: 8.64e-03, grad_scale: 32.0 2024-09-14 18:01:35,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. 
limit=15.0 2024-09-14 18:01:36,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=163843.16666666666, ans=0.125 2024-09-14 18:01:45,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=163871.5, ans=0.125 2024-09-14 18:01:59,929 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=15.0 2024-09-14 18:02:20,283 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.028e+02 2.219e+02 2.414e+02 4.165e+02, threshold=4.438e+02, percent-clipped=0.0 2024-09-14 18:02:28,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2024-09-14 18:02:35,242 INFO [train.py:1198] (0/2) Epoch 10, batch 350, loss[loss=0.2916, ctc_loss=0.2054, cr_loss=0.431, over 20713.00 frames. ], tot_loss[loss=0.271, ctc_loss=0.1909, cr_loss=0.4007, over 3398112.66 frames. ], batch size: 71, lr: 8.63e-03, grad_scale: 32.0 2024-09-14 18:03:02,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=163984.83333333334, ans=0.0 2024-09-14 18:03:04,217 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 18:03:18,192 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=12.0 2024-09-14 18:03:50,620 INFO [train.py:1198] (0/2) Epoch 10, batch 400, loss[loss=0.302, ctc_loss=0.2112, cr_loss=0.4542, over 20640.00 frames. ], tot_loss[loss=0.2709, ctc_loss=0.1908, cr_loss=0.4006, over 3554584.67 frames. 
], batch size: 68, lr: 8.63e-03, grad_scale: 32.0 2024-09-14 18:03:51,336 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-09-14 18:04:07,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0 2024-09-14 18:04:20,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=164154.83333333334, ans=0.125 2024-09-14 18:04:23,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=164154.83333333334, ans=0.125 2024-09-14 18:04:51,695 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.079e+02 2.205e+02 2.399e+02 3.065e+02, threshold=4.411e+02, percent-clipped=0.0 2024-09-14 18:04:52,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164211.5, ans=0.1 2024-09-14 18:04:56,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=164211.5, ans=0.125 2024-09-14 18:04:59,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164211.5, ans=0.1 2024-09-14 18:05:05,063 INFO [train.py:1198] (0/2) Epoch 10, batch 450, loss[loss=0.284, ctc_loss=0.1979, cr_loss=0.4304, over 20937.00 frames. ], tot_loss[loss=0.2715, ctc_loss=0.1913, cr_loss=0.4008, over 3673351.61 frames. 
], batch size: 60, lr: 8.63e-03, grad_scale: 32.0 2024-09-14 18:06:01,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=164324.83333333334, ans=0.5 2024-09-14 18:06:02,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=164324.83333333334, ans=0.125 2024-09-14 18:06:14,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=164353.16666666666, ans=0.125 2024-09-14 18:06:23,035 INFO [train.py:1198] (0/2) Epoch 10, batch 500, loss[loss=0.2703, ctc_loss=0.1897, cr_loss=0.403, over 20863.00 frames. ], tot_loss[loss=0.2711, ctc_loss=0.1911, cr_loss=0.4001, over 3769441.58 frames. ], batch size: 57, lr: 8.62e-03, grad_scale: 32.0 2024-09-14 18:06:49,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164409.83333333334, ans=0.1 2024-09-14 18:06:52,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=164438.16666666666, ans=0.125 2024-09-14 18:06:57,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=164438.16666666666, ans=0.0 2024-09-14 18:06:58,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=164438.16666666666, ans=0.0 2024-09-14 18:07:25,040 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.220e+02 2.404e+02 2.694e+02 3.649e+02, threshold=4.807e+02, percent-clipped=0.0 2024-09-14 18:07:31,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=164494.83333333334, ans=0.0 2024-09-14 18:07:38,537 INFO [train.py:1198] (0/2) Epoch 10, batch 
550, loss[loss=0.2868, ctc_loss=0.2074, cr_loss=0.397, over 20045.00 frames. ], tot_loss[loss=0.2718, ctc_loss=0.1916, cr_loss=0.4008, over 3838691.92 frames. ], batch size: 80, lr: 8.62e-03, grad_scale: 32.0 2024-09-14 18:07:46,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=164523.16666666666, ans=0.125 2024-09-14 18:08:57,215 INFO [train.py:1198] (0/2) Epoch 10, batch 600, loss[loss=0.2935, ctc_loss=0.2149, cr_loss=0.393, over 20997.00 frames. ], tot_loss[loss=0.2707, ctc_loss=0.1906, cr_loss=0.4002, over 3902018.13 frames. ], batch size: 67, lr: 8.62e-03, grad_scale: 32.0 2024-09-14 18:08:59,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.34 vs. limit=10.0 2024-09-14 18:09:40,153 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. limit=6.0 2024-09-14 18:09:51,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=164749.83333333334, ans=0.0 2024-09-14 18:10:00,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.151e+02 2.375e+02 2.704e+02 4.252e+02, threshold=4.750e+02, percent-clipped=0.0 2024-09-14 18:10:12,456 INFO [train.py:1198] (0/2) Epoch 10, batch 650, loss[loss=0.2214, ctc_loss=0.1534, cr_loss=0.3401, over 20948.00 frames. ], tot_loss[loss=0.2693, ctc_loss=0.1897, cr_loss=0.3982, over 3947485.11 frames. 
], batch size: 48, lr: 8.61e-03, grad_scale: 16.0 2024-09-14 18:10:23,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=164806.5, ans=0.09899494936611666 2024-09-14 18:10:43,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=12.0 2024-09-14 18:11:30,424 INFO [train.py:1198] (0/2) Epoch 10, batch 700, loss[loss=0.2722, ctc_loss=0.1881, cr_loss=0.4205, over 21036.00 frames. ], tot_loss[loss=0.2705, ctc_loss=0.1905, cr_loss=0.4001, over 3980452.77 frames. ], batch size: 62, lr: 8.61e-03, grad_scale: 16.0 2024-09-14 18:11:40,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=164948.16666666666, ans=0.0 2024-09-14 18:12:03,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=165004.83333333334, ans=0.2 2024-09-14 18:12:18,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=165033.16666666666, ans=0.125 2024-09-14 18:12:25,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=165033.16666666666, ans=0.125 2024-09-14 18:12:32,589 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.095e+02 2.296e+02 2.518e+02 4.159e+02, threshold=4.592e+02, percent-clipped=0.0 2024-09-14 18:12:34,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.11 vs. 
limit=12.0 2024-09-14 18:12:41,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=165061.5, ans=0.125 2024-09-14 18:12:44,814 INFO [train.py:1198] (0/2) Epoch 10, batch 750, loss[loss=0.3697, ctc_loss=0.2771, cr_loss=0.4629, over 14262.00 frames. ], tot_loss[loss=0.2712, ctc_loss=0.191, cr_loss=0.4011, over 4007936.58 frames. ], batch size: 149, lr: 8.60e-03, grad_scale: 16.0 2024-09-14 18:14:02,316 INFO [train.py:1198] (0/2) Epoch 10, batch 800, loss[loss=0.2633, ctc_loss=0.1844, cr_loss=0.3943, over 20978.00 frames. ], tot_loss[loss=0.2725, ctc_loss=0.192, cr_loss=0.4027, over 4027227.82 frames. ], batch size: 55, lr: 8.60e-03, grad_scale: 32.0 2024-09-14 18:14:58,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=165316.5, ans=10.0 2024-09-14 18:15:06,902 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.048e+02 2.244e+02 2.473e+02 4.643e+02, threshold=4.488e+02, percent-clipped=1.0 2024-09-14 18:15:07,670 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=15.0 2024-09-14 18:15:11,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=165344.83333333334, ans=0.1 2024-09-14 18:15:17,309 INFO [train.py:1198] (0/2) Epoch 10, batch 850, loss[loss=0.2974, ctc_loss=0.2119, cr_loss=0.4271, over 20655.00 frames. ], tot_loss[loss=0.2722, ctc_loss=0.1918, cr_loss=0.4019, over 4025782.62 frames. 
], batch size: 68, lr: 8.60e-03, grad_scale: 16.0 2024-09-14 18:15:17,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=165373.16666666666, ans=0.125 2024-09-14 18:16:21,027 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.28 vs. limit=10.0 2024-09-14 18:16:22,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.25 vs. limit=22.5 2024-09-14 18:16:26,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=165486.5, ans=0.2 2024-09-14 18:16:31,459 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2024-09-14 18:16:35,289 INFO [train.py:1198] (0/2) Epoch 10, batch 900, loss[loss=0.2361, ctc_loss=0.1615, cr_loss=0.3729, over 20922.00 frames. ], tot_loss[loss=0.2726, ctc_loss=0.1921, cr_loss=0.4023, over 4019554.50 frames. 
], batch size: 54, lr: 8.59e-03, grad_scale: 16.0 2024-09-14 18:17:11,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=165571.5, ans=0.125 2024-09-14 18:17:13,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=165571.5, ans=0.125 2024-09-14 18:17:19,305 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 18:17:39,687 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.749e+02 2.005e+02 2.189e+02 2.470e+02 3.871e+02, threshold=4.378e+02, percent-clipped=0.0 2024-09-14 18:17:44,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=165628.16666666666, ans=0.0 2024-09-14 18:17:50,213 INFO [train.py:1198] (0/2) Epoch 10, batch 950, loss[loss=0.2539, ctc_loss=0.1775, cr_loss=0.382, over 21000.00 frames. ], tot_loss[loss=0.2707, ctc_loss=0.1906, cr_loss=0.4004, over 4040879.99 frames. ], batch size: 52, lr: 8.59e-03, grad_scale: 16.0 2024-09-14 18:18:05,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=165684.83333333334, ans=0.125 2024-09-14 18:18:24,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=165713.16666666666, ans=0.125 2024-09-14 18:19:02,113 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 18:19:07,687 INFO [train.py:1198] (0/2) Epoch 10, batch 1000, loss[loss=0.2528, ctc_loss=0.1776, cr_loss=0.3759, over 20944.00 frames. ], tot_loss[loss=0.2707, ctc_loss=0.1906, cr_loss=0.4003, over 4045104.83 frames. 
], batch size: 58, lr: 8.59e-03, grad_scale: 8.0 2024-09-14 18:19:18,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=165798.16666666666, ans=0.05 2024-09-14 18:19:29,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=165826.5, ans=0.0 2024-09-14 18:19:57,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=165883.16666666666, ans=0.125 2024-09-14 18:20:06,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=165911.5, ans=0.2 2024-09-14 18:20:13,332 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.038e+02 2.163e+02 2.393e+02 4.771e+02, threshold=4.326e+02, percent-clipped=1.0 2024-09-14 18:20:22,290 INFO [train.py:1198] (0/2) Epoch 10, batch 1050, loss[loss=0.2819, ctc_loss=0.1976, cr_loss=0.4219, over 21066.00 frames. ], tot_loss[loss=0.2698, ctc_loss=0.1898, cr_loss=0.4, over 4066786.63 frames. ], batch size: 59, lr: 8.58e-03, grad_scale: 8.0 2024-09-14 18:20:57,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=165996.5, ans=0.125 2024-09-14 18:20:58,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=165996.5, ans=0.2 2024-09-14 18:21:04,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=165996.5, ans=0.025 2024-09-14 18:21:14,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=166024.83333333334, ans=0.0 2024-09-14 18:21:37,089 INFO [train.py:1198] (0/2) Epoch 10, batch 1100, loss[loss=0.3049, ctc_loss=0.2158, cr_loss=0.4451, over 18211.00 frames. 
], tot_loss[loss=0.2705, ctc_loss=0.1904, cr_loss=0.4006, over 4066698.56 frames. ], batch size: 108, lr: 8.58e-03, grad_scale: 8.0 2024-09-14 18:21:53,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=166109.83333333334, ans=0.1 2024-09-14 18:22:04,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=166109.83333333334, ans=15.0 2024-09-14 18:22:33,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.70 vs. limit=15.0 2024-09-14 18:22:45,905 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.043e+02 2.161e+02 2.398e+02 3.362e+02, threshold=4.323e+02, percent-clipped=0.0 2024-09-14 18:22:52,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=166194.83333333334, ans=0.025 2024-09-14 18:22:54,727 INFO [train.py:1198] (0/2) Epoch 10, batch 1150, loss[loss=0.2379, ctc_loss=0.1673, cr_loss=0.3528, over 21076.00 frames. ], tot_loss[loss=0.2687, ctc_loss=0.1889, cr_loss=0.399, over 4084498.23 frames. ], batch size: 59, lr: 8.58e-03, grad_scale: 8.0 2024-09-14 18:23:02,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.86 vs. 
limit=15.0 2024-09-14 18:23:17,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=166251.5, ans=0.1 2024-09-14 18:23:41,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=166308.16666666666, ans=0.125 2024-09-14 18:23:46,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=166308.16666666666, ans=0.125 2024-09-14 18:23:57,319 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.23 vs. limit=15.0 2024-09-14 18:24:09,975 INFO [train.py:1198] (0/2) Epoch 10, batch 1200, loss[loss=0.2839, ctc_loss=0.197, cr_loss=0.4344, over 20979.00 frames. ], tot_loss[loss=0.2696, ctc_loss=0.1897, cr_loss=0.3996, over 4086150.72 frames. ], batch size: 61, lr: 8.57e-03, grad_scale: 16.0 2024-09-14 18:24:46,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=166421.5, ans=0.015 2024-09-14 18:25:19,359 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.148e+02 2.314e+02 2.586e+02 5.166e+02, threshold=4.628e+02, percent-clipped=1.0 2024-09-14 18:25:28,471 INFO [train.py:1198] (0/2) Epoch 10, batch 1250, loss[loss=0.2447, ctc_loss=0.1726, cr_loss=0.3606, over 21024.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1889, cr_loss=0.3981, over 4096665.50 frames. ], batch size: 52, lr: 8.57e-03, grad_scale: 16.0 2024-09-14 18:25:42,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=166534.83333333334, ans=0.0 2024-09-14 18:26:03,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. 
limit=15.0 2024-09-14 18:26:06,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=166563.16666666666, ans=0.0 2024-09-14 18:26:27,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=166619.83333333334, ans=0.025 2024-09-14 18:26:43,513 INFO [train.py:1198] (0/2) Epoch 10, batch 1300, loss[loss=0.2389, ctc_loss=0.1634, cr_loss=0.3776, over 20977.00 frames. ], tot_loss[loss=0.269, ctc_loss=0.1891, cr_loss=0.3997, over 4101932.71 frames. ], batch size: 49, lr: 8.56e-03, grad_scale: 16.0 2024-09-14 18:26:48,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=166648.16666666666, ans=0.125 2024-09-14 18:26:53,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.33 vs. limit=15.0 2024-09-14 18:27:02,163 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.79 vs. 
limit=22.5 2024-09-14 18:27:24,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=166704.83333333334, ans=0.0 2024-09-14 18:27:28,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=166733.16666666666, ans=0.025 2024-09-14 18:27:49,934 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 1.999e+02 2.184e+02 2.416e+02 4.144e+02, threshold=4.367e+02, percent-clipped=0.0 2024-09-14 18:27:51,866 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 18:28:00,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=166789.83333333334, ans=0.1 2024-09-14 18:28:01,701 INFO [train.py:1198] (0/2) Epoch 10, batch 1350, loss[loss=0.359, ctc_loss=0.2688, cr_loss=0.4508, over 14405.00 frames. ], tot_loss[loss=0.2678, ctc_loss=0.1882, cr_loss=0.398, over 4098775.09 frames. ], batch size: 150, lr: 8.56e-03, grad_scale: 16.0 2024-09-14 18:29:16,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=166931.5, ans=0.125 2024-09-14 18:29:17,902 INFO [train.py:1198] (0/2) Epoch 10, batch 1400, loss[loss=0.24, ctc_loss=0.1656, cr_loss=0.3722, over 20933.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1882, cr_loss=0.3985, over 4099415.44 frames. ], batch size: 50, lr: 8.56e-03, grad_scale: 16.0 2024-09-14 18:30:23,843 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.093e+02 2.269e+02 2.508e+02 4.820e+02, threshold=4.538e+02, percent-clipped=2.0 2024-09-14 18:30:35,908 INFO [train.py:1198] (0/2) Epoch 10, batch 1450, loss[loss=0.2331, ctc_loss=0.1641, cr_loss=0.345, over 20926.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1886, cr_loss=0.3978, over 4095077.36 frames. 
], batch size: 49, lr: 8.55e-03, grad_scale: 16.0 2024-09-14 18:30:56,825 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.18 vs. limit=8.0 2024-09-14 18:31:24,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=167158.16666666666, ans=0.125 2024-09-14 18:31:34,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=167158.16666666666, ans=0.125 2024-09-14 18:31:35,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=167186.5, ans=0.125 2024-09-14 18:31:51,558 INFO [train.py:1198] (0/2) Epoch 10, batch 1500, loss[loss=0.2677, ctc_loss=0.1863, cr_loss=0.4071, over 20817.00 frames. ], tot_loss[loss=0.2665, ctc_loss=0.1872, cr_loss=0.3962, over 4093231.89 frames. ], batch size: 59, lr: 8.55e-03, grad_scale: 16.0 2024-09-14 18:32:43,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=167299.83333333334, ans=0.0 2024-09-14 18:32:57,948 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.026e+02 2.192e+02 2.462e+02 4.090e+02, threshold=4.385e+02, percent-clipped=0.0 2024-09-14 18:33:07,329 INFO [train.py:1198] (0/2) Epoch 10, batch 1550, loss[loss=0.3293, ctc_loss=0.2437, cr_loss=0.4284, over 14063.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1874, cr_loss=0.3966, over 4096745.18 frames. 
], batch size: 149, lr: 8.55e-03, grad_scale: 16.0 2024-09-14 18:33:09,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=167356.5, ans=0.0 2024-09-14 18:33:39,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=167413.16666666666, ans=0.0 2024-09-14 18:33:41,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=167413.16666666666, ans=0.125 2024-09-14 18:33:43,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=167413.16666666666, ans=0.2 2024-09-14 18:33:52,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=167413.16666666666, ans=0.125 2024-09-14 18:34:25,387 INFO [train.py:1198] (0/2) Epoch 10, batch 1600, loss[loss=0.3181, ctc_loss=0.2335, cr_loss=0.423, over 19343.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1878, cr_loss=0.3971, over 4097199.33 frames. ], batch size: 90, lr: 8.54e-03, grad_scale: 32.0 2024-09-14 18:34:48,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=167526.5, ans=0.125 2024-09-14 18:35:00,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=167554.83333333334, ans=0.0 2024-09-14 18:35:10,891 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.57 vs. 
limit=12.0 2024-09-14 18:35:31,597 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.651e+02 2.028e+02 2.194e+02 2.393e+02 3.990e+02, threshold=4.388e+02, percent-clipped=0.0 2024-09-14 18:35:33,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=167611.5, ans=0.04949747468305833 2024-09-14 18:35:40,745 INFO [train.py:1198] (0/2) Epoch 10, batch 1650, loss[loss=0.2211, ctc_loss=0.1496, cr_loss=0.3578, over 21054.00 frames. ], tot_loss[loss=0.2677, ctc_loss=0.1881, cr_loss=0.3982, over 4100543.90 frames. ], batch size: 53, lr: 8.54e-03, grad_scale: 32.0 2024-09-14 18:35:47,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=167639.83333333334, ans=0.2 2024-09-14 18:36:09,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=167696.5, ans=0.2 2024-09-14 18:36:30,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167724.83333333334, ans=0.1 2024-09-14 18:36:57,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=167781.5, ans=0.0 2024-09-14 18:36:58,638 INFO [train.py:1198] (0/2) Epoch 10, batch 1700, loss[loss=0.2603, ctc_loss=0.1821, cr_loss=0.391, over 20783.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1876, cr_loss=0.397, over 4107680.22 frames. ], batch size: 53, lr: 8.54e-03, grad_scale: 32.0 2024-09-14 18:37:04,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=167781.5, ans=0.0 2024-09-14 18:37:16,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.18 vs. 
limit=10.0 2024-09-14 18:37:16,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=167809.83333333334, ans=0.025 2024-09-14 18:37:36,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=167838.16666666666, ans=0.07 2024-09-14 18:37:40,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.34 vs. limit=22.5 2024-09-14 18:38:00,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167894.83333333334, ans=0.1 2024-09-14 18:38:04,326 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.085e+02 2.242e+02 2.497e+02 4.815e+02, threshold=4.485e+02, percent-clipped=1.0 2024-09-14 18:38:13,436 INFO [train.py:1198] (0/2) Epoch 10, batch 1750, loss[loss=0.2581, ctc_loss=0.1785, cr_loss=0.3981, over 20775.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1876, cr_loss=0.3971, over 4112090.43 frames. ], batch size: 53, lr: 8.53e-03, grad_scale: 32.0 2024-09-14 18:38:30,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167951.5, ans=0.1 2024-09-14 18:39:00,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=168008.16666666666, ans=0.1 2024-09-14 18:39:13,244 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.96 vs. limit=12.0 2024-09-14 18:39:28,462 INFO [train.py:1198] (0/2) Epoch 10, batch 1800, loss[loss=0.2219, ctc_loss=0.1506, cr_loss=0.3568, over 20958.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1879, cr_loss=0.3977, over 4101468.83 frames. 
], batch size: 50, lr: 8.53e-03, grad_scale: 32.0 2024-09-14 18:39:28,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=168064.83333333334, ans=0.125 2024-09-14 18:39:56,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=168093.16666666666, ans=0.2 2024-09-14 18:40:17,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=168149.83333333334, ans=0.125 2024-09-14 18:40:18,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=168149.83333333334, ans=0.125 2024-09-14 18:40:28,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=168149.83333333334, ans=0.1 2024-09-14 18:40:37,571 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.059e+02 2.228e+02 2.512e+02 3.783e+02, threshold=4.456e+02, percent-clipped=0.0 2024-09-14 18:40:46,474 INFO [train.py:1198] (0/2) Epoch 10, batch 1850, loss[loss=0.3057, ctc_loss=0.2164, cr_loss=0.4467, over 19981.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1876, cr_loss=0.3971, over 4109689.40 frames. ], batch size: 80, lr: 8.53e-03, grad_scale: 32.0 2024-09-14 18:41:26,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0 2024-09-14 18:41:35,787 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.89 vs. 
limit=15.0
2024-09-14 18:41:36,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=168291.5, ans=0.025
2024-09-14 18:41:53,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=168319.83333333334, ans=6.0
2024-09-14 18:42:04,724 INFO [train.py:1198] (0/2) Epoch 10, batch 1900, loss[loss=0.2561, ctc_loss=0.1775, cr_loss=0.3927, over 20942.00 frames. ], tot_loss[loss=0.2664, ctc_loss=0.187, cr_loss=0.397, over 4112716.78 frames. ], batch size: 60, lr: 8.52e-03, grad_scale: 32.0
2024-09-14 18:42:08,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=168348.16666666666, ans=0.125
2024-09-14 18:42:39,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=168404.83333333334, ans=0.125
2024-09-14 18:42:39,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=168404.83333333334, ans=0.125
2024-09-14 18:43:10,841 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.072e+02 2.257e+02 2.592e+02 3.635e+02, threshold=4.515e+02, percent-clipped=0.0
2024-09-14 18:43:11,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=168461.5, ans=0.125
2024-09-14 18:43:11,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=168461.5, ans=0.125
2024-09-14 18:43:14,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=168461.5, ans=0.125
2024-09-14 18:43:19,881 INFO [train.py:1198] (0/2) Epoch 10, batch 1950, loss[loss=0.2896, ctc_loss=0.2033, cr_loss=0.4316, over 20988.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1873, cr_loss=0.3976, over 4111428.42 frames. ], batch size: 61, lr: 8.52e-03, grad_scale: 32.0
2024-09-14 18:43:26,561 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=15.0
2024-09-14 18:43:46,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0
2024-09-14 18:43:58,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=168546.5, ans=0.125
2024-09-14 18:44:11,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=168574.83333333334, ans=0.0
2024-09-14 18:44:18,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=168574.83333333334, ans=0.2
2024-09-14 18:44:31,984 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0
2024-09-14 18:44:35,886 INFO [train.py:1198] (0/2) Epoch 10, batch 2000, loss[loss=0.28, ctc_loss=0.1988, cr_loss=0.406, over 21054.00 frames. ], tot_loss[loss=0.2678, ctc_loss=0.1881, cr_loss=0.3985, over 4104810.22 frames. ], batch size: 56, lr: 8.51e-03, grad_scale: 32.0
2024-09-14 18:44:54,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=168659.83333333334, ans=0.125
2024-09-14 18:45:06,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=168688.16666666666, ans=0.2
2024-09-14 18:45:22,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=168716.5, ans=0.2
2024-09-14 18:45:25,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=168716.5, ans=0.125
2024-09-14 18:45:32,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0
2024-09-14 18:45:45,026 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.076e+02 2.278e+02 2.528e+02 4.492e+02, threshold=4.556e+02, percent-clipped=0.0
2024-09-14 18:45:54,342 INFO [train.py:1198] (0/2) Epoch 10, batch 2050, loss[loss=0.2246, ctc_loss=0.1542, cr_loss=0.3517, over 20989.00 frames. ], tot_loss[loss=0.2677, ctc_loss=0.1881, cr_loss=0.3978, over 4106543.51 frames. ], batch size: 49, lr: 8.51e-03, grad_scale: 32.0
2024-09-14 18:46:08,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=168801.5, ans=0.125
2024-09-14 18:46:29,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=168829.83333333334, ans=0.125
2024-09-14 18:47:09,609 INFO [train.py:1198] (0/2) Epoch 10, batch 2100, loss[loss=0.2828, ctc_loss=0.2012, cr_loss=0.408, over 20756.00 frames. ], tot_loss[loss=0.2689, ctc_loss=0.189, cr_loss=0.3995, over 4102454.94 frames. ], batch size: 71, lr: 8.51e-03, grad_scale: 32.0
2024-09-14 18:47:24,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=168943.16666666666, ans=0.125
2024-09-14 18:47:47,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=168971.5, ans=0.0
2024-09-14 18:48:18,242 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.092e+02 2.264e+02 2.617e+02 3.907e+02, threshold=4.528e+02, percent-clipped=0.0
2024-09-14 18:48:22,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.56 vs. limit=15.0
2024-09-14 18:48:27,062 INFO [train.py:1198] (0/2) Epoch 10, batch 2150, loss[loss=0.2787, ctc_loss=0.1988, cr_loss=0.3995, over 20351.00 frames. ], tot_loss[loss=0.2692, ctc_loss=0.1894, cr_loss=0.3994, over 4095138.64 frames. ], batch size: 74, lr: 8.50e-03, grad_scale: 32.0
2024-09-14 18:48:29,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=169056.5, ans=0.0
2024-09-14 18:48:45,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=169084.83333333334, ans=0.1
2024-09-14 18:48:45,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=169084.83333333334, ans=0.0
2024-09-14 18:49:20,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169141.5, ans=0.1
2024-09-14 18:49:42,799 INFO [train.py:1198] (0/2) Epoch 10, batch 2200, loss[loss=0.2462, ctc_loss=0.1695, cr_loss=0.3832, over 20958.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1888, cr_loss=0.3986, over 4083431.16 frames. ], batch size: 51, lr: 8.50e-03, grad_scale: 32.0
2024-09-14 18:49:48,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0
2024-09-14 18:50:05,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=169226.5, ans=0.0
2024-09-14 18:50:14,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=169254.83333333334, ans=0.0
2024-09-14 18:50:48,874 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.073e+02 2.344e+02 2.630e+02 4.263e+02, threshold=4.688e+02, percent-clipped=0.0
2024-09-14 18:51:00,464 INFO [train.py:1198] (0/2) Epoch 10, batch 2250, loss[loss=0.2592, ctc_loss=0.1801, cr_loss=0.3957, over 20779.00 frames. ], tot_loss[loss=0.2677, ctc_loss=0.1882, cr_loss=0.3978, over 4085627.21 frames. ], batch size: 53, lr: 8.50e-03, grad_scale: 32.0
2024-09-14 18:51:15,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=169368.16666666666, ans=0.0
2024-09-14 18:51:46,649 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.79 vs. limit=22.5
2024-09-14 18:52:03,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0
2024-09-14 18:52:04,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=169453.16666666666, ans=0.0
2024-09-14 18:52:12,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=169453.16666666666, ans=0.125
2024-09-14 18:52:16,356 INFO [train.py:1198] (0/2) Epoch 10, batch 2300, loss[loss=0.2632, ctc_loss=0.1835, cr_loss=0.3987, over 20980.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1888, cr_loss=0.3983, over 4079007.14 frames. ], batch size: 64, lr: 8.49e-03, grad_scale: 32.0
2024-09-14 18:52:21,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=169481.5, ans=0.125
2024-09-14 18:52:31,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=169509.83333333334, ans=0.125
2024-09-14 18:53:25,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.645e+02 2.093e+02 2.291e+02 2.530e+02 4.547e+02, threshold=4.582e+02, percent-clipped=0.0
2024-09-14 18:53:30,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=169594.83333333334, ans=0.125
2024-09-14 18:53:34,718 INFO [train.py:1198] (0/2) Epoch 10, batch 2350, loss[loss=0.2813, ctc_loss=0.201, cr_loss=0.4011, over 20981.00 frames. ], tot_loss[loss=0.2691, ctc_loss=0.1894, cr_loss=0.3987, over 4084916.46 frames. ], batch size: 67, lr: 8.49e-03, grad_scale: 32.0
2024-09-14 18:53:38,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=169623.16666666666, ans=0.125
2024-09-14 18:53:41,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=169623.16666666666, ans=0.0
2024-09-14 18:53:45,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=169623.16666666666, ans=0.07
2024-09-14 18:53:54,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=15.0
2024-09-14 18:54:15,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0
2024-09-14 18:54:34,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=169736.5, ans=0.2
2024-09-14 18:54:49,737 INFO [train.py:1198] (0/2) Epoch 10, batch 2400, loss[loss=0.2808, ctc_loss=0.1951, cr_loss=0.4287, over 20970.00 frames. ], tot_loss[loss=0.2691, ctc_loss=0.1894, cr_loss=0.3984, over 4086704.16 frames. ], batch size: 58, lr: 8.49e-03, grad_scale: 32.0
2024-09-14 18:55:53,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=169878.16666666666, ans=0.05
2024-09-14 18:55:56,281 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.735e+02 2.059e+02 2.222e+02 2.398e+02 4.370e+02, threshold=4.443e+02, percent-clipped=0.0
2024-09-14 18:56:05,276 INFO [train.py:1198] (0/2) Epoch 10, batch 2450, loss[loss=0.2713, ctc_loss=0.1903, cr_loss=0.405, over 20882.00 frames. ], tot_loss[loss=0.2692, ctc_loss=0.1894, cr_loss=0.3991, over 4093555.25 frames. ], batch size: 54, lr: 8.48e-03, grad_scale: 32.0
2024-09-14 18:56:07,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=169906.5, ans=0.125
2024-09-14 18:56:56,291 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-60000.pt
2024-09-14 18:57:23,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=170048.16666666666, ans=0.0
2024-09-14 18:57:24,242 INFO [train.py:1198] (0/2) Epoch 10, batch 2500, loss[loss=0.2995, ctc_loss=0.2123, cr_loss=0.4358, over 20644.00 frames. ], tot_loss[loss=0.2696, ctc_loss=0.1898, cr_loss=0.3987, over 4087578.29 frames. ], batch size: 66, lr: 8.48e-03, grad_scale: 32.0
2024-09-14 18:58:09,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=170133.16666666666, ans=0.125
2024-09-14 18:58:16,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=170133.16666666666, ans=0.125
2024-09-14 18:58:21,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=170133.16666666666, ans=0.025
2024-09-14 18:58:29,996 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.099e+02 2.284e+02 2.517e+02 7.014e+02, threshold=4.569e+02, percent-clipped=1.0
2024-09-14 18:58:39,173 INFO [train.py:1198] (0/2) Epoch 10, batch 2550, loss[loss=0.2333, ctc_loss=0.1603, cr_loss=0.3654, over 20982.00 frames. ], tot_loss[loss=0.2692, ctc_loss=0.1896, cr_loss=0.3982, over 4074987.30 frames. ], batch size: 50, lr: 8.48e-03, grad_scale: 32.0
2024-09-14 18:59:38,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=170274.83333333334, ans=0.0
2024-09-14 18:59:52,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=15.0
2024-09-14 18:59:57,388 INFO [train.py:1198] (0/2) Epoch 10, batch 2600, loss[loss=0.2933, ctc_loss=0.2073, cr_loss=0.4299, over 20335.00 frames. ], tot_loss[loss=0.2701, ctc_loss=0.19, cr_loss=0.4004, over 4083597.42 frames. ], batch size: 74, lr: 8.47e-03, grad_scale: 32.0
2024-09-14 18:59:59,645 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0
2024-09-14 19:00:18,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=170359.83333333334, ans=0.2
2024-09-14 19:00:20,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=170359.83333333334, ans=0.2
2024-09-14 19:00:24,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=170359.83333333334, ans=0.125
2024-09-14 19:00:38,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170388.16666666666, ans=0.1
2024-09-14 19:00:47,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=170416.5, ans=0.125
2024-09-14 19:01:03,416 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.025e+02 2.201e+02 2.372e+02 4.038e+02, threshold=4.401e+02, percent-clipped=0.0
2024-09-14 19:01:12,373 INFO [train.py:1198] (0/2) Epoch 10, batch 2650, loss[loss=0.245, ctc_loss=0.1727, cr_loss=0.3616, over 21062.00 frames. ], tot_loss[loss=0.2694, ctc_loss=0.1894, cr_loss=0.3999, over 4094308.33 frames. ], batch size: 53, lr: 8.47e-03, grad_scale: 32.0
2024-09-14 19:01:14,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=170473.16666666666, ans=0.0
2024-09-14 19:01:24,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=170473.16666666666, ans=0.0
2024-09-14 19:01:42,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=170529.83333333334, ans=0.125
2024-09-14 19:01:45,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=170529.83333333334, ans=0.07
2024-09-14 19:02:03,127 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.38 vs. limit=22.5
2024-09-14 19:02:07,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=170558.16666666666, ans=0.1
2024-09-14 19:02:30,996 INFO [train.py:1198] (0/2) Epoch 10, batch 2700, loss[loss=0.2296, ctc_loss=0.1586, cr_loss=0.3549, over 19921.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1879, cr_loss=0.3982, over 4095549.58 frames. ], batch size: 44, lr: 8.47e-03, grad_scale: 32.0
2024-09-14 19:02:49,812 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=22.5
2024-09-14 19:03:29,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=170728.16666666666, ans=0.125
2024-09-14 19:03:32,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=170728.16666666666, ans=0.2
2024-09-14 19:03:36,838 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.708e+02 2.074e+02 2.227e+02 2.481e+02 4.375e+02, threshold=4.453e+02, percent-clipped=0.0
2024-09-14 19:03:45,659 INFO [train.py:1198] (0/2) Epoch 10, batch 2750, loss[loss=0.3061, ctc_loss=0.217, cr_loss=0.4457, over 20605.00 frames. ], tot_loss[loss=0.2684, ctc_loss=0.1886, cr_loss=0.3991, over 4087637.21 frames. ], batch size: 68, lr: 8.46e-03, grad_scale: 32.0
2024-09-14 19:03:45,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=170756.5, ans=0.125
2024-09-14 19:03:51,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=170756.5, ans=0.05
2024-09-14 19:04:19,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=170813.16666666666, ans=0.125
2024-09-14 19:05:04,423 INFO [train.py:1198] (0/2) Epoch 10, batch 2800, loss[loss=0.2426, ctc_loss=0.1666, cr_loss=0.3803, over 20790.00 frames. ], tot_loss[loss=0.2691, ctc_loss=0.1891, cr_loss=0.4002, over 4095645.75 frames. ], batch size: 53, lr: 8.46e-03, grad_scale: 32.0
2024-09-14 19:05:31,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=170926.5, ans=0.1
2024-09-14 19:05:37,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=170954.83333333334, ans=0.1
2024-09-14 19:05:52,049 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.00 vs. limit=15.0
2024-09-14 19:06:06,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=171011.5, ans=0.0
2024-09-14 19:06:10,547 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.172e+02 2.467e+02 2.780e+02 3.931e+02, threshold=4.934e+02, percent-clipped=0.0
2024-09-14 19:06:19,686 INFO [train.py:1198] (0/2) Epoch 10, batch 2850, loss[loss=0.272, ctc_loss=0.1941, cr_loss=0.3896, over 19390.00 frames. ], tot_loss[loss=0.2698, ctc_loss=0.1897, cr_loss=0.4008, over 4092085.78 frames. ], batch size: 90, lr: 8.46e-03, grad_scale: 32.0
2024-09-14 19:06:27,908 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=12.0
2024-09-14 19:06:35,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=171068.16666666666, ans=0.05
2024-09-14 19:06:59,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=171096.5, ans=0.025
2024-09-14 19:07:08,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=171124.83333333334, ans=0.1
2024-09-14 19:07:20,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=171153.16666666666, ans=0.025
2024-09-14 19:07:28,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=171153.16666666666, ans=0.125
2024-09-14 19:07:35,009 INFO [train.py:1198] (0/2) Epoch 10, batch 2900, loss[loss=0.2496, ctc_loss=0.1698, cr_loss=0.3988, over 20826.00 frames. ], tot_loss[loss=0.2691, ctc_loss=0.1891, cr_loss=0.3998, over 4092948.19 frames. ], batch size: 59, lr: 8.45e-03, grad_scale: 32.0
2024-09-14 19:08:23,837 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.72 vs. limit=15.0
2024-09-14 19:08:26,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=171266.5, ans=0.125
2024-09-14 19:08:32,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=171266.5, ans=0.125
2024-09-14 19:08:42,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=171294.83333333334, ans=0.0
2024-09-14 19:08:44,175 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.114e+02 2.226e+02 2.484e+02 4.067e+02, threshold=4.452e+02, percent-clipped=0.0
2024-09-14 19:08:46,617 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.72 vs. limit=15.0
2024-09-14 19:08:53,120 INFO [train.py:1198] (0/2) Epoch 10, batch 2950, loss[loss=0.2709, ctc_loss=0.1882, cr_loss=0.4134, over 21041.00 frames. ], tot_loss[loss=0.2693, ctc_loss=0.1894, cr_loss=0.3996, over 4083101.37 frames. ], batch size: 62, lr: 8.45e-03, grad_scale: 32.0
2024-09-14 19:09:05,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=171323.16666666666, ans=0.125
2024-09-14 19:09:29,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=171379.83333333334, ans=0.125
2024-09-14 19:09:49,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=171408.16666666666, ans=0.0
2024-09-14 19:09:53,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=171436.5, ans=0.125
2024-09-14 19:10:08,293 INFO [train.py:1198] (0/2) Epoch 10, batch 3000, loss[loss=0.2983, ctc_loss=0.2097, cr_loss=0.4428, over 20655.00 frames. ], tot_loss[loss=0.27, ctc_loss=0.1899, cr_loss=0.4003, over 4085714.88 frames. ], batch size: 66, lr: 8.45e-03, grad_scale: 64.0
2024-09-14 19:10:08,294 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-14 19:10:38,152 INFO [train.py:1230] (0/2) Epoch 10, validation: loss=0.05315, ctc_loss=0.05315, cr_loss=9.745e-15, over 944034.00 frames.
2024-09-14 19:10:38,153 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-14 19:10:47,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=171464.83333333334, ans=0.125
2024-09-14 19:11:05,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=171493.16666666666, ans=0.125
2024-09-14 19:11:15,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.25 vs. limit=22.5
2024-09-14 19:11:44,790 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.037e+02 2.200e+02 2.527e+02 3.549e+02, threshold=4.400e+02, percent-clipped=0.0
2024-09-14 19:11:48,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=171578.16666666666, ans=0.0
2024-09-14 19:11:52,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0
2024-09-14 19:11:53,675 INFO [train.py:1198] (0/2) Epoch 10, batch 3050, loss[loss=0.2339, ctc_loss=0.1627, cr_loss=0.3562, over 20973.00 frames. ], tot_loss[loss=0.2686, ctc_loss=0.1887, cr_loss=0.3992, over 4083062.09 frames. ], batch size: 52, lr: 8.44e-03, grad_scale: 64.0
2024-09-14 19:11:53,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=171606.5, ans=0.1
2024-09-14 19:12:50,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=171691.5, ans=0.0
2024-09-14 19:12:57,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=171719.83333333334, ans=0.0
2024-09-14 19:13:03,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0
2024-09-14 19:13:09,542 INFO [train.py:1198] (0/2) Epoch 10, batch 3100, loss[loss=0.283, ctc_loss=0.2049, cr_loss=0.3903, over 20667.00 frames. ], tot_loss[loss=0.2682, ctc_loss=0.1886, cr_loss=0.3981, over 4084016.50 frames. ], batch size: 68, lr: 8.44e-03, grad_scale: 32.0
2024-09-14 19:13:14,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=171748.16666666666, ans=0.2
2024-09-14 19:13:42,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=171804.83333333334, ans=0.2
2024-09-14 19:14:08,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=171833.16666666666, ans=0.125
2024-09-14 19:14:10,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=171833.16666666666, ans=0.0
2024-09-14 19:14:20,587 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.074e+02 2.221e+02 2.474e+02 4.054e+02, threshold=4.441e+02, percent-clipped=0.0
2024-09-14 19:14:28,092 INFO [train.py:1198] (0/2) Epoch 10, batch 3150, loss[loss=0.2711, ctc_loss=0.1896, cr_loss=0.4074, over 21027.00 frames. ], tot_loss[loss=0.2676, ctc_loss=0.188, cr_loss=0.3984, over 4089362.26 frames. ], batch size: 61, lr: 8.44e-03, grad_scale: 32.0
2024-09-14 19:15:36,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=172003.16666666666, ans=0.0
2024-09-14 19:15:43,838 INFO [train.py:1198] (0/2) Epoch 10, batch 3200, loss[loss=0.2971, ctc_loss=0.21, cr_loss=0.4351, over 20631.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1879, cr_loss=0.3975, over 4092281.74 frames. ], batch size: 71, lr: 8.43e-03, grad_scale: 32.0
2024-09-14 19:15:44,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=172031.5, ans=0.0
2024-09-14 19:15:47,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=172031.5, ans=0.5
2024-09-14 19:16:09,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=172059.83333333334, ans=0.125
2024-09-14 19:16:16,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=172088.16666666666, ans=0.125
2024-09-14 19:16:34,178 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=15.0
2024-09-14 19:16:35,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=172116.5, ans=0.0
2024-09-14 19:16:54,576 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.000e+02 2.171e+02 2.341e+02 6.422e+02, threshold=4.341e+02, percent-clipped=1.0
2024-09-14 19:17:00,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=172173.16666666666, ans=0.0
2024-09-14 19:17:01,976 INFO [train.py:1198] (0/2) Epoch 10, batch 3250, loss[loss=0.2859, ctc_loss=0.2041, cr_loss=0.4093, over 20999.00 frames. ], tot_loss[loss=0.2686, ctc_loss=0.1889, cr_loss=0.3983, over 4079865.13 frames. ], batch size: 61, lr: 8.43e-03, grad_scale: 32.0
2024-09-14 19:17:06,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=172173.16666666666, ans=0.1
2024-09-14 19:17:09,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=172173.16666666666, ans=0.5
2024-09-14 19:17:09,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=172173.16666666666, ans=0.0
2024-09-14 19:17:12,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=172173.16666666666, ans=0.2
2024-09-14 19:18:16,882 INFO [train.py:1198] (0/2) Epoch 10, batch 3300, loss[loss=0.2771, ctc_loss=0.1919, cr_loss=0.4263, over 20823.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1886, cr_loss=0.3995, over 4099617.57 frames. ], batch size: 59, lr: 8.42e-03, grad_scale: 32.0
2024-09-14 19:19:14,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172399.83333333334, ans=0.1
2024-09-14 19:19:17,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=15.0
2024-09-14 19:19:19,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=172428.16666666666, ans=0.0
2024-09-14 19:19:27,016 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.081e+02 2.217e+02 2.391e+02 3.363e+02, threshold=4.434e+02, percent-clipped=0.0
2024-09-14 19:19:31,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.61 vs. limit=22.5
2024-09-14 19:19:34,469 INFO [train.py:1198] (0/2) Epoch 10, batch 3350, loss[loss=0.2903, ctc_loss=0.2092, cr_loss=0.4052, over 20980.00 frames. ], tot_loss[loss=0.2711, ctc_loss=0.1906, cr_loss=0.4023, over 4087137.16 frames. ], batch size: 64, lr: 8.42e-03, grad_scale: 32.0
2024-09-14 19:19:49,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=172484.83333333334, ans=0.125
2024-09-14 19:20:03,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=172513.16666666666, ans=0.125
2024-09-14 19:20:19,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=172541.5, ans=0.125
2024-09-14 19:20:29,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=172541.5, ans=0.125
2024-09-14 19:20:47,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=172569.83333333334, ans=0.2
2024-09-14 19:20:49,623 INFO [train.py:1198] (0/2) Epoch 10, batch 3400, loss[loss=0.303, ctc_loss=0.2143, cr_loss=0.4435, over 20940.00 frames. ], tot_loss[loss=0.2705, ctc_loss=0.1901, cr_loss=0.402, over 4098412.75 frames. ], batch size: 60, lr: 8.42e-03, grad_scale: 32.0
2024-09-14 19:20:55,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=172598.16666666666, ans=0.1
2024-09-14 19:21:33,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=172683.16666666666, ans=0.125
2024-09-14 19:21:56,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172711.5, ans=0.1
2024-09-14 19:22:00,339 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.066e+02 2.255e+02 2.634e+02 4.666e+02, threshold=4.510e+02, percent-clipped=1.0
2024-09-14 19:22:07,985 INFO [train.py:1198] (0/2) Epoch 10, batch 3450, loss[loss=0.258, ctc_loss=0.1834, cr_loss=0.3728, over 20985.00 frames. ], tot_loss[loss=0.2711, ctc_loss=0.1905, cr_loss=0.4029, over 4086728.04 frames. ], batch size: 48, lr: 8.41e-03, grad_scale: 32.0
2024-09-14 19:22:10,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.33 vs. limit=22.5
2024-09-14 19:22:11,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=172739.83333333334, ans=0.5
2024-09-14 19:22:14,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172739.83333333334, ans=0.1
2024-09-14 19:22:35,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=172768.16666666666, ans=0.0
2024-09-14 19:22:47,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=172796.5, ans=0.125
2024-09-14 19:23:01,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=22.5
2024-09-14 19:23:08,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=172853.16666666666, ans=0.125
2024-09-14 19:23:20,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=172853.16666666666, ans=0.2
2024-09-14 19:23:23,208 INFO [train.py:1198] (0/2) Epoch 10, batch 3500, loss[loss=0.2357, ctc_loss=0.1635, cr_loss=0.3611, over 21062.00 frames. ], tot_loss[loss=0.271, ctc_loss=0.1906, cr_loss=0.402, over 4077073.69 frames. ], batch size: 53, lr: 8.41e-03, grad_scale: 32.0
2024-09-14 19:23:31,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=172881.5, ans=0.1
2024-09-14 19:23:36,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0
2024-09-14 19:23:53,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=172938.16666666666, ans=0.0
2024-09-14 19:23:53,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=172938.16666666666, ans=0.125
2024-09-14 19:24:10,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=172966.5, ans=0.2
2024-09-14 19:24:31,504 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.075e+02 2.327e+02 2.771e+02 5.529e+02, threshold=4.654e+02, percent-clipped=2.0
2024-09-14 19:24:38,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=173023.16666666666, ans=0.125
2024-09-14 19:24:39,153 INFO [train.py:1198] (0/2) Epoch 10, batch 3550, loss[loss=0.2841, ctc_loss=0.1978, cr_loss=0.4314, over 21069.00 frames. ], tot_loss[loss=0.2702, ctc_loss=0.19, cr_loss=0.4008, over 4084740.96 frames. ], batch size: 59, lr: 8.41e-03, grad_scale: 32.0
2024-09-14 19:24:48,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=173023.16666666666, ans=0.0
2024-09-14 19:25:00,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=173051.5, ans=0.0
2024-09-14 19:25:34,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=173108.16666666666, ans=0.0
2024-09-14 19:25:39,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=173108.16666666666, ans=0.125
2024-09-14 19:25:56,780 INFO [train.py:1198] (0/2) Epoch 10, batch 3600, loss[loss=0.2215, ctc_loss=0.1479, cr_loss=0.3678, over 19843.00 frames. ], tot_loss[loss=0.2707, ctc_loss=0.1905, cr_loss=0.4006, over 4071521.18 frames. ], batch size: 44, lr: 8.40e-03, grad_scale: 32.0
2024-09-14 19:26:01,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0
2024-09-14 19:26:09,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=173164.83333333334, ans=0.0
2024-09-14 19:26:16,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=173193.16666666666, ans=0.0
2024-09-14 19:26:21,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=173193.16666666666, ans=0.125
2024-09-14 19:26:42,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=173249.83333333334, ans=0.1
2024-09-14 19:26:48,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=173249.83333333334, ans=0.025
2024-09-14 19:27:05,175 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.702e+02 2.101e+02 2.294e+02 2.862e+02 4.439e+02, threshold=4.588e+02, percent-clipped=0.0
2024-09-14 19:27:12,636 INFO [train.py:1198] (0/2) Epoch 10, batch 3650, loss[loss=0.2967, ctc_loss=0.2078, cr_loss=0.4443, over 20884.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1903, cr_loss=0.4013, over 4086345.87 frames.
], batch size: 57, lr: 8.40e-03, grad_scale: 32.0 2024-09-14 19:27:27,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=173306.5, ans=0.1 2024-09-14 19:27:57,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=173363.16666666666, ans=0.0 2024-09-14 19:28:04,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=173391.5, ans=0.0 2024-09-14 19:28:07,293 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.74 vs. limit=12.0 2024-09-14 19:28:08,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=173391.5, ans=15.0 2024-09-14 19:28:11,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=173391.5, ans=0.0 2024-09-14 19:28:30,675 INFO [train.py:1198] (0/2) Epoch 10, batch 3700, loss[loss=0.3008, ctc_loss=0.2144, cr_loss=0.4318, over 18285.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1904, cr_loss=0.4013, over 4087352.68 frames. ], batch size: 108, lr: 8.40e-03, grad_scale: 32.0 2024-09-14 19:28:30,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=173448.16666666666, ans=0.5 2024-09-14 19:28:32,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=173448.16666666666, ans=0.0 2024-09-14 19:28:32,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.03 vs. 
limit=22.5 2024-09-14 19:29:09,066 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-09-14 19:29:10,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=173504.83333333334, ans=0.0 2024-09-14 19:29:16,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=173533.16666666666, ans=0.125 2024-09-14 19:29:26,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=173533.16666666666, ans=0.125 2024-09-14 19:29:35,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=173561.5, ans=0.2 2024-09-14 19:29:36,192 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=12.0 2024-09-14 19:29:38,458 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.120e+02 2.269e+02 2.497e+02 3.790e+02, threshold=4.538e+02, percent-clipped=0.0 2024-09-14 19:29:45,952 INFO [train.py:1198] (0/2) Epoch 10, batch 3750, loss[loss=0.2083, ctc_loss=0.1436, cr_loss=0.3235, over 20008.00 frames. ], tot_loss[loss=0.2693, ctc_loss=0.1893, cr_loss=0.4001, over 4098262.63 frames. 
], batch size: 44, lr: 8.39e-03, grad_scale: 32.0 2024-09-14 19:30:08,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=173618.16666666666, ans=0.125 2024-09-14 19:30:45,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=173674.83333333334, ans=0.125 2024-09-14 19:30:54,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=173703.16666666666, ans=0.125 2024-09-14 19:31:03,476 INFO [train.py:1198] (0/2) Epoch 10, batch 3800, loss[loss=0.2419, ctc_loss=0.1669, cr_loss=0.3748, over 21063.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1881, cr_loss=0.3989, over 4098864.37 frames. ], batch size: 56, lr: 8.39e-03, grad_scale: 16.0 2024-09-14 19:31:17,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=173759.83333333334, ans=0.125 2024-09-14 19:31:47,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=173816.5, ans=0.025 2024-09-14 19:32:12,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=173844.83333333334, ans=0.025 2024-09-14 19:32:13,356 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 2.052e+02 2.220e+02 2.453e+02 4.923e+02, threshold=4.439e+02, percent-clipped=3.0 2024-09-14 19:32:14,241 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.08 vs. limit=22.5 2024-09-14 19:32:19,443 INFO [train.py:1198] (0/2) Epoch 10, batch 3850, loss[loss=0.2713, ctc_loss=0.1914, cr_loss=0.3994, over 20978.00 frames. ], tot_loss[loss=0.2682, ctc_loss=0.1884, cr_loss=0.3989, over 4100043.35 frames. 
], batch size: 55, lr: 8.39e-03, grad_scale: 16.0 2024-09-14 19:32:46,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=173901.5, ans=0.125 2024-09-14 19:32:52,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=173929.83333333334, ans=0.125 2024-09-14 19:33:05,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=173958.16666666666, ans=0.125 2024-09-14 19:33:18,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.96 vs. limit=15.0 2024-09-14 19:33:24,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=173986.5, ans=0.025 2024-09-14 19:33:37,011 INFO [train.py:1198] (0/2) Epoch 10, batch 3900, loss[loss=0.2649, ctc_loss=0.185, cr_loss=0.3998, over 20677.00 frames. ], tot_loss[loss=0.2667, ctc_loss=0.1871, cr_loss=0.398, over 4113415.44 frames. ], batch size: 66, lr: 8.38e-03, grad_scale: 16.0 2024-09-14 19:33:46,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=174014.83333333334, ans=0.125 2024-09-14 19:34:14,863 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 19:34:18,125 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.52 vs. 
limit=22.5 2024-09-14 19:34:22,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=174099.83333333334, ans=15.0 2024-09-14 19:34:30,129 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 19:34:46,501 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.039e+02 2.151e+02 2.340e+02 3.445e+02, threshold=4.302e+02, percent-clipped=0.0 2024-09-14 19:34:46,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=174128.16666666666, ans=0.2 2024-09-14 19:34:52,782 INFO [train.py:1198] (0/2) Epoch 10, batch 3950, loss[loss=0.2666, ctc_loss=0.1895, cr_loss=0.3857, over 20763.00 frames. ], tot_loss[loss=0.2659, ctc_loss=0.1864, cr_loss=0.3974, over 4115620.11 frames. ], batch size: 53, lr: 8.38e-03, grad_scale: 16.0 2024-09-14 19:35:36,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=174241.5, ans=0.125 2024-09-14 19:35:51,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=174269.83333333334, ans=0.1 2024-09-14 19:36:07,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=174269.83333333334, ans=0.2 2024-09-14 19:36:09,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2024-09-14 19:36:10,417 INFO [train.py:1198] (0/2) Epoch 10, batch 4000, loss[loss=0.2265, ctc_loss=0.1561, cr_loss=0.3521, over 20989.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1873, cr_loss=0.398, over 4098828.62 frames. 
], batch size: 48, lr: 8.38e-03, grad_scale: 32.0 2024-09-14 19:36:27,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=174326.5, ans=0.0 2024-09-14 19:36:43,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=174354.83333333334, ans=0.125 2024-09-14 19:36:51,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=174354.83333333334, ans=0.2 2024-09-14 19:37:03,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=174383.16666666666, ans=0.2 2024-09-14 19:37:20,011 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.060e+02 2.217e+02 2.405e+02 5.104e+02, threshold=4.433e+02, percent-clipped=1.0 2024-09-14 19:37:26,182 INFO [train.py:1198] (0/2) Epoch 10, batch 4050, loss[loss=0.2257, ctc_loss=0.1566, cr_loss=0.3457, over 20960.00 frames. ], tot_loss[loss=0.2664, ctc_loss=0.187, cr_loss=0.3971, over 4097251.42 frames. ], batch size: 51, lr: 8.37e-03, grad_scale: 32.0 2024-09-14 19:37:28,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=174439.83333333334, ans=0.07 2024-09-14 19:37:40,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=174468.16666666666, ans=0.125 2024-09-14 19:38:10,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=174524.83333333334, ans=0.025 2024-09-14 19:38:26,454 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.68 vs. 
limit=15.0 2024-09-14 19:38:44,736 INFO [train.py:1198] (0/2) Epoch 10, batch 4100, loss[loss=0.2872, ctc_loss=0.205, cr_loss=0.4109, over 19465.00 frames. ], tot_loss[loss=0.268, ctc_loss=0.1882, cr_loss=0.3989, over 4082349.16 frames. ], batch size: 90, lr: 8.37e-03, grad_scale: 32.0 2024-09-14 19:39:07,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=174609.83333333334, ans=0.125 2024-09-14 19:39:15,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0 2024-09-14 19:39:19,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=174638.16666666666, ans=0.1 2024-09-14 19:39:26,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=174638.16666666666, ans=0.2 2024-09-14 19:39:36,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=174666.5, ans=0.125 2024-09-14 19:39:45,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=174694.83333333334, ans=0.125 2024-09-14 19:39:53,745 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.095e+02 2.241e+02 2.577e+02 5.850e+02, threshold=4.482e+02, percent-clipped=1.0 2024-09-14 19:39:56,365 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0 2024-09-14 19:39:59,823 INFO [train.py:1198] (0/2) Epoch 10, batch 4150, loss[loss=0.3017, ctc_loss=0.2102, cr_loss=0.4576, over 20997.00 frames. ], tot_loss[loss=0.2682, ctc_loss=0.1883, cr_loss=0.3997, over 4084924.25 frames. 
], batch size: 64, lr: 8.37e-03, grad_scale: 32.0 2024-09-14 19:40:03,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0 2024-09-14 19:40:11,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=174723.16666666666, ans=10.0 2024-09-14 19:40:12,569 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.79 vs. limit=15.0 2024-09-14 19:40:16,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.43 vs. limit=15.0 2024-09-14 19:40:35,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=174779.83333333334, ans=0.0 2024-09-14 19:40:42,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=174779.83333333334, ans=0.125 2024-09-14 19:41:11,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=174836.5, ans=0.1 2024-09-14 19:41:14,626 INFO [train.py:1198] (0/2) Epoch 10, batch 4200, loss[loss=0.2469, ctc_loss=0.1689, cr_loss=0.3901, over 20772.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1885, cr_loss=0.4002, over 4083799.10 frames. 
], batch size: 56, lr: 8.36e-03, grad_scale: 32.0 2024-09-14 19:41:22,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=174864.83333333334, ans=0.125 2024-09-14 19:41:25,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=174864.83333333334, ans=0.125 2024-09-14 19:41:32,149 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2024-09-14 19:41:57,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=174921.5, ans=0.125 2024-09-14 19:41:58,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=174921.5, ans=0.04949747468305833 2024-09-14 19:42:13,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=174949.83333333334, ans=0.0 2024-09-14 19:42:26,428 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.047e+02 2.187e+02 2.358e+02 3.971e+02, threshold=4.374e+02, percent-clipped=0.0 2024-09-14 19:42:32,497 INFO [train.py:1198] (0/2) Epoch 10, batch 4250, loss[loss=0.2567, ctc_loss=0.1757, cr_loss=0.4054, over 21058.00 frames. ], tot_loss[loss=0.2686, ctc_loss=0.1886, cr_loss=0.4, over 4077908.52 frames. ], batch size: 62, lr: 8.36e-03, grad_scale: 32.0 2024-09-14 19:42:51,249 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=15.0 2024-09-14 19:43:21,771 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.21 vs. 
limit=15.0 2024-09-14 19:43:48,045 INFO [train.py:1198] (0/2) Epoch 10, batch 4300, loss[loss=0.3165, ctc_loss=0.2262, cr_loss=0.4512, over 19626.00 frames. ], tot_loss[loss=0.2692, ctc_loss=0.1891, cr_loss=0.4003, over 4091567.13 frames. ], batch size: 90, lr: 8.36e-03, grad_scale: 32.0 2024-09-14 19:43:48,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=175148.16666666666, ans=0.5 2024-09-14 19:43:49,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=175148.16666666666, ans=0.0 2024-09-14 19:44:15,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=175176.5, ans=0.0 2024-09-14 19:44:17,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=175176.5, ans=0.125 2024-09-14 19:44:25,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=175204.83333333334, ans=0.2 2024-09-14 19:44:58,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=175261.5, ans=0.09899494936611666 2024-09-14 19:44:58,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=175261.5, ans=0.2 2024-09-14 19:45:00,342 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2024-09-14 19:45:01,035 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.125e+02 2.354e+02 2.641e+02 3.768e+02, threshold=4.708e+02, percent-clipped=0.0 2024-09-14 19:45:07,097 INFO [train.py:1198] (0/2) Epoch 10, batch 4350, loss[loss=0.2638, ctc_loss=0.1888, cr_loss=0.3747, over 19361.00 frames. 
], tot_loss[loss=0.2693, ctc_loss=0.1894, cr_loss=0.3996, over 4083544.03 frames. ], batch size: 90, lr: 8.35e-03, grad_scale: 32.0 2024-09-14 19:45:21,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=175318.16666666666, ans=0.0 2024-09-14 19:45:52,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=22.5 2024-09-14 19:46:22,105 INFO [train.py:1198] (0/2) Epoch 10, batch 4400, loss[loss=0.2824, ctc_loss=0.2025, cr_loss=0.3995, over 20607.00 frames. ], tot_loss[loss=0.2686, ctc_loss=0.1887, cr_loss=0.3994, over 4092892.14 frames. ], batch size: 75, lr: 8.35e-03, grad_scale: 32.0 2024-09-14 19:46:46,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=175459.83333333334, ans=0.125 2024-09-14 19:46:52,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=175488.16666666666, ans=0.125 2024-09-14 19:47:02,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=175488.16666666666, ans=0.0 2024-09-14 19:47:20,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=175544.83333333334, ans=0.2 2024-09-14 19:47:34,449 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.694e+02 1.999e+02 2.156e+02 2.318e+02 3.565e+02, threshold=4.312e+02, percent-clipped=0.0 2024-09-14 19:47:40,430 INFO [train.py:1198] (0/2) Epoch 10, batch 4450, loss[loss=0.2603, ctc_loss=0.1814, cr_loss=0.3944, over 20999.00 frames. ], tot_loss[loss=0.2676, ctc_loss=0.1878, cr_loss=0.399, over 4096351.11 frames. 
], batch size: 63, lr: 8.35e-03, grad_scale: 32.0 2024-09-14 19:48:01,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=175601.5, ans=0.05 2024-09-14 19:48:05,520 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=22.5 2024-09-14 19:48:44,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=175686.5, ans=0.125 2024-09-14 19:48:48,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=175686.5, ans=0.025 2024-09-14 19:48:55,968 INFO [train.py:1198] (0/2) Epoch 10, batch 4500, loss[loss=0.3128, ctc_loss=0.2158, cr_loss=0.4853, over 20679.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1872, cr_loss=0.3985, over 4104115.90 frames. ], batch size: 68, lr: 8.34e-03, grad_scale: 32.0 2024-09-14 19:49:34,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=175771.5, ans=0.0 2024-09-14 19:49:58,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=175828.16666666666, ans=0.0 2024-09-14 19:50:08,426 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.704e+02 2.063e+02 2.279e+02 2.542e+02 3.668e+02, threshold=4.557e+02, percent-clipped=0.0 2024-09-14 19:50:14,374 INFO [train.py:1198] (0/2) Epoch 10, batch 4550, loss[loss=0.2848, ctc_loss=0.2003, cr_loss=0.4225, over 20702.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.1873, cr_loss=0.3991, over 4111747.63 frames. 
], batch size: 71, lr: 8.34e-03, grad_scale: 32.0 2024-09-14 19:50:37,820 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-09-14 19:51:00,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=175941.5, ans=0.125 2024-09-14 19:51:06,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=175941.5, ans=0.025 2024-09-14 19:51:11,516 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-09-14 19:51:28,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=175998.16666666666, ans=0.125 2024-09-14 19:51:29,712 INFO [train.py:1198] (0/2) Epoch 10, batch 4600, loss[loss=0.305, ctc_loss=0.2169, cr_loss=0.4406, over 20274.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1873, cr_loss=0.3981, over 4100322.33 frames. 
], batch size: 74, lr: 8.34e-03, grad_scale: 32.0 2024-09-14 19:51:34,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=175998.16666666666, ans=0.0 2024-09-14 19:51:55,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=176026.5, ans=0.125 2024-09-14 19:52:15,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=176083.16666666666, ans=0.025 2024-09-14 19:52:39,036 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.072e+02 2.240e+02 2.532e+02 4.884e+02, threshold=4.480e+02, percent-clipped=1.0 2024-09-14 19:52:44,905 INFO [train.py:1198] (0/2) Epoch 10, batch 4650, loss[loss=0.2795, ctc_loss=0.1993, cr_loss=0.4006, over 20970.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1875, cr_loss=0.3986, over 4102170.37 frames. ], batch size: 64, lr: 8.33e-03, grad_scale: 32.0 2024-09-14 19:52:51,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=176139.83333333334, ans=0.125 2024-09-14 19:54:00,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=176253.16666666666, ans=0.04949747468305833 2024-09-14 19:54:03,612 INFO [train.py:1198] (0/2) Epoch 10, batch 4700, loss[loss=0.2591, ctc_loss=0.1802, cr_loss=0.3942, over 20829.00 frames. ], tot_loss[loss=0.2666, ctc_loss=0.187, cr_loss=0.3979, over 4101414.42 frames. 
], batch size: 59, lr: 8.33e-03, grad_scale: 32.0 2024-09-14 19:54:12,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=176281.5, ans=0.0 2024-09-14 19:54:14,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=176281.5, ans=0.125 2024-09-14 19:54:20,563 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0 2024-09-14 19:54:59,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=176366.5, ans=0.0 2024-09-14 19:55:03,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=176394.83333333334, ans=0.125 2024-09-14 19:55:12,509 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.092e+02 2.276e+02 2.485e+02 3.566e+02, threshold=4.552e+02, percent-clipped=0.0 2024-09-14 19:55:18,471 INFO [train.py:1198] (0/2) Epoch 10, batch 4750, loss[loss=0.2469, ctc_loss=0.1693, cr_loss=0.3884, over 20985.00 frames. ], tot_loss[loss=0.2678, ctc_loss=0.1879, cr_loss=0.3996, over 4102641.53 frames. ], batch size: 55, lr: 8.33e-03, grad_scale: 32.0 2024-09-14 19:56:12,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=176508.16666666666, ans=0.025 2024-09-14 19:56:36,965 INFO [train.py:1198] (0/2) Epoch 10, batch 4800, loss[loss=0.295, ctc_loss=0.2084, cr_loss=0.4327, over 20844.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.187, cr_loss=0.3988, over 4113569.19 frames. 
], batch size: 65, lr: 8.32e-03, grad_scale: 32.0 2024-09-14 19:56:46,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=176564.83333333334, ans=0.125 2024-09-14 19:56:58,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=176593.16666666666, ans=0.0 2024-09-14 19:57:36,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0 2024-09-14 19:57:47,555 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.086e+02 2.234e+02 2.429e+02 3.536e+02, threshold=4.468e+02, percent-clipped=0.0 2024-09-14 19:57:51,826 INFO [train.py:1198] (0/2) Epoch 10, batch 4850, loss[loss=0.2513, ctc_loss=0.1768, cr_loss=0.3725, over 20968.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.1861, cr_loss=0.3979, over 4114808.87 frames. ], batch size: 50, lr: 8.32e-03, grad_scale: 16.0 2024-09-14 19:58:02,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=176706.5, ans=0.125 2024-09-14 19:58:12,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0 2024-09-14 19:58:35,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=176791.5, ans=0.125 2024-09-14 19:58:40,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=176791.5, ans=0.2 2024-09-14 19:59:00,910 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. 
limit=15.0 2024-09-14 19:59:08,754 INFO [train.py:1198] (0/2) Epoch 10, batch 4900, loss[loss=0.3221, ctc_loss=0.2334, cr_loss=0.4435, over 20105.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1873, cr_loss=0.3984, over 4096174.03 frames. ], batch size: 80, lr: 8.32e-03, grad_scale: 16.0 2024-09-14 19:59:59,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=176933.16666666666, ans=0.0 2024-09-14 20:00:08,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=176961.5, ans=0.04949747468305833 2024-09-14 20:00:18,458 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.033e+02 2.185e+02 2.462e+02 3.596e+02, threshold=4.371e+02, percent-clipped=0.0 2024-09-14 20:00:22,892 INFO [train.py:1198] (0/2) Epoch 10, batch 4950, loss[loss=0.2284, ctc_loss=0.1588, cr_loss=0.348, over 19435.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1878, cr_loss=0.398, over 4093072.12 frames. ], batch size: 43, lr: 8.31e-03, grad_scale: 16.0 2024-09-14 20:00:29,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=176989.83333333334, ans=0.0 2024-09-14 20:00:29,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.70 vs. limit=15.0 2024-09-14 20:00:35,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=176989.83333333334, ans=0.125 2024-09-14 20:00:52,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.48 vs. 
limit=15.0 2024-09-14 20:01:13,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=177074.83333333334, ans=10.0 2024-09-14 20:01:19,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=177074.83333333334, ans=0.1 2024-09-14 20:01:37,656 INFO [train.py:1198] (0/2) Epoch 10, batch 5000, loss[loss=0.2242, ctc_loss=0.1572, cr_loss=0.335, over 20962.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1878, cr_loss=0.3982, over 4091557.51 frames. ], batch size: 51, lr: 8.31e-03, grad_scale: 16.0 2024-09-14 20:02:35,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=177244.83333333334, ans=0.2 2024-09-14 20:02:47,130 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.092e+02 2.222e+02 2.420e+02 3.214e+02, threshold=4.444e+02, percent-clipped=0.0 2024-09-14 20:02:51,624 INFO [train.py:1198] (0/2) Epoch 10, batch 5050, loss[loss=0.3229, ctc_loss=0.2346, cr_loss=0.4412, over 19573.00 frames. ], tot_loss[loss=0.2678, ctc_loss=0.1881, cr_loss=0.3984, over 4097851.09 frames. ], batch size: 90, lr: 8.31e-03, grad_scale: 16.0 2024-09-14 20:03:02,609 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.80 vs. limit=10.0 2024-09-14 20:03:07,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.16 vs. 
limit=15.0 2024-09-14 20:03:13,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=177301.5, ans=0.0 2024-09-14 20:03:34,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=177329.83333333334, ans=0.2 2024-09-14 20:03:48,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=177358.16666666666, ans=0.125 2024-09-14 20:04:04,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=177386.5, ans=0.125 2024-09-14 20:04:09,035 INFO [train.py:1198] (0/2) Epoch 10, batch 5100, loss[loss=0.2644, ctc_loss=0.1801, cr_loss=0.4215, over 20826.00 frames. ], tot_loss[loss=0.2677, ctc_loss=0.188, cr_loss=0.3981, over 4086255.82 frames. ], batch size: 59, lr: 8.30e-03, grad_scale: 16.0 2024-09-14 20:04:10,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=177414.83333333334, ans=0.125 2024-09-14 20:04:32,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=177443.16666666666, ans=0.125 2024-09-14 20:04:35,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=177443.16666666666, ans=0.0 2024-09-14 20:04:42,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=177471.5, ans=0.125 2024-09-14 20:04:52,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=177499.83333333334, ans=0.125 2024-09-14 20:05:08,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, 
batch_count=177528.16666666666, ans=0.125 2024-09-14 20:05:10,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=22.5 2024-09-14 20:05:10,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=177528.16666666666, ans=0.0 2024-09-14 20:05:18,027 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.017e+02 2.190e+02 2.387e+02 4.016e+02, threshold=4.380e+02, percent-clipped=0.0 2024-09-14 20:05:22,436 INFO [train.py:1198] (0/2) Epoch 10, batch 5150, loss[loss=0.2274, ctc_loss=0.156, cr_loss=0.357, over 20972.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.188, cr_loss=0.3993, over 4092887.67 frames. ], batch size: 52, lr: 8.30e-03, grad_scale: 16.0 2024-09-14 20:05:27,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=177556.5, ans=0.0 2024-09-14 20:06:08,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=177641.5, ans=0.125 2024-09-14 20:06:11,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=177641.5, ans=0.0 2024-09-14 20:06:27,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177669.83333333334, ans=0.1 2024-09-14 20:06:30,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=177669.83333333334, ans=0.125 2024-09-14 20:06:36,104 INFO [train.py:1198] (0/2) Epoch 10, batch 5200, loss[loss=0.251, ctc_loss=0.1712, cr_loss=0.3988, over 21036.00 frames. ], tot_loss[loss=0.2677, ctc_loss=0.1879, cr_loss=0.3989, over 4090184.08 frames. 
], batch size: 56, lr: 8.30e-03, grad_scale: 32.0 2024-09-14 20:07:37,785 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.38 vs. limit=15.0 2024-09-14 20:07:41,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177811.5, ans=0.1 2024-09-14 20:07:46,184 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.083e+02 2.247e+02 2.554e+02 8.086e+02, threshold=4.494e+02, percent-clipped=1.0 2024-09-14 20:07:50,584 INFO [train.py:1198] (0/2) Epoch 10, batch 5250, loss[loss=0.3101, ctc_loss=0.2206, cr_loss=0.4476, over 18060.00 frames. ], tot_loss[loss=0.2676, ctc_loss=0.188, cr_loss=0.3979, over 4069432.85 frames. ], batch size: 108, lr: 8.29e-03, grad_scale: 32.0 2024-09-14 20:08:07,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=177868.16666666666, ans=0.125 2024-09-14 20:08:15,373 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 20:08:42,231 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2024-09-14 20:09:07,485 INFO [train.py:1198] (0/2) Epoch 10, batch 5300, loss[loss=0.2757, ctc_loss=0.1931, cr_loss=0.4126, over 20699.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.1875, cr_loss=0.398, over 4080750.11 frames. 
], batch size: 68, lr: 8.29e-03, grad_scale: 32.0 2024-09-14 20:09:59,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178066.5, ans=0.1 2024-09-14 20:10:01,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=178066.5, ans=0.025 2024-09-14 20:10:05,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=178094.83333333334, ans=0.0 2024-09-14 20:10:05,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=178094.83333333334, ans=0.0 2024-09-14 20:10:08,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=178094.83333333334, ans=0.2 2024-09-14 20:10:12,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=178094.83333333334, ans=0.125 2024-09-14 20:10:16,692 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.084e+02 2.210e+02 2.533e+02 5.203e+02, threshold=4.421e+02, percent-clipped=3.0 2024-09-14 20:10:21,107 INFO [train.py:1198] (0/2) Epoch 10, batch 5350, loss[loss=0.2592, ctc_loss=0.1793, cr_loss=0.3994, over 20888.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1886, cr_loss=0.3994, over 4064863.11 frames. ], batch size: 54, lr: 8.29e-03, grad_scale: 32.0 2024-09-14 20:10:26,244 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=15.0 2024-09-14 20:10:37,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.70 vs. 
limit=15.0 2024-09-14 20:10:44,233 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.55 vs. limit=10.0 2024-09-14 20:10:48,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178151.5, ans=0.1 2024-09-14 20:11:04,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178208.16666666666, ans=0.1 2024-09-14 20:11:34,724 INFO [train.py:1198] (0/2) Epoch 10, batch 5400, loss[loss=0.2705, ctc_loss=0.1892, cr_loss=0.4065, over 20882.00 frames. ], tot_loss[loss=0.2687, ctc_loss=0.1887, cr_loss=0.4001, over 4065066.02 frames. ], batch size: 57, lr: 8.29e-03, grad_scale: 32.0 2024-09-14 20:11:34,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=178264.83333333334, ans=0.0 2024-09-14 20:11:37,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=178264.83333333334, ans=0.125 2024-09-14 20:11:37,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=178264.83333333334, ans=0.1 2024-09-14 20:12:18,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=178349.83333333334, ans=0.2 2024-09-14 20:12:23,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=178349.83333333334, ans=0.125 2024-09-14 20:12:26,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=178349.83333333334, ans=0.125 2024-09-14 20:12:26,410 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=178349.83333333334, ans=0.2 2024-09-14 20:12:31,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=178378.16666666666, ans=0.025 2024-09-14 20:12:38,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=178378.16666666666, ans=0.0 2024-09-14 20:12:45,835 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.079e+02 2.240e+02 2.521e+02 4.779e+02, threshold=4.480e+02, percent-clipped=1.0 2024-09-14 20:12:50,511 INFO [train.py:1198] (0/2) Epoch 10, batch 5450, loss[loss=0.2692, ctc_loss=0.188, cr_loss=0.406, over 21003.00 frames. ], tot_loss[loss=0.2688, ctc_loss=0.1888, cr_loss=0.4, over 4077433.52 frames. ], batch size: 52, lr: 8.28e-03, grad_scale: 32.0 2024-09-14 20:13:12,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=178434.83333333334, ans=0.125 2024-09-14 20:13:52,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=12.0 2024-09-14 20:14:04,252 INFO [train.py:1198] (0/2) Epoch 10, batch 5500, loss[loss=0.242, ctc_loss=0.1697, cr_loss=0.3615, over 21000.00 frames. ], tot_loss[loss=0.268, ctc_loss=0.1882, cr_loss=0.3987, over 4084769.06 frames. ], batch size: 52, lr: 8.28e-03, grad_scale: 32.0 2024-09-14 20:14:06,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.55 vs. 
limit=22.5 2024-09-14 20:14:10,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178548.16666666666, ans=0.1 2024-09-14 20:14:10,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=178548.16666666666, ans=0.0 2024-09-14 20:14:14,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=178548.16666666666, ans=0.0 2024-09-14 20:14:50,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=178633.16666666666, ans=0.0 2024-09-14 20:14:51,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=178633.16666666666, ans=0.95 2024-09-14 20:14:54,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=178633.16666666666, ans=0.0 2024-09-14 20:14:59,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=178633.16666666666, ans=0.2 2024-09-14 20:15:03,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=178661.5, ans=0.1 2024-09-14 20:15:06,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=178661.5, ans=0.04949747468305833 2024-09-14 20:15:12,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=178661.5, ans=0.025 2024-09-14 20:15:13,556 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.098e+02 2.272e+02 2.564e+02 4.106e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-14 20:15:17,934 INFO [train.py:1198] (0/2) Epoch 10, batch 5550, loss[loss=0.2541, 
ctc_loss=0.1734, cr_loss=0.4035, over 20869.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1882, cr_loss=0.3987, over 4077293.49 frames. ], batch size: 54, lr: 8.28e-03, grad_scale: 32.0 2024-09-14 20:15:46,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=178746.5, ans=0.125 2024-09-14 20:16:04,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=178774.83333333334, ans=0.2 2024-09-14 20:16:05,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=178774.83333333334, ans=0.0 2024-09-14 20:16:09,422 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.22 vs. limit=10.0 2024-09-14 20:16:17,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=178803.16666666666, ans=0.0 2024-09-14 20:16:23,681 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.08 vs. limit=10.0 2024-09-14 20:16:31,715 INFO [train.py:1198] (0/2) Epoch 10, batch 5600, loss[loss=0.2797, ctc_loss=0.1959, cr_loss=0.419, over 21075.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1883, cr_loss=0.3989, over 4076010.93 frames. 
], batch size: 59, lr: 8.27e-03, grad_scale: 32.0 2024-09-14 20:16:54,500 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 20:16:54,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=178859.83333333334, ans=0.0 2024-09-14 20:17:43,573 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.079e+02 2.263e+02 2.565e+02 3.644e+02, threshold=4.526e+02, percent-clipped=0.0 2024-09-14 20:17:47,765 INFO [train.py:1198] (0/2) Epoch 10, batch 5650, loss[loss=0.3145, ctc_loss=0.2284, cr_loss=0.4304, over 19257.00 frames. ], tot_loss[loss=0.2687, ctc_loss=0.1888, cr_loss=0.3995, over 4070572.03 frames. ], batch size: 90, lr: 8.27e-03, grad_scale: 32.0 2024-09-14 20:18:01,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=179001.5, ans=0.2 2024-09-14 20:18:03,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=179001.5, ans=0.125 2024-09-14 20:18:13,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=179001.5, ans=0.2 2024-09-14 20:18:55,838 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-14 20:18:57,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=179086.5, ans=0.125 2024-09-14 20:19:02,591 INFO [train.py:1198] (0/2) Epoch 10, batch 5700, loss[loss=0.2531, ctc_loss=0.1756, cr_loss=0.3873, over 20985.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1883, cr_loss=0.3988, over 4072475.49 frames. 
], batch size: 51, lr: 8.27e-03, grad_scale: 32.0 2024-09-14 20:19:25,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=179143.16666666666, ans=0.0 2024-09-14 20:19:28,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=179143.16666666666, ans=0.125 2024-09-14 20:19:30,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=179143.16666666666, ans=15.0 2024-09-14 20:19:37,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=179171.5, ans=0.05 2024-09-14 20:19:40,274 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0 2024-09-14 20:19:50,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=179199.83333333334, ans=0.125 2024-09-14 20:20:09,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=179228.16666666666, ans=0.07 2024-09-14 20:20:11,766 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.712e+02 2.122e+02 2.353e+02 2.799e+02 5.192e+02, threshold=4.706e+02, percent-clipped=1.0 2024-09-14 20:20:15,572 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.25 vs. limit=22.5 2024-09-14 20:20:16,068 INFO [train.py:1198] (0/2) Epoch 10, batch 5750, loss[loss=0.2791, ctc_loss=0.1946, cr_loss=0.4228, over 20977.00 frames. ], tot_loss[loss=0.2666, ctc_loss=0.1872, cr_loss=0.3971, over 4084120.63 frames. 
], batch size: 64, lr: 8.26e-03, grad_scale: 32.0 2024-09-14 20:20:19,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=179256.5, ans=0.125 2024-09-14 20:20:20,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.93 vs. limit=10.0 2024-09-14 20:20:22,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.18 vs. limit=15.0 2024-09-14 20:20:25,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=179256.5, ans=0.125 2024-09-14 20:20:36,306 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0 2024-09-14 20:20:39,193 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2024-09-14 20:20:49,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179313.16666666666, ans=0.1 2024-09-14 20:21:04,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0 2024-09-14 20:21:32,551 INFO [train.py:1198] (0/2) Epoch 10, batch 5800, loss[loss=0.229, ctc_loss=0.155, cr_loss=0.3701, over 20789.00 frames. ], tot_loss[loss=0.2677, ctc_loss=0.1881, cr_loss=0.3979, over 4070429.53 frames. ], batch size: 53, lr: 8.26e-03, grad_scale: 32.0 2024-09-14 20:21:35,034 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.93 vs. 
limit=15.0 2024-09-14 20:21:41,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=179398.16666666666, ans=0.1 2024-09-14 20:21:57,147 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0 2024-09-14 20:22:08,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=179454.83333333334, ans=0.0 2024-09-14 20:22:27,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=179483.16666666666, ans=0.125 2024-09-14 20:22:41,952 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.731e+02 2.068e+02 2.266e+02 2.623e+02 4.460e+02, threshold=4.532e+02, percent-clipped=0.0 2024-09-14 20:22:46,289 INFO [train.py:1198] (0/2) Epoch 10, batch 5850, loss[loss=0.2954, ctc_loss=0.2119, cr_loss=0.4173, over 20668.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1883, cr_loss=0.3981, over 4065337.60 frames. ], batch size: 68, lr: 8.26e-03, grad_scale: 32.0 2024-09-14 20:23:01,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=179568.16666666666, ans=0.0 2024-09-14 20:23:07,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=179568.16666666666, ans=0.125 2024-09-14 20:23:08,780 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 20:23:32,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.85 vs. 
limit=15.0 2024-09-14 20:23:35,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=179624.83333333334, ans=0.07 2024-09-14 20:23:44,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=179653.16666666666, ans=0.125 2024-09-14 20:23:58,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=15.0 2024-09-14 20:24:00,192 INFO [train.py:1198] (0/2) Epoch 10, batch 5900, loss[loss=0.2179, ctc_loss=0.1494, cr_loss=0.3423, over 21063.00 frames. ], tot_loss[loss=0.268, ctc_loss=0.1885, cr_loss=0.3978, over 4077018.49 frames. ], batch size: 53, lr: 8.25e-03, grad_scale: 16.0 2024-09-14 20:24:16,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=179709.83333333334, ans=0.04949747468305833 2024-09-14 20:24:24,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=179709.83333333334, ans=0.0 2024-09-14 20:24:25,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=179709.83333333334, ans=0.2 2024-09-14 20:24:27,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.58 vs. 
limit=15.0 2024-09-14 20:24:38,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=179738.16666666666, ans=0.125 2024-09-14 20:24:40,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=179738.16666666666, ans=0.0 2024-09-14 20:24:40,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=179738.16666666666, ans=0.04949747468305833 2024-09-14 20:24:40,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=22.5 2024-09-14 20:24:45,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0 2024-09-14 20:24:50,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=179766.5, ans=0.0 2024-09-14 20:24:53,887 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 20:25:07,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=22.5 2024-09-14 20:25:11,467 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.709e+02 2.043e+02 2.284e+02 2.438e+02 2.924e+02, threshold=4.568e+02, percent-clipped=0.0 2024-09-14 20:25:14,487 INFO [train.py:1198] (0/2) Epoch 10, batch 5950, loss[loss=0.235, ctc_loss=0.1641, cr_loss=0.3549, over 21072.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1884, cr_loss=0.3976, over 4080060.96 frames. 
], batch size: 56, lr: 8.25e-03, grad_scale: 16.0 2024-09-14 20:26:12,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179908.16666666666, ans=0.1 2024-09-14 20:26:23,663 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2024-09-14 20:26:30,397 INFO [train.py:1198] (0/2) Epoch 10, batch 6000, loss[loss=0.3105, ctc_loss=0.2197, cr_loss=0.454, over 20947.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1878, cr_loss=0.398, over 4080679.57 frames. ], batch size: 64, lr: 8.25e-03, grad_scale: 32.0 2024-09-14 20:26:30,398 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 20:26:56,312 INFO [train.py:1230] (0/2) Epoch 10, validation: loss=0.05303, ctc_loss=0.05303, cr_loss=9.606e-15, over 944034.00 frames. 2024-09-14 20:26:56,313 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 20:27:20,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=179993.16666666666, ans=0.0 2024-09-14 20:27:33,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.42 vs. 
limit=15.0
2024-09-14 20:27:45,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=180049.83333333334, ans=0.025
2024-09-14 20:27:48,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=180049.83333333334, ans=0.0
2024-09-14 20:27:54,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=180049.83333333334, ans=0.0
2024-09-14 20:28:08,657 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.111e+02 2.253e+02 2.632e+02 5.421e+02, threshold=4.505e+02, percent-clipped=2.0
2024-09-14 20:28:11,710 INFO [train.py:1198] (0/2) Epoch 10, batch 6050, loss[loss=0.2407, ctc_loss=0.1673, cr_loss=0.367, over 20971.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1874, cr_loss=0.3978, over 4082550.54 frames. ], batch size: 50, lr: 8.24e-03, grad_scale: 32.0
2024-09-14 20:28:14,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=180106.5, ans=0.125
2024-09-14 20:28:55,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=180191.5, ans=0.04949747468305833
2024-09-14 20:29:09,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0
2024-09-14 20:29:22,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=180219.83333333334, ans=0.5
2024-09-14 20:29:26,339 INFO [train.py:1198] (0/2) Epoch 10, batch 6100, loss[loss=0.2828, ctc_loss=0.1983, cr_loss=0.4225, over 21035.00 frames. ], tot_loss[loss=0.2666, ctc_loss=0.1871, cr_loss=0.3976, over 4093569.57 frames. ], batch size: 62, lr: 8.24e-03, grad_scale: 32.0
2024-09-14 20:30:16,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=180333.16666666666, ans=0.2
2024-09-14 20:30:36,932 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.080e+02 2.247e+02 2.428e+02 5.980e+02, threshold=4.494e+02, percent-clipped=2.0
2024-09-14 20:30:39,839 INFO [train.py:1198] (0/2) Epoch 10, batch 6150, loss[loss=0.2252, ctc_loss=0.1561, cr_loss=0.3456, over 20970.00 frames. ], tot_loss[loss=0.2673, ctc_loss=0.1877, cr_loss=0.3981, over 4082145.51 frames. ], batch size: 48, lr: 8.24e-03, grad_scale: 32.0
2024-09-14 20:30:40,567 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0
2024-09-14 20:30:53,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.97 vs. limit=10.0
2024-09-14 20:31:02,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0
2024-09-14 20:31:42,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=180503.16666666666, ans=0.125
2024-09-14 20:31:52,623 INFO [train.py:1198] (0/2) Epoch 10, batch 6200, loss[loss=0.2806, ctc_loss=0.1971, cr_loss=0.4171, over 20901.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1875, cr_loss=0.3979, over 4055116.77 frames. ], batch size: 54, lr: 8.23e-03, grad_scale: 32.0
2024-09-14 20:32:09,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0
2024-09-14 20:32:20,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180588.16666666666, ans=0.1
2024-09-14 20:32:48,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=180616.5, ans=0.125
2024-09-14 20:33:02,641 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.740e+02 2.041e+02 2.240e+02 2.498e+02 3.680e+02, threshold=4.480e+02, percent-clipped=0.0
2024-09-14 20:33:05,562 INFO [train.py:1198] (0/2) Epoch 10, batch 6250, loss[loss=0.2834, ctc_loss=0.2011, cr_loss=0.4115, over 21014.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1882, cr_loss=0.3986, over 4041063.66 frames. ], batch size: 63, lr: 8.23e-03, grad_scale: 32.0
2024-09-14 20:33:14,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=180673.16666666666, ans=0.0
2024-09-14 20:33:41,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=180729.83333333334, ans=0.0
2024-09-14 20:33:53,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=180758.16666666666, ans=0.125
2024-09-14 20:34:19,407 INFO [train.py:1198] (0/2) Epoch 10, batch 6300, loss[loss=0.3059, ctc_loss=0.2265, cr_loss=0.3971, over 14402.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1875, cr_loss=0.3973, over 4028045.16 frames. ], batch size: 149, lr: 8.23e-03, grad_scale: 32.0
2024-09-14 20:34:22,933 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.32 vs. limit=10.0
2024-09-14 20:34:26,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=180814.83333333334, ans=0.2
2024-09-14 20:34:30,430 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=22.5
2024-09-14 20:34:43,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=180843.16666666666, ans=0.125
2024-09-14 20:35:17,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=180928.16666666666, ans=0.0
2024-09-14 20:35:23,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=180928.16666666666, ans=0.025
2024-09-14 20:35:28,530 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.177e+02 2.468e+02 2.812e+02 4.954e+02, threshold=4.937e+02, percent-clipped=1.0
2024-09-14 20:35:31,308 INFO [train.py:1198] (0/2) Epoch 10, batch 6350, loss[loss=0.3377, ctc_loss=0.2538, cr_loss=0.4192, over 13924.00 frames. ], tot_loss[loss=0.2712, ctc_loss=0.1917, cr_loss=0.3976, over 3876809.06 frames. ], batch size: 150, lr: 8.22e-03, grad_scale: 32.0
2024-09-14 20:35:51,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=180984.83333333334, ans=0.1
2024-09-14 20:35:54,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=180984.83333333334, ans=0.04949747468305833
2024-09-14 20:36:16,268 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.218e-02
2024-09-14 20:36:28,489 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-10.pt
2024-09-14 20:37:16,756 INFO [train.py:1198] (0/2) Epoch 11, batch 0, loss[loss=0.2464, ctc_loss=0.1716, cr_loss=0.3739, over 20772.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1716, cr_loss=0.3739, over 20772.00 frames. ], batch size: 56, lr: 7.85e-03, grad_scale: 32.0
2024-09-14 20:37:16,757 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-14 20:37:35,260 INFO [train.py:1230] (0/2) Epoch 11, validation: loss=0.05334, ctc_loss=0.05334, cr_loss=9.322e-15, over 944034.00 frames.
2024-09-14 20:37:35,261 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-14 20:37:41,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0
2024-09-14 20:38:16,443 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0
2024-09-14 20:38:19,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=181154.83333333334, ans=0.2
2024-09-14 20:38:28,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=181154.83333333334, ans=0.0
2024-09-14 20:38:50,007 INFO [train.py:1198] (0/2) Epoch 11, batch 50, loss[loss=0.2533, ctc_loss=0.1765, cr_loss=0.3837, over 20990.00 frames. ], tot_loss[loss=0.269, ctc_loss=0.1887, cr_loss=0.4016, over 934297.67 frames. ], batch size: 52, lr: 7.85e-03, grad_scale: 32.0
2024-09-14 20:38:56,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0
2024-09-14 20:38:59,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181211.5, ans=0.1
2024-09-14 20:39:01,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=181211.5, ans=0.125
2024-09-14 20:39:02,234 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.126e+02 2.351e+02 2.668e+02 3.477e+02, threshold=4.702e+02, percent-clipped=0.0
2024-09-14 20:39:17,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=181239.83333333334, ans=0.125
2024-09-14 20:39:35,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=181296.5, ans=0.125
2024-09-14 20:39:52,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=181324.83333333334, ans=0.0
2024-09-14 20:39:53,533 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-64000.pt
2024-09-14 20:40:06,214 INFO [train.py:1198] (0/2) Epoch 11, batch 100, loss[loss=0.239, ctc_loss=0.1655, cr_loss=0.3674, over 20964.00 frames. ], tot_loss[loss=0.2703, ctc_loss=0.1896, cr_loss=0.4035, over 1633067.70 frames. ], batch size: 51, lr: 7.84e-03, grad_scale: 32.0
2024-09-14 20:40:47,315 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.31 vs. limit=15.0
2024-09-14 20:41:05,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=181438.16666666666, ans=0.0
2024-09-14 20:41:08,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.73 vs. limit=10.0
2024-09-14 20:41:22,953 INFO [train.py:1198] (0/2) Epoch 11, batch 150, loss[loss=0.3106, ctc_loss=0.2289, cr_loss=0.4085, over 13367.00 frames. ], tot_loss[loss=0.2655, ctc_loss=0.186, cr_loss=0.3974, over 2176305.19 frames. ], batch size: 149, lr: 7.84e-03, grad_scale: 32.0
2024-09-14 20:41:35,090 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.029e+02 2.298e+02 2.608e+02 3.542e+02, threshold=4.596e+02, percent-clipped=0.0
2024-09-14 20:42:18,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=181579.83333333334, ans=0.125
2024-09-14 20:42:27,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181608.16666666666, ans=0.1
2024-09-14 20:42:40,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181636.5, ans=0.1
2024-09-14 20:42:41,855 INFO [train.py:1198] (0/2) Epoch 11, batch 200, loss[loss=0.2569, ctc_loss=0.1747, cr_loss=0.4105, over 20757.00 frames. ], tot_loss[loss=0.2645, ctc_loss=0.185, cr_loss=0.3973, over 2613369.83 frames. ], batch size: 56, lr: 7.84e-03, grad_scale: 32.0
2024-09-14 20:42:43,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181636.5, ans=0.1
2024-09-14 20:42:55,012 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.87 vs. limit=15.0
2024-09-14 20:43:09,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=22.5
2024-09-14 20:43:21,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=181693.16666666666, ans=15.0
2024-09-14 20:43:28,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181721.5, ans=0.1
2024-09-14 20:43:30,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181721.5, ans=0.1
2024-09-14 20:43:32,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=181721.5, ans=0.2
2024-09-14 20:43:43,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181749.83333333334, ans=0.1
2024-09-14 20:43:57,320 INFO [train.py:1198] (0/2) Epoch 11, batch 250, loss[loss=0.2701, ctc_loss=0.1865, cr_loss=0.4183, over 20883.00 frames. ], tot_loss[loss=0.2619, ctc_loss=0.1832, cr_loss=0.3938, over 2943717.60 frames. ], batch size: 57, lr: 7.83e-03, grad_scale: 32.0
2024-09-14 20:44:09,112 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.090e+02 2.207e+02 2.442e+02 3.598e+02, threshold=4.413e+02, percent-clipped=0.0
2024-09-14 20:44:12,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0
2024-09-14 20:44:32,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=181834.83333333334, ans=0.0
2024-09-14 20:44:38,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=181834.83333333334, ans=0.125
2024-09-14 20:44:59,896 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=15.0
2024-09-14 20:45:08,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=181891.5, ans=0.2
2024-09-14 20:45:12,614 INFO [train.py:1198] (0/2) Epoch 11, batch 300, loss[loss=0.2204, ctc_loss=0.1504, cr_loss=0.3496, over 20993.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1846, cr_loss=0.3955, over 3204775.40 frames. ], batch size: 48, lr: 7.83e-03, grad_scale: 32.0
2024-09-14 20:45:16,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=181919.83333333334, ans=0.125
2024-09-14 20:45:42,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=181976.5, ans=0.025
2024-09-14 20:45:54,935 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0
2024-09-14 20:45:59,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=182004.83333333334, ans=0.125
2024-09-14 20:46:07,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.21 vs. limit=15.0
2024-09-14 20:46:12,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=12.0
2024-09-14 20:46:27,331 INFO [train.py:1198] (0/2) Epoch 11, batch 350, loss[loss=0.2387, ctc_loss=0.1677, cr_loss=0.3547, over 20478.00 frames. ], tot_loss[loss=0.2653, ctc_loss=0.1858, cr_loss=0.3975, over 3390018.36 frames. ], batch size: 45, lr: 7.83e-03, grad_scale: 32.0
2024-09-14 20:46:42,572 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.073e+02 2.252e+02 2.506e+02 3.676e+02, threshold=4.504e+02, percent-clipped=0.0
2024-09-14 20:46:54,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=182089.83333333334, ans=0.125
2024-09-14 20:47:18,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=182146.5, ans=0.1
2024-09-14 20:47:23,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=182146.5, ans=0.025
2024-09-14 20:47:31,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=182174.83333333334, ans=0.125
2024-09-14 20:47:49,332 INFO [train.py:1198] (0/2) Epoch 11, batch 400, loss[loss=0.2748, ctc_loss=0.1932, cr_loss=0.4079, over 20286.00 frames. ], tot_loss[loss=0.2665, ctc_loss=0.1869, cr_loss=0.3981, over 3535815.42 frames. ], batch size: 74, lr: 7.82e-03, grad_scale: 32.0
2024-09-14 20:47:57,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=182203.16666666666, ans=0.125
2024-09-14 20:47:57,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0
2024-09-14 20:48:02,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.02 vs. limit=22.5
2024-09-14 20:48:57,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=182316.5, ans=0.025
2024-09-14 20:49:04,483 INFO [train.py:1198] (0/2) Epoch 11, batch 450, loss[loss=0.2835, ctc_loss=0.1941, cr_loss=0.447, over 20762.00 frames. ], tot_loss[loss=0.266, ctc_loss=0.1864, cr_loss=0.3982, over 3658401.30 frames. ], batch size: 56, lr: 7.82e-03, grad_scale: 32.0
2024-09-14 20:49:16,229 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.061e+02 2.238e+02 2.492e+02 3.942e+02, threshold=4.476e+02, percent-clipped=0.0
2024-09-14 20:49:22,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=182373.16666666666, ans=0.05
2024-09-14 20:49:33,296 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=12.0
2024-09-14 20:49:52,521 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 20:50:19,487 INFO [train.py:1198] (0/2) Epoch 11, batch 500, loss[loss=0.3013, ctc_loss=0.2107, cr_loss=0.4532, over 20696.00 frames. ], tot_loss[loss=0.2661, ctc_loss=0.1863, cr_loss=0.3989, over 3756988.51 frames. ], batch size: 71, lr: 7.82e-03, grad_scale: 32.0
2024-09-14 20:50:45,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=182514.83333333334, ans=0.125
2024-09-14 20:51:31,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=182599.83333333334, ans=0.0
2024-09-14 20:51:34,508 INFO [train.py:1198] (0/2) Epoch 11, batch 550, loss[loss=0.2494, ctc_loss=0.175, cr_loss=0.3721, over 21047.00 frames. ], tot_loss[loss=0.2664, ctc_loss=0.1867, cr_loss=0.3988, over 3824374.24 frames. ], batch size: 53, lr: 7.82e-03, grad_scale: 32.0
2024-09-14 20:51:39,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=182628.16666666666, ans=0.2
2024-09-14 20:51:43,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=182628.16666666666, ans=0.1
2024-09-14 20:51:46,579 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.110e+02 2.255e+02 2.525e+02 4.249e+02, threshold=4.510e+02, percent-clipped=0.0
2024-09-14 20:51:58,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=182656.5, ans=0.1
2024-09-14 20:52:00,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=182656.5, ans=0.0
2024-09-14 20:52:29,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=182713.16666666666, ans=0.0
2024-09-14 20:52:33,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=182713.16666666666, ans=0.125
2024-09-14 20:52:38,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0
2024-09-14 20:52:53,125 INFO [train.py:1198] (0/2) Epoch 11, batch 600, loss[loss=0.223, ctc_loss=0.1525, cr_loss=0.3524, over 19847.00 frames. ], tot_loss[loss=0.2642, ctc_loss=0.1849, cr_loss=0.3962, over 3885980.59 frames. ], batch size: 44, lr: 7.81e-03, grad_scale: 32.0
2024-09-14 20:53:52,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=182854.83333333334, ans=0.0
2024-09-14 20:53:57,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0
2024-09-14 20:54:11,224 INFO [train.py:1198] (0/2) Epoch 11, batch 650, loss[loss=0.2742, ctc_loss=0.1919, cr_loss=0.4117, over 20936.00 frames. ], tot_loss[loss=0.2638, ctc_loss=0.1845, cr_loss=0.3964, over 3939538.89 frames. ], batch size: 60, lr: 7.81e-03, grad_scale: 32.0
2024-09-14 20:54:23,088 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.688e+02 1.966e+02 2.097e+02 2.244e+02 3.204e+02, threshold=4.194e+02, percent-clipped=0.0
2024-09-14 20:54:29,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=12.0
2024-09-14 20:55:26,190 INFO [train.py:1198] (0/2) Epoch 11, batch 700, loss[loss=0.2744, ctc_loss=0.1935, cr_loss=0.4045, over 21045.00 frames. ], tot_loss[loss=0.2643, ctc_loss=0.1848, cr_loss=0.3976, over 3979785.85 frames. ], batch size: 56, lr: 7.81e-03, grad_scale: 32.0
2024-09-14 20:55:36,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=183053.16666666666, ans=0.125
2024-09-14 20:55:55,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.00 vs. limit=15.0
2024-09-14 20:56:18,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=183138.16666666666, ans=0.1
2024-09-14 20:56:27,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=183166.5, ans=0.0
2024-09-14 20:56:41,058 INFO [train.py:1198] (0/2) Epoch 11, batch 750, loss[loss=0.2873, ctc_loss=0.203, cr_loss=0.4215, over 20108.00 frames. ], tot_loss[loss=0.2655, ctc_loss=0.1857, cr_loss=0.3988, over 4015542.47 frames. ], batch size: 80, lr: 7.80e-03, grad_scale: 32.0
2024-09-14 20:56:53,223 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.035e+02 2.206e+02 2.455e+02 4.066e+02, threshold=4.412e+02, percent-clipped=0.0
2024-09-14 20:56:56,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0
2024-09-14 20:57:06,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=12.0
2024-09-14 20:57:40,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=183308.16666666666, ans=0.0
2024-09-14 20:57:55,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=183336.5, ans=0.1
2024-09-14 20:57:56,932 INFO [train.py:1198] (0/2) Epoch 11, batch 800, loss[loss=0.3012, ctc_loss=0.2225, cr_loss=0.3935, over 19512.00 frames. ], tot_loss[loss=0.2658, ctc_loss=0.1862, cr_loss=0.3983, over 4025471.43 frames. ], batch size: 90, lr: 7.80e-03, grad_scale: 32.0
2024-09-14 20:58:19,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=183364.83333333334, ans=0.125
2024-09-14 20:58:36,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0
2024-09-14 20:58:42,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=183393.16666666666, ans=0.125
2024-09-14 20:58:42,516 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0
2024-09-14 20:59:13,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=183449.83333333334, ans=0.125
2024-09-14 20:59:18,246 INFO [train.py:1198] (0/2) Epoch 11, batch 850, loss[loss=0.2444, ctc_loss=0.1694, cr_loss=0.3752, over 19817.00 frames. ], tot_loss[loss=0.2648, ctc_loss=0.1854, cr_loss=0.397, over 4035260.37 frames. ], batch size: 44, lr: 7.80e-03, grad_scale: 32.0
2024-09-14 20:59:30,347 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.078e+02 2.290e+02 2.606e+02 4.624e+02, threshold=4.579e+02, percent-clipped=1.0
2024-09-14 20:59:43,052 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.46 vs. limit=22.5
2024-09-14 20:59:50,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=183534.83333333334, ans=0.0
2024-09-14 21:00:10,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=183563.16666666666, ans=0.125
2024-09-14 21:00:17,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=183591.5, ans=0.125
2024-09-14 21:00:22,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=183591.5, ans=0.125
2024-09-14 21:00:24,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=183591.5, ans=0.0
2024-09-14 21:00:34,178 INFO [train.py:1198] (0/2) Epoch 11, batch 900, loss[loss=0.3514, ctc_loss=0.2612, cr_loss=0.4511, over 14009.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1843, cr_loss=0.396, over 4042756.43 frames. ], batch size: 149, lr: 7.80e-03, grad_scale: 32.0
2024-09-14 21:00:45,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=183619.83333333334, ans=0.0
2024-09-14 21:01:13,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=183676.5, ans=0.125
2024-09-14 21:01:45,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=183733.16666666666, ans=10.0
2024-09-14 21:01:49,882 INFO [train.py:1198] (0/2) Epoch 11, batch 950, loss[loss=0.2858, ctc_loss=0.2026, cr_loss=0.4163, over 20929.00 frames. ], tot_loss[loss=0.2625, ctc_loss=0.1836, cr_loss=0.3947, over 4058451.99 frames. ], batch size: 60, lr: 7.79e-03, grad_scale: 32.0
2024-09-14 21:01:56,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=183761.5, ans=0.015
2024-09-14 21:01:57,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=183761.5, ans=0.1
2024-09-14 21:02:01,778 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.043e+02 2.313e+02 2.471e+02 3.545e+02, threshold=4.626e+02, percent-clipped=0.0
2024-09-14 21:02:05,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=183789.83333333334, ans=0.125
2024-09-14 21:02:35,046 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=12.0
2024-09-14 21:02:48,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=183874.83333333334, ans=0.125
2024-09-14 21:03:04,853 INFO [train.py:1198] (0/2) Epoch 11, batch 1000, loss[loss=0.2194, ctc_loss=0.1521, cr_loss=0.3364, over 21051.00 frames. ], tot_loss[loss=0.2626, ctc_loss=0.1835, cr_loss=0.3953, over 4069493.43 frames. ], batch size: 53, lr: 7.79e-03, grad_scale: 32.0
2024-09-14 21:03:18,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=183931.5, ans=0.125
2024-09-14 21:03:28,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.92 vs. limit=6.0
2024-09-14 21:03:57,989 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.49 vs. limit=15.0
2024-09-14 21:04:06,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=184016.5, ans=0.125
2024-09-14 21:04:11,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=184016.5, ans=0.125
2024-09-14 21:04:22,955 INFO [train.py:1198] (0/2) Epoch 11, batch 1050, loss[loss=0.2777, ctc_loss=0.1966, cr_loss=0.4055, over 20855.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1828, cr_loss=0.3937, over 4082853.55 frames. ], batch size: 65, lr: 7.79e-03, grad_scale: 32.0
2024-09-14 21:04:34,942 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.069e+02 2.217e+02 2.437e+02 3.461e+02, threshold=4.435e+02, percent-clipped=0.0
2024-09-14 21:04:38,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=184073.16666666666, ans=0.125
2024-09-14 21:04:46,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=184073.16666666666, ans=0.025
2024-09-14 21:05:02,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0
2024-09-14 21:05:30,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=184158.16666666666, ans=0.125
2024-09-14 21:05:33,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=184158.16666666666, ans=0.0
2024-09-14 21:05:41,964 INFO [train.py:1198] (0/2) Epoch 11, batch 1100, loss[loss=0.2593, ctc_loss=0.1807, cr_loss=0.3928, over 20968.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1828, cr_loss=0.3942, over 4091063.81 frames. ], batch size: 58, lr: 7.78e-03, grad_scale: 32.0
2024-09-14 21:05:43,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=184186.5, ans=0.125
2024-09-14 21:05:45,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=184186.5, ans=0.0
2024-09-14 21:05:48,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=184186.5, ans=0.125
2024-09-14 21:06:39,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.34 vs. limit=22.5
2024-09-14 21:06:53,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=184299.83333333334, ans=0.04949747468305833
2024-09-14 21:06:54,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=184299.83333333334, ans=0.125
2024-09-14 21:06:57,287 INFO [train.py:1198] (0/2) Epoch 11, batch 1150, loss[loss=0.2732, ctc_loss=0.1896, cr_loss=0.4178, over 20306.00 frames. ], tot_loss[loss=0.262, ctc_loss=0.1832, cr_loss=0.3942, over 4091139.66 frames. ], batch size: 74, lr: 7.78e-03, grad_scale: 32.0
2024-09-14 21:07:09,167 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.033e+02 2.198e+02 2.586e+02 3.807e+02, threshold=4.395e+02, percent-clipped=0.0
2024-09-14 21:07:20,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=184356.5, ans=0.125
2024-09-14 21:07:20,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=184356.5, ans=0.1
2024-09-14 21:07:21,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=184356.5, ans=0.09899494936611666
2024-09-14 21:08:12,658 INFO [train.py:1198] (0/2) Epoch 11, batch 1200, loss[loss=0.2677, ctc_loss=0.1868, cr_loss=0.4044, over 20948.00 frames. ], tot_loss[loss=0.2625, ctc_loss=0.1835, cr_loss=0.3947, over 4089579.99 frames. ], batch size: 60, lr: 7.78e-03, grad_scale: 32.0
2024-09-14 21:08:16,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=184469.83333333334, ans=0.0
2024-09-14 21:08:47,532 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 21:09:14,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=184583.16666666666, ans=0.0
2024-09-14 21:09:28,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=184583.16666666666, ans=0.05
2024-09-14 21:09:30,722 INFO [train.py:1198] (0/2) Epoch 11, batch 1250, loss[loss=0.2735, ctc_loss=0.1914, cr_loss=0.4105, over 20872.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1825, cr_loss=0.3935, over 4081883.81 frames. ], batch size: 57, lr: 7.77e-03, grad_scale: 32.0
2024-09-14 21:09:35,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=184611.5, ans=0.04949747468305833
2024-09-14 21:09:38,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=184611.5, ans=0.125
2024-09-14 21:09:42,744 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.712e+02 2.028e+02 2.201e+02 2.379e+02 3.643e+02, threshold=4.402e+02, percent-clipped=0.0
2024-09-14 21:09:53,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=184639.83333333334, ans=0.0
2024-09-14 21:10:15,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0
2024-09-14 21:10:49,405 INFO [train.py:1198] (0/2) Epoch 11, batch 1300, loss[loss=0.2865, ctc_loss=0.2029, cr_loss=0.4182, over 19266.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1827, cr_loss=0.3943, over 4091289.39 frames. ], batch size: 90, lr: 7.77e-03, grad_scale: 32.0
2024-09-14 21:12:04,056 INFO [train.py:1198] (0/2) Epoch 11, batch 1350, loss[loss=0.235, ctc_loss=0.1622, cr_loss=0.3641, over 20340.00 frames. ], tot_loss[loss=0.262, ctc_loss=0.1831, cr_loss=0.3945, over 4092715.88 frames. ], batch size: 45, lr: 7.77e-03, grad_scale: 32.0
2024-09-14 21:12:16,027 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.702e+02 2.009e+02 2.122e+02 2.250e+02 3.486e+02, threshold=4.244e+02, percent-clipped=0.0
2024-09-14 21:12:28,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=184923.16666666666, ans=0.1
2024-09-14 21:12:37,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.45 vs. limit=22.5
2024-09-14 21:13:04,316 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-14 21:13:08,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=185008.16666666666, ans=0.1
2024-09-14 21:13:19,032 INFO [train.py:1198] (0/2) Epoch 11, batch 1400, loss[loss=0.2267, ctc_loss=0.155, cr_loss=0.3586, over 20978.00 frames. ], tot_loss[loss=0.2621, ctc_loss=0.1832, cr_loss=0.3944, over 4095530.86 frames. ], batch size: 48, lr: 7.77e-03, grad_scale: 16.0
2024-09-14 21:13:37,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.65 vs.
limit=15.0 2024-09-14 21:13:47,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=185093.16666666666, ans=10.0 2024-09-14 21:14:10,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=185121.5, ans=0.035 2024-09-14 21:14:11,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=185121.5, ans=0.0 2024-09-14 21:14:13,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=185121.5, ans=0.125 2024-09-14 21:14:31,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=185149.83333333334, ans=0.1 2024-09-14 21:14:33,895 INFO [train.py:1198] (0/2) Epoch 11, batch 1450, loss[loss=0.2943, ctc_loss=0.203, cr_loss=0.4565, over 20955.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1825, cr_loss=0.3938, over 4116193.89 frames. ], batch size: 67, lr: 7.76e-03, grad_scale: 16.0 2024-09-14 21:14:47,625 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.054e+02 2.210e+02 2.365e+02 5.285e+02, threshold=4.419e+02, percent-clipped=2.0 2024-09-14 21:15:30,283 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 21:15:46,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=185291.5, ans=0.0 2024-09-14 21:15:48,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=185291.5, ans=0.5 2024-09-14 21:15:52,462 INFO [train.py:1198] (0/2) Epoch 11, batch 1500, loss[loss=0.3016, ctc_loss=0.2113, cr_loss=0.4514, over 20846.00 frames. ], tot_loss[loss=0.2626, ctc_loss=0.1836, cr_loss=0.395, over 4110042.40 frames. 
], batch size: 65, lr: 7.76e-03, grad_scale: 16.0 2024-09-14 21:16:06,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=185348.16666666666, ans=0.125 2024-09-14 21:16:12,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=185348.16666666666, ans=0.125 2024-09-14 21:16:43,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=185404.83333333334, ans=0.0 2024-09-14 21:16:51,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=185404.83333333334, ans=0.1 2024-09-14 21:17:10,454 INFO [train.py:1198] (0/2) Epoch 11, batch 1550, loss[loss=0.2203, ctc_loss=0.1522, cr_loss=0.3407, over 21049.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.1834, cr_loss=0.395, over 4108749.30 frames. ], batch size: 53, lr: 7.76e-03, grad_scale: 16.0 2024-09-14 21:17:23,950 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.056e+02 2.230e+02 2.418e+02 4.179e+02, threshold=4.460e+02, percent-clipped=0.0 2024-09-14 21:17:34,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=185489.83333333334, ans=0.125 2024-09-14 21:18:00,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0 2024-09-14 21:18:26,083 INFO [train.py:1198] (0/2) Epoch 11, batch 1600, loss[loss=0.2841, ctc_loss=0.2037, cr_loss=0.4023, over 20973.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1816, cr_loss=0.3924, over 4119562.88 frames. 
], batch size: 58, lr: 7.75e-03, grad_scale: 32.0 2024-09-14 21:18:28,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=185603.16666666666, ans=0.125 2024-09-14 21:18:55,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=185659.83333333334, ans=0.0 2024-09-14 21:19:33,077 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=15.0 2024-09-14 21:19:41,299 INFO [train.py:1198] (0/2) Epoch 11, batch 1650, loss[loss=0.2529, ctc_loss=0.1771, cr_loss=0.3791, over 21057.00 frames. ], tot_loss[loss=0.2613, ctc_loss=0.1825, cr_loss=0.3937, over 4116219.14 frames. ], batch size: 56, lr: 7.75e-03, grad_scale: 32.0 2024-09-14 21:19:55,032 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.099e+02 2.332e+02 2.734e+02 4.290e+02, threshold=4.664e+02, percent-clipped=0.0 2024-09-14 21:20:03,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=185773.16666666666, ans=10.0 2024-09-14 21:20:03,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=22.5 2024-09-14 21:20:46,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=185858.16666666666, ans=0.0 2024-09-14 21:20:59,782 INFO [train.py:1198] (0/2) Epoch 11, batch 1700, loss[loss=0.2745, ctc_loss=0.1897, cr_loss=0.4236, over 20917.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.1834, cr_loss=0.3947, over 4103798.55 frames. 
], batch size: 60, lr: 7.75e-03, grad_scale: 32.0 2024-09-14 21:21:32,024 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 21:21:34,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=185943.16666666666, ans=0.0 2024-09-14 21:21:44,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=185971.5, ans=0.025 2024-09-14 21:22:12,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2024-09-14 21:22:18,078 INFO [train.py:1198] (0/2) Epoch 11, batch 1750, loss[loss=0.2155, ctc_loss=0.1477, cr_loss=0.3393, over 20986.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1829, cr_loss=0.3934, over 4107053.79 frames. ], batch size: 49, lr: 7.75e-03, grad_scale: 32.0 2024-09-14 21:22:28,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=186028.16666666666, ans=0.1 2024-09-14 21:22:31,577 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.062e+02 2.241e+02 2.492e+02 4.229e+02, threshold=4.481e+02, percent-clipped=0.0 2024-09-14 21:22:41,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.08 vs. limit=10.0 2024-09-14 21:23:32,880 INFO [train.py:1198] (0/2) Epoch 11, batch 1800, loss[loss=0.2343, ctc_loss=0.1618, cr_loss=0.3628, over 20963.00 frames. ], tot_loss[loss=0.2634, ctc_loss=0.1843, cr_loss=0.3959, over 4098922.32 frames. 
], batch size: 50, lr: 7.74e-03, grad_scale: 16.0 2024-09-14 21:23:42,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=186169.83333333334, ans=0.0 2024-09-14 21:24:22,721 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 21:24:33,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=186283.16666666666, ans=0.1 2024-09-14 21:24:47,554 INFO [train.py:1198] (0/2) Epoch 11, batch 1850, loss[loss=0.2715, ctc_loss=0.1908, cr_loss=0.4033, over 20694.00 frames. ], tot_loss[loss=0.2638, ctc_loss=0.1845, cr_loss=0.3962, over 4098330.71 frames. ], batch size: 71, lr: 7.74e-03, grad_scale: 16.0 2024-09-14 21:25:02,705 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.057e+02 2.167e+02 2.409e+02 4.086e+02, threshold=4.333e+02, percent-clipped=0.0 2024-09-14 21:25:03,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=186339.83333333334, ans=0.125 2024-09-14 21:25:05,028 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0 2024-09-14 21:25:36,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=186396.5, ans=0.1 2024-09-14 21:25:39,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=186396.5, ans=12.0 2024-09-14 21:26:03,136 INFO [train.py:1198] (0/2) Epoch 11, batch 1900, loss[loss=0.2672, ctc_loss=0.1856, cr_loss=0.4081, over 20887.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1844, cr_loss=0.3962, over 4103686.94 frames. 
], batch size: 54, lr: 7.74e-03, grad_scale: 16.0 2024-09-14 21:27:03,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=186538.16666666666, ans=0.05 2024-09-14 21:27:15,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=186566.5, ans=0.1 2024-09-14 21:27:21,408 INFO [train.py:1198] (0/2) Epoch 11, batch 1950, loss[loss=0.2758, ctc_loss=0.1929, cr_loss=0.4146, over 20707.00 frames. ], tot_loss[loss=0.2643, ctc_loss=0.1849, cr_loss=0.397, over 4094575.08 frames. ], batch size: 71, lr: 7.73e-03, grad_scale: 16.0 2024-09-14 21:27:36,464 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.006e+02 2.164e+02 2.327e+02 3.353e+02, threshold=4.328e+02, percent-clipped=0.0 2024-09-14 21:27:45,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=186623.16666666666, ans=0.2 2024-09-14 21:27:50,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2024-09-14 21:28:39,550 INFO [train.py:1198] (0/2) Epoch 11, batch 2000, loss[loss=0.2922, ctc_loss=0.2058, cr_loss=0.432, over 19969.00 frames. ], tot_loss[loss=0.2642, ctc_loss=0.1849, cr_loss=0.3966, over 4083905.35 frames. ], batch size: 80, lr: 7.73e-03, grad_scale: 32.0 2024-09-14 21:29:35,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=186821.5, ans=0.1 2024-09-14 21:29:55,199 INFO [train.py:1198] (0/2) Epoch 11, batch 2050, loss[loss=0.2753, ctc_loss=0.192, cr_loss=0.4163, over 20664.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1846, cr_loss=0.3968, over 4079893.19 frames. 
], batch size: 66, lr: 7.73e-03, grad_scale: 32.0 2024-09-14 21:30:10,167 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.062e+02 2.212e+02 2.510e+02 4.514e+02, threshold=4.424e+02, percent-clipped=1.0 2024-09-14 21:30:21,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.61 vs. limit=15.0 2024-09-14 21:30:41,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=186963.16666666666, ans=0.0 2024-09-14 21:30:55,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=186991.5, ans=0.125 2024-09-14 21:30:58,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=186991.5, ans=0.0 2024-09-14 21:31:10,056 INFO [train.py:1198] (0/2) Epoch 11, batch 2100, loss[loss=0.2644, ctc_loss=0.1903, cr_loss=0.3703, over 20040.00 frames. ], tot_loss[loss=0.2653, ctc_loss=0.1857, cr_loss=0.3981, over 4063672.46 frames. 
], batch size: 80, lr: 7.72e-03, grad_scale: 32.0 2024-09-14 21:31:10,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=187019.83333333334, ans=0.125 2024-09-14 21:31:11,978 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 21:31:19,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=187019.83333333334, ans=0.125 2024-09-14 21:31:23,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=187048.16666666666, ans=0.125 2024-09-14 21:31:27,188 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=15.0 2024-09-14 21:31:52,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=187076.5, ans=0.125 2024-09-14 21:32:28,249 INFO [train.py:1198] (0/2) Epoch 11, batch 2150, loss[loss=0.2197, ctc_loss=0.1524, cr_loss=0.3362, over 20964.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.1855, cr_loss=0.3983, over 4072248.35 frames. 
], batch size: 51, lr: 7.72e-03, grad_scale: 32.0 2024-09-14 21:32:30,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=187161.5, ans=0.0 2024-09-14 21:32:40,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=187161.5, ans=0.125 2024-09-14 21:32:43,112 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.734e+02 2.021e+02 2.178e+02 2.400e+02 3.234e+02, threshold=4.357e+02, percent-clipped=0.0 2024-09-14 21:32:55,603 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 21:33:01,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=187218.16666666666, ans=0.0 2024-09-14 21:33:31,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=187274.83333333334, ans=0.125 2024-09-14 21:33:46,145 INFO [train.py:1198] (0/2) Epoch 11, batch 2200, loss[loss=0.2681, ctc_loss=0.1857, cr_loss=0.412, over 21071.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1845, cr_loss=0.3974, over 4070953.82 frames. ], batch size: 59, lr: 7.72e-03, grad_scale: 32.0 2024-09-14 21:33:53,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=187303.16666666666, ans=0.125 2024-09-14 21:34:04,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=187331.5, ans=0.125 2024-09-14 21:35:00,837 INFO [train.py:1198] (0/2) Epoch 11, batch 2250, loss[loss=0.2722, ctc_loss=0.1933, cr_loss=0.3946, over 20677.00 frames. ], tot_loss[loss=0.265, ctc_loss=0.1852, cr_loss=0.3988, over 4077416.62 frames. 
], batch size: 68, lr: 7.72e-03, grad_scale: 32.0 2024-09-14 21:35:15,866 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.089e+02 2.236e+02 2.465e+02 4.073e+02, threshold=4.472e+02, percent-clipped=0.0 2024-09-14 21:35:49,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=187529.83333333334, ans=0.0 2024-09-14 21:36:16,017 INFO [train.py:1198] (0/2) Epoch 11, batch 2300, loss[loss=0.2618, ctc_loss=0.1836, cr_loss=0.3907, over 20658.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1844, cr_loss=0.398, over 4096638.21 frames. ], batch size: 68, lr: 7.71e-03, grad_scale: 32.0 2024-09-14 21:36:16,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=187586.5, ans=10.0 2024-09-14 21:36:44,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187643.16666666666, ans=0.1 2024-09-14 21:36:51,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=187643.16666666666, ans=0.025 2024-09-14 21:36:58,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=22.5 2024-09-14 21:37:01,677 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-09-14 21:37:30,645 INFO [train.py:1198] (0/2) Epoch 11, batch 2350, loss[loss=0.223, ctc_loss=0.1547, cr_loss=0.3415, over 20974.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1836, cr_loss=0.3965, over 4100280.67 frames. 
], batch size: 48, lr: 7.71e-03, grad_scale: 16.0 2024-09-14 21:37:47,226 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.075e+02 2.284e+02 2.653e+02 4.046e+02, threshold=4.567e+02, percent-clipped=0.0 2024-09-14 21:37:53,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=22.5 2024-09-14 21:38:05,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=187784.83333333334, ans=0.07 2024-09-14 21:38:26,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=187813.16666666666, ans=0.125 2024-09-14 21:38:41,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=187841.5, ans=0.125 2024-09-14 21:38:48,557 INFO [train.py:1198] (0/2) Epoch 11, batch 2400, loss[loss=0.2767, ctc_loss=0.1944, cr_loss=0.4111, over 20856.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1836, cr_loss=0.3962, over 4092546.02 frames. 
], batch size: 59, lr: 7.71e-03, grad_scale: 32.0 2024-09-14 21:38:53,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=187869.83333333334, ans=0.0 2024-09-14 21:39:02,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=187869.83333333334, ans=0.2 2024-09-14 21:39:35,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=187954.83333333334, ans=0.2 2024-09-14 21:39:58,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=187983.16666666666, ans=0.025 2024-09-14 21:40:01,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0 2024-09-14 21:40:06,675 INFO [train.py:1198] (0/2) Epoch 11, batch 2450, loss[loss=0.3045, ctc_loss=0.2181, cr_loss=0.4322, over 18071.00 frames. ], tot_loss[loss=0.262, ctc_loss=0.1831, cr_loss=0.3947, over 4088105.33 frames. ], batch size: 108, lr: 7.70e-03, grad_scale: 32.0 2024-09-14 21:40:19,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. 
limit=15.0 2024-09-14 21:40:23,126 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.704e+02 1.984e+02 2.150e+02 2.403e+02 5.578e+02, threshold=4.300e+02, percent-clipped=2.0 2024-09-14 21:41:15,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=188124.83333333334, ans=0.125 2024-09-14 21:41:20,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=188153.16666666666, ans=0.5 2024-09-14 21:41:21,418 INFO [train.py:1198] (0/2) Epoch 11, batch 2500, loss[loss=0.2592, ctc_loss=0.1788, cr_loss=0.4018, over 20783.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1828, cr_loss=0.3949, over 4088774.89 frames. ], batch size: 56, lr: 7.70e-03, grad_scale: 32.0 2024-09-14 21:41:32,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=188153.16666666666, ans=0.07 2024-09-14 21:41:44,545 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-09-14 21:41:48,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=188181.5, ans=0.09899494936611666 2024-09-14 21:41:54,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=188209.83333333334, ans=0.1 2024-09-14 21:42:25,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=188266.5, ans=0.0 2024-09-14 21:42:36,493 INFO [train.py:1198] (0/2) Epoch 11, batch 2550, loss[loss=0.2821, ctc_loss=0.1937, cr_loss=0.442, over 20974.00 frames. ], tot_loss[loss=0.2628, ctc_loss=0.1838, cr_loss=0.3953, over 4085777.32 frames. 
], batch size: 64, lr: 7.70e-03, grad_scale: 32.0 2024-09-14 21:42:39,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=188294.83333333334, ans=0.0 2024-09-14 21:42:42,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=188294.83333333334, ans=0.05 2024-09-14 21:42:53,257 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.038e+02 2.245e+02 2.484e+02 5.799e+02, threshold=4.490e+02, percent-clipped=2.0 2024-09-14 21:43:28,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=188379.83333333334, ans=0.125 2024-09-14 21:43:54,771 INFO [train.py:1198] (0/2) Epoch 11, batch 2600, loss[loss=0.2799, ctc_loss=0.199, cr_loss=0.4042, over 20952.00 frames. ], tot_loss[loss=0.2625, ctc_loss=0.1836, cr_loss=0.3947, over 4084507.53 frames. ], batch size: 64, lr: 7.70e-03, grad_scale: 32.0 2024-09-14 21:44:25,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=188493.16666666666, ans=0.2 2024-09-14 21:45:12,323 INFO [train.py:1198] (0/2) Epoch 11, batch 2650, loss[loss=0.2535, ctc_loss=0.1752, cr_loss=0.3915, over 20796.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1838, cr_loss=0.3957, over 4095538.08 frames. 
], batch size: 56, lr: 7.69e-03, grad_scale: 32.0 2024-09-14 21:45:17,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=188578.16666666666, ans=0.125 2024-09-14 21:45:28,924 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.023e+02 2.156e+02 2.344e+02 3.861e+02, threshold=4.313e+02, percent-clipped=0.0 2024-09-14 21:45:56,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=188663.16666666666, ans=0.5 2024-09-14 21:45:58,311 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 21:46:27,969 INFO [train.py:1198] (0/2) Epoch 11, batch 2700, loss[loss=0.2738, ctc_loss=0.1903, cr_loss=0.4176, over 20872.00 frames. ], tot_loss[loss=0.2621, ctc_loss=0.183, cr_loss=0.3954, over 4108110.30 frames. ], batch size: 57, lr: 7.69e-03, grad_scale: 32.0 2024-09-14 21:46:47,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=188748.16666666666, ans=0.0 2024-09-14 21:46:52,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=188748.16666666666, ans=0.125 2024-09-14 21:47:14,984 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=15.0 2024-09-14 21:47:42,714 INFO [train.py:1198] (0/2) Epoch 11, batch 2750, loss[loss=0.2159, ctc_loss=0.1461, cr_loss=0.3492, over 20318.00 frames. ], tot_loss[loss=0.2626, ctc_loss=0.1833, cr_loss=0.3969, over 4109944.50 frames. 
], batch size: 45, lr: 7.69e-03, grad_scale: 32.0 2024-09-14 21:47:59,284 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.035e+02 2.175e+02 2.452e+02 3.657e+02, threshold=4.349e+02, percent-clipped=0.0 2024-09-14 21:48:17,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=188918.16666666666, ans=0.125 2024-09-14 21:48:19,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=188918.16666666666, ans=0.125 2024-09-14 21:48:22,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=188918.16666666666, ans=0.125 2024-09-14 21:48:27,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=188946.5, ans=0.2 2024-09-14 21:48:57,611 INFO [train.py:1198] (0/2) Epoch 11, batch 2800, loss[loss=0.2669, ctc_loss=0.1899, cr_loss=0.3849, over 20973.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.1832, cr_loss=0.3962, over 4104914.77 frames. ], batch size: 64, lr: 7.68e-03, grad_scale: 32.0 2024-09-14 21:49:08,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=189003.16666666666, ans=0.1 2024-09-14 21:49:21,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=189031.5, ans=0.025 2024-09-14 21:49:32,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=189059.83333333334, ans=0.125 2024-09-14 21:49:49,518 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.22 vs. 
limit=15.0 2024-09-14 21:50:03,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=189116.5, ans=0.125 2024-09-14 21:50:15,632 INFO [train.py:1198] (0/2) Epoch 11, batch 2850, loss[loss=0.2805, ctc_loss=0.1959, cr_loss=0.4226, over 20998.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.1841, cr_loss=0.3972, over 4095805.56 frames. ], batch size: 63, lr: 7.68e-03, grad_scale: 16.0 2024-09-14 21:50:32,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=189173.16666666666, ans=0.025 2024-09-14 21:50:36,732 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.063e+02 2.238e+02 2.511e+02 4.868e+02, threshold=4.476e+02, percent-clipped=1.0 2024-09-14 21:50:46,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=189173.16666666666, ans=0.0 2024-09-14 21:51:18,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=189258.16666666666, ans=0.0 2024-09-14 21:51:20,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=189258.16666666666, ans=0.5 2024-09-14 21:51:33,995 INFO [train.py:1198] (0/2) Epoch 11, batch 2900, loss[loss=0.2921, ctc_loss=0.2007, cr_loss=0.4572, over 20839.00 frames. ], tot_loss[loss=0.2643, ctc_loss=0.1847, cr_loss=0.3977, over 4092136.92 frames. 
], batch size: 65, lr: 7.68e-03, grad_scale: 16.0 2024-09-14 21:51:39,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=189286.5, ans=0.125 2024-09-14 21:51:48,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=189314.83333333334, ans=0.0 2024-09-14 21:52:03,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=189343.16666666666, ans=0.0 2024-09-14 21:52:07,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189343.16666666666, ans=0.1 2024-09-14 21:52:24,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=189371.5, ans=0.0 2024-09-14 21:52:30,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=189371.5, ans=0.0 2024-09-14 21:52:30,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=189371.5, ans=0.0 2024-09-14 21:52:30,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=189371.5, ans=0.0 2024-09-14 21:52:45,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=189399.83333333334, ans=0.2 2024-09-14 21:52:49,439 INFO [train.py:1198] (0/2) Epoch 11, batch 2950, loss[loss=0.2726, ctc_loss=0.1881, cr_loss=0.4225, over 20840.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1845, cr_loss=0.398, over 4101757.13 frames. 
], batch size: 65, lr: 7.68e-03, grad_scale: 16.0 2024-09-14 21:53:07,521 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.053e+02 2.227e+02 2.462e+02 3.658e+02, threshold=4.453e+02, percent-clipped=0.0 2024-09-14 21:53:11,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=189456.5, ans=0.125 2024-09-14 21:53:58,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=189541.5, ans=0.125 2024-09-14 21:54:05,338 INFO [train.py:1198] (0/2) Epoch 11, batch 3000, loss[loss=0.2399, ctc_loss=0.1659, cr_loss=0.3701, over 20792.00 frames. ], tot_loss[loss=0.2645, ctc_loss=0.1849, cr_loss=0.398, over 4094971.46 frames. ], batch size: 53, lr: 7.67e-03, grad_scale: 16.0 2024-09-14 21:54:05,339 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 21:54:22,115 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.6260, 3.7867, 4.5582, 4.5213, 3.9876, 4.5607, 3.4275, 3.6344], device='cuda:0') 2024-09-14 21:54:25,338 INFO [train.py:1230] (0/2) Epoch 11, validation: loss=0.05216, ctc_loss=0.05216, cr_loss=9.981e-15, over 944034.00 frames. 2024-09-14 21:54:25,339 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 21:54:50,231 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.69 vs. 
limit=10.0 2024-09-14 21:55:26,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=189654.83333333334, ans=0.02 2024-09-14 21:55:32,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=189683.16666666666, ans=0.2 2024-09-14 21:55:32,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=189683.16666666666, ans=0.125 2024-09-14 21:55:43,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=189711.5, ans=0.125 2024-09-14 21:55:44,552 INFO [train.py:1198] (0/2) Epoch 11, batch 3050, loss[loss=0.2559, ctc_loss=0.1793, cr_loss=0.3832, over 20766.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1845, cr_loss=0.3975, over 4104529.00 frames. ], batch size: 53, lr: 7.67e-03, grad_scale: 16.0 2024-09-14 21:55:52,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189711.5, ans=0.1 2024-09-14 21:56:01,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=189739.83333333334, ans=0.125 2024-09-14 21:56:05,129 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.057e+02 2.201e+02 2.435e+02 3.335e+02, threshold=4.403e+02, percent-clipped=0.0 2024-09-14 21:56:11,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=189739.83333333334, ans=0.1 2024-09-14 21:56:29,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=189768.16666666666, ans=0.0 2024-09-14 21:56:31,430 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, 
num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2024-09-14 21:56:33,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=189796.5, ans=0.125 2024-09-14 21:56:38,812 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0 2024-09-14 21:56:39,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=189796.5, ans=0.125 2024-09-14 21:56:56,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189824.83333333334, ans=0.1 2024-09-14 21:57:02,451 INFO [train.py:1198] (0/2) Epoch 11, batch 3100, loss[loss=0.2694, ctc_loss=0.187, cr_loss=0.412, over 20647.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1842, cr_loss=0.3961, over 4103069.63 frames. ], batch size: 66, lr: 7.67e-03, grad_scale: 16.0 2024-09-14 21:57:15,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.75 vs. limit=15.0 2024-09-14 21:57:17,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=189881.5, ans=0.0 2024-09-14 21:57:37,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=189909.83333333334, ans=0.0 2024-09-14 21:58:08,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=189966.5, ans=0.2 2024-09-14 21:58:18,051 INFO [train.py:1198] (0/2) Epoch 11, batch 3150, loss[loss=0.2635, ctc_loss=0.1859, cr_loss=0.3876, over 20979.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1844, cr_loss=0.3955, over 4087504.89 frames. 
], batch size: 55, lr: 7.66e-03, grad_scale: 16.0 2024-09-14 21:58:36,127 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.063e+02 2.220e+02 2.457e+02 5.443e+02, threshold=4.441e+02, percent-clipped=2.0 2024-09-14 21:58:37,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=190023.16666666666, ans=0.2 2024-09-14 21:58:43,356 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.03 vs. limit=10.0 2024-09-14 21:58:54,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=190051.5, ans=10.0 2024-09-14 21:59:28,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2024-09-14 21:59:33,603 INFO [train.py:1198] (0/2) Epoch 11, batch 3200, loss[loss=0.3105, ctc_loss=0.2233, cr_loss=0.4359, over 14226.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1847, cr_loss=0.3969, over 4089979.08 frames. ], batch size: 149, lr: 7.66e-03, grad_scale: 32.0 2024-09-14 21:59:48,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=190164.83333333334, ans=0.1 2024-09-14 22:00:25,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.11 vs. 
limit=15.0 2024-09-14 22:00:34,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=190221.5, ans=0.125 2024-09-14 22:00:44,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=190249.83333333334, ans=0.0 2024-09-14 22:00:49,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2024-09-14 22:00:51,835 INFO [train.py:1198] (0/2) Epoch 11, batch 3250, loss[loss=0.2315, ctc_loss=0.1612, cr_loss=0.3516, over 21064.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1849, cr_loss=0.3963, over 4083811.11 frames. ], batch size: 53, lr: 7.66e-03, grad_scale: 32.0 2024-09-14 22:00:56,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=190278.16666666666, ans=0.125 2024-09-14 22:00:56,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=190278.16666666666, ans=0.125 2024-09-14 22:00:58,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=190278.16666666666, ans=0.125 2024-09-14 22:01:10,014 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.686e+02 1.992e+02 2.124e+02 2.302e+02 3.184e+02, threshold=4.248e+02, percent-clipped=0.0 2024-09-14 22:01:23,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=190334.83333333334, ans=0.025 2024-09-14 22:01:50,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=190363.16666666666, ans=0.2 2024-09-14 22:02:10,283 INFO [train.py:1198] (0/2) Epoch 11, batch 3300, loss[loss=0.2421, ctc_loss=0.169, cr_loss=0.3656, over 
21056.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1843, cr_loss=0.396, over 4096307.60 frames. ], batch size: 56, lr: 7.66e-03, grad_scale: 32.0 2024-09-14 22:02:12,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=12.0 2024-09-14 22:02:33,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=190448.16666666666, ans=0.125 2024-09-14 22:02:35,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=190448.16666666666, ans=0.125 2024-09-14 22:02:48,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=190476.5, ans=0.1 2024-09-14 22:03:24,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0 2024-09-14 22:03:26,662 INFO [train.py:1198] (0/2) Epoch 11, batch 3350, loss[loss=0.2386, ctc_loss=0.1616, cr_loss=0.3853, over 20962.00 frames. ], tot_loss[loss=0.2628, ctc_loss=0.1837, cr_loss=0.3954, over 4090606.78 frames. 
], batch size: 52, lr: 7.65e-03, grad_scale: 32.0 2024-09-14 22:03:43,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=190589.83333333334, ans=0.0 2024-09-14 22:03:44,497 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.145e+02 2.309e+02 2.665e+02 4.687e+02, threshold=4.618e+02, percent-clipped=1.0 2024-09-14 22:03:52,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=190589.83333333334, ans=0.0 2024-09-14 22:04:01,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=190618.16666666666, ans=0.1 2024-09-14 22:04:18,209 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=22.5 2024-09-14 22:04:39,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=190674.83333333334, ans=0.125 2024-09-14 22:04:41,917 INFO [train.py:1198] (0/2) Epoch 11, batch 3400, loss[loss=0.2916, ctc_loss=0.2024, cr_loss=0.446, over 21052.00 frames. ], tot_loss[loss=0.2623, ctc_loss=0.1833, cr_loss=0.3953, over 4103337.08 frames. 
], batch size: 62, lr: 7.65e-03, grad_scale: 32.0 2024-09-14 22:04:54,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=190703.16666666666, ans=0.0 2024-09-14 22:05:00,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=190731.5, ans=0.125 2024-09-14 22:05:27,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=190788.16666666666, ans=0.125 2024-09-14 22:05:56,893 INFO [train.py:1198] (0/2) Epoch 11, batch 3450, loss[loss=0.2313, ctc_loss=0.1576, cr_loss=0.3685, over 21059.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1842, cr_loss=0.3969, over 4098944.53 frames. ], batch size: 53, lr: 7.65e-03, grad_scale: 32.0 2024-09-14 22:05:59,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.83 vs. limit=22.5 2024-09-14 22:06:16,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=190873.16666666666, ans=0.125 2024-09-14 22:06:17,702 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.076e+02 2.245e+02 2.566e+02 4.508e+02, threshold=4.491e+02, percent-clipped=0.0 2024-09-14 22:06:40,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=190901.5, ans=0.2 2024-09-14 22:06:54,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=190929.83333333334, ans=0.125 2024-09-14 22:07:18,369 INFO [train.py:1198] (0/2) Epoch 11, batch 3500, loss[loss=0.2527, ctc_loss=0.1761, cr_loss=0.3832, over 20884.00 frames. ], tot_loss[loss=0.2634, ctc_loss=0.1841, cr_loss=0.3965, over 4095094.16 frames. 
], batch size: 54, lr: 7.65e-03, grad_scale: 32.0 2024-09-14 22:07:20,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=190986.5, ans=0.125 2024-09-14 22:07:23,726 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-14 22:08:22,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=191099.83333333334, ans=0.125 2024-09-14 22:08:22,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=191099.83333333334, ans=0.04949747468305833 2024-09-14 22:08:33,677 INFO [train.py:1198] (0/2) Epoch 11, batch 3550, loss[loss=0.2751, ctc_loss=0.1924, cr_loss=0.4135, over 20773.00 frames. ], tot_loss[loss=0.2644, ctc_loss=0.1848, cr_loss=0.3977, over 4098517.12 frames. ], batch size: 56, lr: 7.64e-03, grad_scale: 32.0 2024-09-14 22:08:48,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. 
limit=15.0 2024-09-14 22:08:51,880 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.034e+02 2.207e+02 2.412e+02 4.031e+02, threshold=4.415e+02, percent-clipped=0.0 2024-09-14 22:08:58,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=191156.5, ans=0.025 2024-09-14 22:09:10,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=191184.83333333334, ans=0.0 2024-09-14 22:09:20,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=191213.16666666666, ans=0.0 2024-09-14 22:09:22,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=191213.16666666666, ans=0.0 2024-09-14 22:09:38,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=191241.5, ans=0.0 2024-09-14 22:09:41,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=191241.5, ans=0.07 2024-09-14 22:09:49,210 INFO [train.py:1198] (0/2) Epoch 11, batch 3600, loss[loss=0.2236, ctc_loss=0.1511, cr_loss=0.3621, over 20970.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.185, cr_loss=0.3979, over 4101049.37 frames. ], batch size: 50, lr: 7.64e-03, grad_scale: 32.0 2024-09-14 22:09:58,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=191269.83333333334, ans=0.0 2024-09-14 22:10:21,253 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.71 vs. 
limit=15.0 2024-09-14 22:10:27,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=191326.5, ans=0.0 2024-09-14 22:10:28,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=191326.5, ans=0.0 2024-09-14 22:10:39,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=191354.83333333334, ans=0.125 2024-09-14 22:10:48,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.40 vs. limit=15.0 2024-09-14 22:11:04,268 INFO [train.py:1198] (0/2) Epoch 11, batch 3650, loss[loss=0.27, ctc_loss=0.192, cr_loss=0.3902, over 20664.00 frames. ], tot_loss[loss=0.2648, ctc_loss=0.1852, cr_loss=0.398, over 4092784.81 frames. ], batch size: 66, lr: 7.64e-03, grad_scale: 32.0 2024-09-14 22:11:07,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=191411.5, ans=0.0 2024-09-14 22:11:09,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=191411.5, ans=0.0 2024-09-14 22:11:10,699 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 22:11:15,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=191411.5, ans=0.125 2024-09-14 22:11:22,479 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.121e+02 2.378e+02 2.732e+02 5.615e+02, threshold=4.755e+02, percent-clipped=3.0 2024-09-14 22:12:13,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=191524.83333333334, ans=0.2 2024-09-14 22:12:14,984 INFO [scaling.py:214] (0/2) 
ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=191524.83333333334, ans=0.125 2024-09-14 22:12:22,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=191553.16666666666, ans=0.0 2024-09-14 22:12:23,407 INFO [train.py:1198] (0/2) Epoch 11, batch 3700, loss[loss=0.2297, ctc_loss=0.1592, cr_loss=0.3525, over 20798.00 frames. ], tot_loss[loss=0.2634, ctc_loss=0.1839, cr_loss=0.3976, over 4103567.61 frames. ], batch size: 53, lr: 7.63e-03, grad_scale: 32.0 2024-09-14 22:12:46,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=191581.5, ans=0.0 2024-09-14 22:12:46,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=191581.5, ans=0.125 2024-09-14 22:12:48,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=191581.5, ans=0.125 2024-09-14 22:12:49,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=191581.5, ans=0.125 2024-09-14 22:13:31,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=191666.5, ans=0.0 2024-09-14 22:13:41,892 INFO [train.py:1198] (0/2) Epoch 11, batch 3750, loss[loss=0.2752, ctc_loss=0.1935, cr_loss=0.4088, over 20841.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.1832, cr_loss=0.3958, over 4099409.11 frames. 
], batch size: 65, lr: 7.63e-03, grad_scale: 32.0 2024-09-14 22:13:56,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=191723.16666666666, ans=0.0 2024-09-14 22:14:00,188 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.133e+02 2.343e+02 2.728e+02 4.551e+02, threshold=4.686e+02, percent-clipped=0.0 2024-09-14 22:14:06,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=191723.16666666666, ans=0.125 2024-09-14 22:14:57,197 INFO [train.py:1198] (0/2) Epoch 11, batch 3800, loss[loss=0.2156, ctc_loss=0.1453, cr_loss=0.3512, over 20968.00 frames. ], tot_loss[loss=0.262, ctc_loss=0.1829, cr_loss=0.3956, over 4100459.49 frames. ], batch size: 48, lr: 7.63e-03, grad_scale: 32.0 2024-09-14 22:15:25,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=15.0 2024-09-14 22:15:55,466 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.99 vs. limit=10.0 2024-09-14 22:15:56,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.71 vs. limit=6.0 2024-09-14 22:16:03,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=191949.83333333334, ans=0.2 2024-09-14 22:16:05,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=191949.83333333334, ans=0.125 2024-09-14 22:16:12,637 INFO [train.py:1198] (0/2) Epoch 11, batch 3850, loss[loss=0.2529, ctc_loss=0.176, cr_loss=0.3844, over 21016.00 frames. ], tot_loss[loss=0.2619, ctc_loss=0.1829, cr_loss=0.3949, over 4089169.23 frames. 
], batch size: 63, lr: 7.63e-03, grad_scale: 32.0 2024-09-14 22:16:30,504 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.082e+02 2.386e+02 2.720e+02 5.191e+02, threshold=4.772e+02, percent-clipped=1.0 2024-09-14 22:17:30,770 INFO [train.py:1198] (0/2) Epoch 11, batch 3900, loss[loss=0.2301, ctc_loss=0.1547, cr_loss=0.3771, over 21049.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1825, cr_loss=0.3949, over 4095559.36 frames. ], batch size: 53, lr: 7.62e-03, grad_scale: 32.0 2024-09-14 22:17:55,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=192148.16666666666, ans=0.0 2024-09-14 22:17:57,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=15.0 2024-09-14 22:18:04,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192176.5, ans=0.1 2024-09-14 22:18:36,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=192233.16666666666, ans=0.0 2024-09-14 22:18:45,032 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 22:18:49,268 INFO [train.py:1198] (0/2) Epoch 11, batch 3950, loss[loss=0.2707, ctc_loss=0.1888, cr_loss=0.4091, over 21024.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1826, cr_loss=0.3956, over 4107836.05 frames. 
], batch size: 63, lr: 7.62e-03, grad_scale: 32.0 2024-09-14 22:19:04,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=192289.83333333334, ans=0.125 2024-09-14 22:19:07,079 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.695e+02 2.031e+02 2.161e+02 2.445e+02 3.958e+02, threshold=4.322e+02, percent-clipped=0.0 2024-09-14 22:19:44,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=192346.5, ans=0.0 2024-09-14 22:19:59,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=192374.83333333334, ans=0.0 2024-09-14 22:20:05,157 INFO [train.py:1198] (0/2) Epoch 11, batch 4000, loss[loss=0.2414, ctc_loss=0.166, cr_loss=0.3769, over 21002.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1819, cr_loss=0.3942, over 4113848.96 frames. ], batch size: 55, lr: 7.62e-03, grad_scale: 32.0 2024-09-14 22:20:17,435 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.04 vs. limit=12.0 2024-09-14 22:20:50,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=12.0 2024-09-14 22:21:05,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=192516.5, ans=0.125 2024-09-14 22:21:19,989 INFO [train.py:1198] (0/2) Epoch 11, batch 4050, loss[loss=0.2493, ctc_loss=0.1762, cr_loss=0.3655, over 21030.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1827, cr_loss=0.3943, over 4090619.26 frames. 
], batch size: 63, lr: 7.61e-03, grad_scale: 32.0 2024-09-14 22:21:38,007 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.075e+02 2.181e+02 2.376e+02 3.176e+02, threshold=4.362e+02, percent-clipped=0.0 2024-09-14 22:21:39,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=192573.16666666666, ans=0.2 2024-09-14 22:21:47,943 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=12.0 2024-09-14 22:22:17,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=192629.83333333334, ans=0.125 2024-09-14 22:22:19,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=192658.16666666666, ans=0.07 2024-09-14 22:22:23,920 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-68000.pt 2024-09-14 22:22:31,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=192658.16666666666, ans=0.125 2024-09-14 22:22:36,845 INFO [train.py:1198] (0/2) Epoch 11, batch 4100, loss[loss=0.2346, ctc_loss=0.1624, cr_loss=0.361, over 20962.00 frames. ], tot_loss[loss=0.262, ctc_loss=0.1829, cr_loss=0.3954, over 4099269.47 frames. 
], batch size: 50, lr: 7.61e-03, grad_scale: 32.0 2024-09-14 22:22:41,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=192686.5, ans=0.0 2024-09-14 22:22:45,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=192686.5, ans=0.2 2024-09-14 22:23:09,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=192743.16666666666, ans=0.025 2024-09-14 22:23:12,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=192743.16666666666, ans=0.0 2024-09-14 22:23:20,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=192743.16666666666, ans=0.2 2024-09-14 22:23:41,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=192799.83333333334, ans=0.125 2024-09-14 22:23:54,808 INFO [train.py:1198] (0/2) Epoch 11, batch 4150, loss[loss=0.2783, ctc_loss=0.1931, cr_loss=0.4262, over 20859.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1822, cr_loss=0.3941, over 4091739.87 frames. ], batch size: 57, lr: 7.61e-03, grad_scale: 32.0 2024-09-14 22:24:07,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=192828.16666666666, ans=0.125 2024-09-14 22:24:11,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=192856.5, ans=0.125 2024-09-14 22:24:12,951 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.063e+02 2.243e+02 2.473e+02 4.794e+02, threshold=4.485e+02, percent-clipped=1.0 2024-09-14 22:24:41,361 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.64 vs. 
limit=22.5 2024-09-14 22:25:01,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=192941.5, ans=0.0 2024-09-14 22:25:13,699 INFO [train.py:1198] (0/2) Epoch 11, batch 4200, loss[loss=0.2832, ctc_loss=0.1995, cr_loss=0.4186, over 20376.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1824, cr_loss=0.3943, over 4092353.56 frames. ], batch size: 74, lr: 7.61e-03, grad_scale: 32.0 2024-09-14 22:25:27,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=192998.16666666666, ans=0.1 2024-09-14 22:25:30,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=192998.16666666666, ans=0.125 2024-09-14 22:25:39,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=192998.16666666666, ans=0.2 2024-09-14 22:25:40,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=192998.16666666666, ans=0.125 2024-09-14 22:25:54,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=193026.5, ans=0.04949747468305833 2024-09-14 22:26:25,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=193083.16666666666, ans=0.0 2024-09-14 22:26:28,301 INFO [train.py:1198] (0/2) Epoch 11, batch 4250, loss[loss=0.2282, ctc_loss=0.1565, cr_loss=0.3589, over 20898.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1815, cr_loss=0.3928, over 4097009.33 frames. 
], batch size: 54, lr: 7.60e-03, grad_scale: 16.0 2024-09-14 22:26:28,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=193111.5, ans=0.025 2024-09-14 22:26:47,942 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.033e+02 2.225e+02 2.442e+02 5.640e+02, threshold=4.450e+02, percent-clipped=1.0 2024-09-14 22:26:54,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=193139.83333333334, ans=0.125 2024-09-14 22:27:21,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193196.5, ans=0.1 2024-09-14 22:27:35,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=193224.83333333334, ans=0.125 2024-09-14 22:27:43,883 INFO [train.py:1198] (0/2) Epoch 11, batch 4300, loss[loss=0.2368, ctc_loss=0.1612, cr_loss=0.3778, over 20791.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1815, cr_loss=0.3929, over 4111261.50 frames. ], batch size: 53, lr: 7.60e-03, grad_scale: 16.0 2024-09-14 22:28:50,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=193366.5, ans=0.2 2024-09-14 22:28:53,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=193366.5, ans=0.125 2024-09-14 22:28:59,145 INFO [train.py:1198] (0/2) Epoch 11, batch 4350, loss[loss=0.2555, ctc_loss=0.1782, cr_loss=0.3865, over 20940.00 frames. ], tot_loss[loss=0.2604, ctc_loss=0.1817, cr_loss=0.3932, over 4109242.13 frames. 
], batch size: 60, lr: 7.60e-03, grad_scale: 16.0 2024-09-14 22:29:11,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=193394.83333333334, ans=0.0 2024-09-14 22:29:19,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=12.0 2024-09-14 22:29:21,636 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.735e+02 2.059e+02 2.250e+02 2.557e+02 4.118e+02, threshold=4.500e+02, percent-clipped=0.0 2024-09-14 22:29:42,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=193451.5, ans=0.0 2024-09-14 22:29:59,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=193479.83333333334, ans=0.125 2024-09-14 22:30:20,480 INFO [train.py:1198] (0/2) Epoch 11, batch 4400, loss[loss=0.2334, ctc_loss=0.1607, cr_loss=0.3635, over 20984.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1814, cr_loss=0.3928, over 4108254.38 frames. ], batch size: 50, lr: 7.60e-03, grad_scale: 32.0 2024-09-14 22:30:58,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=193593.16666666666, ans=0.125 2024-09-14 22:31:29,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=193649.83333333334, ans=0.0 2024-09-14 22:31:36,549 INFO [train.py:1198] (0/2) Epoch 11, batch 4450, loss[loss=0.2303, ctc_loss=0.1616, cr_loss=0.3434, over 20964.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1822, cr_loss=0.394, over 4097907.82 frames. 
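The `grad_scale` field in the train.py records drops from 32.0 to 16.0 between batches 4200 and 4250, then returns to 32.0 by batch 4400. With `Use AMP=True` and `dtype=torch.float16` from the header, this matches dynamic loss scaling: the scale is halved when inf/NaN fp16 gradients are detected and grown again after a run of clean steps. A simplified sketch of that mechanism, in the spirit of `torch.cuda.amp.GradScaler` (the interval and factors here are illustrative defaults, not taken from this recipe):

```python
# Simplified sketch of AMP dynamic loss scaling; not the actual GradScaler
# source, just the halve-on-overflow / grow-after-clean-run behavior.
class DynamicLossScaler:
    def __init__(self, init_scale=32.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            # Overflow in the fp16 gradients: halve the scale, skip the step.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                # A long run of clean steps: try a larger scale again.
                self.scale *= self.growth_factor
                self._good_steps = 0

scaler = DynamicLossScaler()
scaler.update(found_inf=True)        # 32.0 -> 16.0, as between batches 4200/4250
assert scaler.scale == 16.0
for _ in range(2000):                # enough clean steps to grow back
    scaler.update(found_inf=False)
assert scaler.scale == 32.0          # recovered, as at batch 4400
```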
], batch size: 51, lr: 7.59e-03, grad_scale: 32.0 2024-09-14 22:31:42,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=193678.16666666666, ans=0.0 2024-09-14 22:31:47,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=193678.16666666666, ans=0.125 2024-09-14 22:31:53,644 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-09-14 22:31:55,939 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.728e+02 2.076e+02 2.170e+02 2.410e+02 3.388e+02, threshold=4.341e+02, percent-clipped=0.0 2024-09-14 22:32:25,102 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-09-14 22:32:50,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=193819.83333333334, ans=0.025 2024-09-14 22:32:51,471 INFO [train.py:1198] (0/2) Epoch 11, batch 4500, loss[loss=0.2907, ctc_loss=0.2062, cr_loss=0.4227, over 20178.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1821, cr_loss=0.3942, over 4107227.21 frames. ], batch size: 74, lr: 7.59e-03, grad_scale: 32.0 2024-09-14 22:33:18,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=193848.16666666666, ans=0.0 2024-09-14 22:33:42,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=193904.83333333334, ans=0.0 2024-09-14 22:34:06,554 INFO [train.py:1198] (0/2) Epoch 11, batch 4550, loss[loss=0.2646, ctc_loss=0.1822, cr_loss=0.4124, over 20991.00 frames. 
], tot_loss[loss=0.2618, ctc_loss=0.1827, cr_loss=0.3952, over 4096099.02 frames. ], batch size: 58, lr: 7.59e-03, grad_scale: 32.0 2024-09-14 22:34:26,152 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.021e+02 2.156e+02 2.337e+02 5.025e+02, threshold=4.313e+02, percent-clipped=1.0 2024-09-14 22:35:24,163 INFO [train.py:1198] (0/2) Epoch 11, batch 4600, loss[loss=0.2972, ctc_loss=0.2126, cr_loss=0.4229, over 20712.00 frames. ], tot_loss[loss=0.2623, ctc_loss=0.1832, cr_loss=0.3957, over 4097918.03 frames. ], batch size: 71, lr: 7.58e-03, grad_scale: 32.0 2024-09-14 22:36:28,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=194216.5, ans=0.125 2024-09-14 22:36:36,533 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0 2024-09-14 22:36:41,921 INFO [train.py:1198] (0/2) Epoch 11, batch 4650, loss[loss=0.2723, ctc_loss=0.1908, cr_loss=0.4074, over 20828.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.1842, cr_loss=0.3969, over 4078303.37 frames. ], batch size: 59, lr: 7.58e-03, grad_scale: 32.0 2024-09-14 22:37:01,566 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.092e+02 2.261e+02 2.510e+02 4.190e+02, threshold=4.522e+02, percent-clipped=0.0 2024-09-14 22:37:57,002 INFO [train.py:1198] (0/2) Epoch 11, batch 4700, loss[loss=0.2866, ctc_loss=0.2057, cr_loss=0.4044, over 20271.00 frames. ], tot_loss[loss=0.2627, ctc_loss=0.1835, cr_loss=0.396, over 4087899.60 frames. ], batch size: 74, lr: 7.58e-03, grad_scale: 32.0 2024-09-14 22:37:57,641 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. 
limit=15.0 2024-09-14 22:37:58,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=194386.5, ans=0.125 2024-09-14 22:38:24,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=194414.83333333334, ans=0.0 2024-09-14 22:38:33,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=194443.16666666666, ans=0.125 2024-09-14 22:38:34,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=194443.16666666666, ans=0.05 2024-09-14 22:38:36,833 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.44 vs. limit=15.0 2024-09-14 22:39:12,309 INFO [train.py:1198] (0/2) Epoch 11, batch 4750, loss[loss=0.2744, ctc_loss=0.1923, cr_loss=0.4105, over 21020.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1816, cr_loss=0.3933, over 4100603.80 frames. ], batch size: 61, lr: 7.58e-03, grad_scale: 32.0 2024-09-14 22:39:31,988 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.095e+02 2.360e+02 2.671e+02 3.932e+02, threshold=4.721e+02, percent-clipped=0.0 2024-09-14 22:39:34,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=194556.5, ans=0.125 2024-09-14 22:40:11,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=194641.5, ans=10.0 2024-09-14 22:40:31,072 INFO [train.py:1198] (0/2) Epoch 11, batch 4800, loss[loss=0.2352, ctc_loss=0.1635, cr_loss=0.3582, over 20956.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1809, cr_loss=0.3929, over 4102565.74 frames. 
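The optim.py WARNINGs above print quartiles of recent gradient norms together with a clipping threshold. The numbers suggest (this is a reading of the log, not the actual icefall ScaledAdam code) that the threshold tracks `Clipping_scale` times a running estimate of the median grad-norm: in the first WARNING, 2.0 × 2.225e+02 = 4.450e+02, exactly the reported threshold. A small sketch of that relationship:

```python
# How the logged clipping threshold relates to the grad-norm quartiles.
# Assumption inferred from the log numbers (not the icefall ScaledAdam
# implementation): threshold ~= Clipping_scale * median grad-norm.
clipping_scale = 2.0

# Quartiles from the first optim.py WARNING above: min, q25, median, q75, max
grad_norm_quartiles = [180.8, 203.3, 222.5, 244.2, 564.0]
threshold = clipping_scale * grad_norm_quartiles[2]  # 2.0 * 222.5 = 445.0

def clip_factor(grad_norm: float, threshold: float) -> float:
    """Scale applied to the gradient: 1.0 below the threshold, shrunk above it."""
    return min(1.0, threshold / grad_norm)

# The reported threshold=4.450e+02 matches, and only the max-norm batch
# (5.640e+02) exceeds it, consistent with percent-clipped=1.0.
assert abs(threshold - 445.0) < 1e-9
assert clip_factor(564.0, threshold) < 1.0
assert clip_factor(222.5, threshold) == 1.0
```

Later WARNINGs (e.g. threshold=4.341e+02 against a median of 2.170e+02) are close to but not exactly twice the printed median, consistent with a smoothed rather than instantaneous median estimate.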
], batch size: 51, lr: 7.57e-03, grad_scale: 32.0 2024-09-14 22:40:32,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=194669.83333333334, ans=0.1 2024-09-14 22:41:06,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=194726.5, ans=0.125 2024-09-14 22:41:33,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=194783.16666666666, ans=0.025 2024-09-14 22:41:47,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.58 vs. limit=5.0 2024-09-14 22:41:49,691 INFO [train.py:1198] (0/2) Epoch 11, batch 4850, loss[loss=0.2758, ctc_loss=0.1924, cr_loss=0.4168, over 20641.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1812, cr_loss=0.3924, over 4091397.91 frames. ], batch size: 66, lr: 7.57e-03, grad_scale: 32.0 2024-09-14 22:41:51,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=194811.5, ans=0.1 2024-09-14 22:41:53,566 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=12.0 2024-09-14 22:42:02,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. limit=10.0 2024-09-14 22:42:09,509 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.035e+02 2.137e+02 2.381e+02 4.678e+02, threshold=4.274e+02, percent-clipped=0.0 2024-09-14 22:42:25,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. 
limit=6.0 2024-09-14 22:42:27,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=194868.16666666666, ans=0.125 2024-09-14 22:42:30,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=194868.16666666666, ans=0.125 2024-09-14 22:42:31,490 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0 2024-09-14 22:42:55,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=194924.83333333334, ans=0.025 2024-09-14 22:43:05,485 INFO [train.py:1198] (0/2) Epoch 11, batch 4900, loss[loss=0.3085, ctc_loss=0.2234, cr_loss=0.4257, over 17913.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1813, cr_loss=0.3933, over 4097970.30 frames. ], batch size: 108, lr: 7.57e-03, grad_scale: 32.0 2024-09-14 22:43:14,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=194953.16666666666, ans=0.125 2024-09-14 22:43:26,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=194981.5, ans=0.0 2024-09-14 22:43:38,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=195009.83333333334, ans=0.0 2024-09-14 22:43:43,419 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.87 vs. 
limit=12.0 2024-09-14 22:43:47,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=195009.83333333334, ans=0.125 2024-09-14 22:43:47,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=195009.83333333334, ans=0.0 2024-09-14 22:44:00,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.91 vs. limit=15.0 2024-09-14 22:44:04,567 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.18 vs. limit=22.5 2024-09-14 22:44:16,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=195066.5, ans=0.07 2024-09-14 22:44:20,313 INFO [train.py:1198] (0/2) Epoch 11, batch 4950, loss[loss=0.2663, ctc_loss=0.1848, cr_loss=0.4074, over 20947.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1814, cr_loss=0.3944, over 4106935.01 frames. ], batch size: 60, lr: 7.57e-03, grad_scale: 32.0 2024-09-14 22:44:23,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=195094.83333333334, ans=0.02 2024-09-14 22:44:39,792 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.117e+02 2.305e+02 2.605e+02 3.873e+02, threshold=4.611e+02, percent-clipped=0.0 2024-09-14 22:44:52,372 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2024-09-14 22:45:35,061 INFO [train.py:1198] (0/2) Epoch 11, batch 5000, loss[loss=0.2868, ctc_loss=0.2002, cr_loss=0.4329, over 20963.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1814, cr_loss=0.3942, over 4105858.21 frames. 
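The many `ScheduledFloat` records log hyperparameters (dropout probabilities, skip rates, balancer limits) whose value `ans` is a function of `batch_count`: in icefall's scaling.py these are scheduled values that change as training progresses. A simplified sketch of a piecewise-linear schedule over batch count; the breakpoints below are illustrative, not taken from this recipe:

```python
# Simplified sketch of a piecewise-linear schedule over batch count, in the
# spirit of the ScheduledFloat values logged above (e.g. a dropout_p that
# decays during training). Breakpoints are illustrative assumptions.
def scheduled_float(batch_count: float, points: list) -> float:
    """points: [(batch_count, value), ...] sorted by batch_count."""
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            # Linear interpolation between the surrounding breakpoints
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Example: a probability that ramps from 0.3 down to 0.1 over 20k batches,
# then stays constant (so at batch_count ~193k it reads its final value).
schedule = [(0.0, 0.3), (20000.0, 0.1)]
assert scheduled_float(0, schedule) == 0.3
assert abs(scheduled_float(10000, schedule) - 0.2) < 1e-9
assert scheduled_float(193000, schedule) == 0.1
```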
], batch size: 58, lr: 7.56e-03, grad_scale: 32.0 2024-09-14 22:45:36,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=195236.5, ans=0.2 2024-09-14 22:45:55,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=195264.83333333334, ans=0.125 2024-09-14 22:46:00,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=195264.83333333334, ans=0.1 2024-09-14 22:46:32,230 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.08 vs. limit=10.0 2024-09-14 22:46:49,009 INFO [train.py:1198] (0/2) Epoch 11, batch 5050, loss[loss=0.2729, ctc_loss=0.1907, cr_loss=0.4111, over 20946.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.182, cr_loss=0.3948, over 4099703.16 frames. ], batch size: 60, lr: 7.56e-03, grad_scale: 32.0 2024-09-14 22:47:08,293 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.750e+02 2.103e+02 2.237e+02 2.505e+02 5.290e+02, threshold=4.475e+02, percent-clipped=1.0 2024-09-14 22:48:06,006 INFO [train.py:1198] (0/2) Epoch 11, batch 5100, loss[loss=0.255, ctc_loss=0.1767, cr_loss=0.3915, over 21088.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1827, cr_loss=0.3956, over 4084821.95 frames. ], batch size: 59, lr: 7.56e-03, grad_scale: 32.0 2024-09-14 22:48:06,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=195519.83333333334, ans=0.125 2024-09-14 22:48:11,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.58 vs. 
limit=12.0 2024-09-14 22:48:36,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.63 vs. limit=15.0 2024-09-14 22:48:47,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=195576.5, ans=0.1 2024-09-14 22:48:50,663 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0 2024-09-14 22:49:06,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=195633.16666666666, ans=0.025 2024-09-14 22:49:19,257 INFO [train.py:1198] (0/2) Epoch 11, batch 5150, loss[loss=0.2627, ctc_loss=0.1813, cr_loss=0.4072, over 20996.00 frames. ], tot_loss[loss=0.2621, ctc_loss=0.183, cr_loss=0.3958, over 4082987.42 frames. ], batch size: 63, lr: 7.55e-03, grad_scale: 32.0 2024-09-14 22:49:21,428 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.23 vs. 
limit=10.0 2024-09-14 22:49:38,852 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.067e+02 2.222e+02 2.445e+02 7.397e+02, threshold=4.443e+02, percent-clipped=1.0 2024-09-14 22:49:40,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=195689.83333333334, ans=0.125 2024-09-14 22:49:49,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=195718.16666666666, ans=0.125 2024-09-14 22:50:31,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=195774.83333333334, ans=22.5 2024-09-14 22:50:36,769 INFO [train.py:1198] (0/2) Epoch 11, batch 5200, loss[loss=0.2292, ctc_loss=0.158, cr_loss=0.3561, over 20953.00 frames. ], tot_loss[loss=0.2633, ctc_loss=0.1839, cr_loss=0.3974, over 4082616.09 frames. ], batch size: 50, lr: 7.55e-03, grad_scale: 32.0 2024-09-14 22:50:38,424 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 22:51:33,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=195888.16666666666, ans=0.1 2024-09-14 22:51:33,911 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=22.5 2024-09-14 22:51:36,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2024-09-14 22:51:50,781 INFO [train.py:1198] (0/2) Epoch 11, batch 5250, loss[loss=0.2864, ctc_loss=0.2047, cr_loss=0.4087, over 18108.00 frames. ], tot_loss[loss=0.2621, ctc_loss=0.1831, cr_loss=0.3954, over 4079816.75 frames. 
], batch size: 108, lr: 7.55e-03, grad_scale: 16.0 2024-09-14 22:52:02,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=195944.83333333334, ans=0.2 2024-09-14 22:52:11,370 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.049e+02 2.194e+02 2.440e+02 5.103e+02, threshold=4.387e+02, percent-clipped=1.0 2024-09-14 22:52:13,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=195973.16666666666, ans=0.0 2024-09-14 22:52:13,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=195973.16666666666, ans=0.0 2024-09-14 22:52:16,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=195973.16666666666, ans=0.0 2024-09-14 22:52:32,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=15.0 2024-09-14 22:53:05,857 INFO [train.py:1198] (0/2) Epoch 11, batch 5300, loss[loss=0.2329, ctc_loss=0.1595, cr_loss=0.3673, over 21070.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1818, cr_loss=0.3938, over 4099348.16 frames. 
], batch size: 53, lr: 7.55e-03, grad_scale: 16.0 2024-09-14 22:53:10,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=196086.5, ans=0.0 2024-09-14 22:53:13,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=196086.5, ans=0.125 2024-09-14 22:53:44,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=196143.16666666666, ans=0.125 2024-09-14 22:53:56,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=196171.5, ans=0.0 2024-09-14 22:54:08,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=196199.83333333334, ans=0.125 2024-09-14 22:54:19,784 INFO [train.py:1198] (0/2) Epoch 11, batch 5350, loss[loss=0.3095, ctc_loss=0.229, cr_loss=0.4027, over 14549.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.182, cr_loss=0.394, over 4094201.87 frames. ], batch size: 149, lr: 7.54e-03, grad_scale: 16.0 2024-09-14 22:54:40,135 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.012e+02 2.142e+02 2.285e+02 4.157e+02, threshold=4.283e+02, percent-clipped=0.0 2024-09-14 22:55:29,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=196341.5, ans=0.125 2024-09-14 22:55:33,563 INFO [train.py:1198] (0/2) Epoch 11, batch 5400, loss[loss=0.2637, ctc_loss=0.184, cr_loss=0.3982, over 21057.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1825, cr_loss=0.3945, over 4098727.28 frames. 
], batch size: 59, lr: 7.54e-03, grad_scale: 16.0 2024-09-14 22:55:53,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=196398.16666666666, ans=15.0 2024-09-14 22:56:15,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=196426.5, ans=0.025 2024-09-14 22:56:39,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=196483.16666666666, ans=0.2 2024-09-14 22:56:47,786 INFO [train.py:1198] (0/2) Epoch 11, batch 5450, loss[loss=0.247, ctc_loss=0.1731, cr_loss=0.3692, over 20816.00 frames. ], tot_loss[loss=0.262, ctc_loss=0.183, cr_loss=0.395, over 4105447.54 frames. ], batch size: 59, lr: 7.54e-03, grad_scale: 16.0 2024-09-14 22:57:11,381 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.050e+02 2.185e+02 2.424e+02 3.814e+02, threshold=4.370e+02, percent-clipped=0.0 2024-09-14 22:57:41,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.04 vs. limit=15.0 2024-09-14 22:58:03,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=196653.16666666666, ans=0.0 2024-09-14 22:58:04,562 INFO [train.py:1198] (0/2) Epoch 11, batch 5500, loss[loss=0.2441, ctc_loss=0.1681, cr_loss=0.3804, over 20794.00 frames. ], tot_loss[loss=0.2627, ctc_loss=0.1834, cr_loss=0.3965, over 4095404.78 frames. 
], batch size: 53, lr: 7.54e-03, grad_scale: 16.0 2024-09-14 22:58:10,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=196653.16666666666, ans=0.0 2024-09-14 22:58:19,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=196681.5, ans=0.125 2024-09-14 22:58:48,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=196738.16666666666, ans=0.125 2024-09-14 22:58:58,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=196738.16666666666, ans=0.125 2024-09-14 22:59:06,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=196766.5, ans=0.125 2024-09-14 22:59:13,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=196766.5, ans=0.125 2024-09-14 22:59:21,253 INFO [train.py:1198] (0/2) Epoch 11, batch 5550, loss[loss=0.3124, ctc_loss=0.2358, cr_loss=0.3828, over 13786.00 frames. ], tot_loss[loss=0.2638, ctc_loss=0.1843, cr_loss=0.3978, over 4090034.55 frames. 
], batch size: 150, lr: 7.53e-03, grad_scale: 16.0 2024-09-14 22:59:41,979 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.051e+02 2.210e+02 2.430e+02 4.499e+02, threshold=4.420e+02, percent-clipped=1.0 2024-09-14 22:59:54,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=196851.5, ans=0.125 2024-09-14 23:00:06,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=196879.83333333334, ans=0.025 2024-09-14 23:00:15,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=196879.83333333334, ans=0.125 2024-09-14 23:00:35,301 INFO [train.py:1198] (0/2) Epoch 11, batch 5600, loss[loss=0.2622, ctc_loss=0.1834, cr_loss=0.394, over 20833.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1837, cr_loss=0.3973, over 4092277.86 frames. ], batch size: 59, lr: 7.53e-03, grad_scale: 32.0 2024-09-14 23:01:14,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=196993.16666666666, ans=0.025 2024-09-14 23:01:40,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=197049.83333333334, ans=0.05 2024-09-14 23:01:46,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=197049.83333333334, ans=0.0 2024-09-14 23:01:48,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=197078.16666666666, ans=0.125 2024-09-14 23:01:49,569 INFO [train.py:1198] (0/2) Epoch 11, batch 5650, loss[loss=0.2764, ctc_loss=0.1921, cr_loss=0.4218, over 20848.00 frames. 
], tot_loss[loss=0.263, ctc_loss=0.1836, cr_loss=0.397, over 4088408.15 frames. ], batch size: 65, lr: 7.53e-03, grad_scale: 16.0 2024-09-14 23:02:04,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2024-09-14 23:02:11,750 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.693e+02 2.041e+02 2.164e+02 2.431e+02 3.749e+02, threshold=4.329e+02, percent-clipped=0.0 2024-09-14 23:02:41,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197163.16666666666, ans=0.1 2024-09-14 23:02:43,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=197163.16666666666, ans=0.0 2024-09-14 23:02:43,712 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-09-14 23:02:46,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=197163.16666666666, ans=0.04949747468305833 2024-09-14 23:02:47,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=197191.5, ans=0.0 2024-09-14 23:03:04,344 INFO [train.py:1198] (0/2) Epoch 11, batch 5700, loss[loss=0.3006, ctc_loss=0.2127, cr_loss=0.4393, over 20957.00 frames. ], tot_loss[loss=0.2622, ctc_loss=0.183, cr_loss=0.396, over 4083946.69 frames. 
], batch size: 64, lr: 7.52e-03, grad_scale: 16.0 2024-09-14 23:03:07,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=197219.83333333334, ans=0.0 2024-09-14 23:03:20,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=197248.16666666666, ans=0.125 2024-09-14 23:03:22,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=197248.16666666666, ans=0.125 2024-09-14 23:03:32,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=197276.5, ans=0.125 2024-09-14 23:03:44,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=197276.5, ans=0.2 2024-09-14 23:04:01,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2024-09-14 23:04:18,501 INFO [train.py:1198] (0/2) Epoch 11, batch 5750, loss[loss=0.3027, ctc_loss=0.2188, cr_loss=0.4196, over 18135.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1835, cr_loss=0.3967, over 4081380.71 frames. ], batch size: 108, lr: 7.52e-03, grad_scale: 16.0 2024-09-14 23:04:40,707 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.036e+02 2.196e+02 2.468e+02 8.245e+02, threshold=4.392e+02, percent-clipped=1.0 2024-09-14 23:05:19,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=197474.83333333334, ans=0.125 2024-09-14 23:05:27,283 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.06 vs. 
limit=12.0 2024-09-14 23:05:32,588 INFO [train.py:1198] (0/2) Epoch 11, batch 5800, loss[loss=0.2382, ctc_loss=0.168, cr_loss=0.3509, over 21070.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1838, cr_loss=0.3967, over 4074747.70 frames. ], batch size: 59, lr: 7.52e-03, grad_scale: 16.0 2024-09-14 23:05:44,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=197503.16666666666, ans=10.0 2024-09-14 23:06:28,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=197588.16666666666, ans=0.0 2024-09-14 23:06:44,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197616.5, ans=0.1 2024-09-14 23:06:48,726 INFO [train.py:1198] (0/2) Epoch 11, batch 5850, loss[loss=0.2951, ctc_loss=0.2036, cr_loss=0.4576, over 20698.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.1836, cr_loss=0.3969, over 4084133.21 frames. ], batch size: 68, lr: 7.52e-03, grad_scale: 16.0 2024-09-14 23:06:59,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=197644.83333333334, ans=0.0 2024-09-14 23:07:05,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=197673.16666666666, ans=0.0 2024-09-14 23:07:10,753 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.049e+02 2.261e+02 2.520e+02 4.441e+02, threshold=4.521e+02, percent-clipped=1.0 2024-09-14 23:07:15,948 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.91 vs. 
limit=22.5 2024-09-14 23:07:20,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=197701.5, ans=0.125 2024-09-14 23:07:27,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=197701.5, ans=0.125 2024-09-14 23:07:32,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=197729.83333333334, ans=0.125 2024-09-14 23:07:38,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=197729.83333333334, ans=0.0 2024-09-14 23:07:44,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=22.5 2024-09-14 23:07:52,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=197758.16666666666, ans=0.0 2024-09-14 23:07:59,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=197758.16666666666, ans=0.0 2024-09-14 23:08:05,399 INFO [train.py:1198] (0/2) Epoch 11, batch 5900, loss[loss=0.2996, ctc_loss=0.2151, cr_loss=0.4225, over 20146.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1837, cr_loss=0.3968, over 4082270.57 frames. ], batch size: 80, lr: 7.51e-03, grad_scale: 16.0 2024-09-14 23:08:05,719 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 23:08:16,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. 
limit=15.0 2024-09-14 23:08:48,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=197871.5, ans=0.2 2024-09-14 23:08:50,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=197871.5, ans=0.0 2024-09-14 23:08:56,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=197871.5, ans=0.125 2024-09-14 23:09:13,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=197899.83333333334, ans=0.035 2024-09-14 23:09:19,535 INFO [train.py:1198] (0/2) Epoch 11, batch 5950, loss[loss=0.2943, ctc_loss=0.2062, cr_loss=0.4402, over 20942.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1836, cr_loss=0.3965, over 4077218.39 frames. ], batch size: 60, lr: 7.51e-03, grad_scale: 16.0 2024-09-14 23:09:41,489 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.739e+02 2.187e+02 2.368e+02 2.660e+02 5.289e+02, threshold=4.737e+02, percent-clipped=1.0 2024-09-14 23:09:47,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=197984.83333333334, ans=0.125 2024-09-14 23:10:15,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=198013.16666666666, ans=0.125 2024-09-14 23:10:33,211 INFO [train.py:1198] (0/2) Epoch 11, batch 6000, loss[loss=0.2501, ctc_loss=0.1749, cr_loss=0.3762, over 21037.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1821, cr_loss=0.3949, over 4093400.74 frames. 
], batch size: 62, lr: 7.51e-03, grad_scale: 32.0 2024-09-14 23:10:33,212 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 23:10:54,092 INFO [train.py:1230] (0/2) Epoch 11, validation: loss=0.05224, ctc_loss=0.05224, cr_loss=9.702e-15, over 944034.00 frames. 2024-09-14 23:10:54,092 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 23:11:00,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=198069.83333333334, ans=0.125 2024-09-14 23:11:10,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=198098.16666666666, ans=0.0 2024-09-14 23:11:28,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198126.5, ans=0.1 2024-09-14 23:12:07,828 INFO [train.py:1198] (0/2) Epoch 11, batch 6050, loss[loss=0.2986, ctc_loss=0.2086, cr_loss=0.4502, over 21012.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1821, cr_loss=0.3941, over 4083611.33 frames. ], batch size: 63, lr: 7.51e-03, grad_scale: 32.0 2024-09-14 23:12:12,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=198211.5, ans=0.125 2024-09-14 23:12:30,012 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.761e+02 2.042e+02 2.233e+02 2.421e+02 4.039e+02, threshold=4.466e+02, percent-clipped=0.0 2024-09-14 23:12:41,915 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.07 vs. 
limit=15.0 2024-09-14 23:12:49,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=198268.16666666666, ans=0.5 2024-09-14 23:13:17,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=198324.83333333334, ans=0.0 2024-09-14 23:13:22,621 INFO [train.py:1198] (0/2) Epoch 11, batch 6100, loss[loss=0.2634, ctc_loss=0.1841, cr_loss=0.3968, over 20269.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1815, cr_loss=0.3932, over 4080026.84 frames. ], batch size: 74, lr: 7.50e-03, grad_scale: 32.0 2024-09-14 23:13:30,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=198353.16666666666, ans=0.125 2024-09-14 23:13:59,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=198409.83333333334, ans=0.1 2024-09-14 23:14:13,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=198438.16666666666, ans=0.0 2024-09-14 23:14:36,746 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 23:14:37,852 INFO [train.py:1198] (0/2) Epoch 11, batch 6150, loss[loss=0.2364, ctc_loss=0.166, cr_loss=0.3516, over 20796.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1816, cr_loss=0.3931, over 4081986.33 frames. 
], batch size: 53, lr: 7.50e-03, grad_scale: 32.0 2024-09-14 23:14:50,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=198494.83333333334, ans=0.0 2024-09-14 23:15:00,100 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 2.029e+02 2.171e+02 2.361e+02 3.596e+02, threshold=4.341e+02, percent-clipped=0.0 2024-09-14 23:15:06,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198551.5, ans=0.1 2024-09-14 23:15:42,334 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 23:15:53,428 INFO [train.py:1198] (0/2) Epoch 11, batch 6200, loss[loss=0.2347, ctc_loss=0.1599, cr_loss=0.3741, over 20378.00 frames. ], tot_loss[loss=0.2599, ctc_loss=0.1815, cr_loss=0.392, over 4065506.45 frames. ], batch size: 45, lr: 7.50e-03, grad_scale: 32.0 2024-09-14 23:17:06,034 INFO [train.py:1198] (0/2) Epoch 11, batch 6250, loss[loss=0.2846, ctc_loss=0.1931, cr_loss=0.4573, over 20958.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1825, cr_loss=0.3924, over 4030522.11 frames. 
], batch size: 64, lr: 7.50e-03, grad_scale: 32.0 2024-09-14 23:17:12,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198778.16666666666, ans=0.1 2024-09-14 23:17:20,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198806.5, ans=0.1 2024-09-14 23:17:27,612 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.085e+02 2.323e+02 2.630e+02 3.401e+02, threshold=4.647e+02, percent-clipped=0.0 2024-09-14 23:17:35,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=198834.83333333334, ans=0.0 2024-09-14 23:17:45,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=198834.83333333334, ans=0.125 2024-09-14 23:17:53,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=198863.16666666666, ans=0.5 2024-09-14 23:18:18,754 INFO [train.py:1198] (0/2) Epoch 11, batch 6300, loss[loss=0.2982, ctc_loss=0.2137, cr_loss=0.4226, over 20046.00 frames. ], tot_loss[loss=0.2658, ctc_loss=0.1867, cr_loss=0.3958, over 3957926.71 frames. ], batch size: 80, lr: 7.49e-03, grad_scale: 32.0 2024-09-14 23:19:28,446 INFO [train.py:1198] (0/2) Epoch 11, batch 6350, loss[loss=0.3419, ctc_loss=0.2581, cr_loss=0.4192, over 14327.00 frames. ], tot_loss[loss=0.2743, ctc_loss=0.1945, cr_loss=0.399, over 3742987.08 frames. 
], batch size: 150, lr: 7.49e-03, grad_scale: 32.0 2024-09-14 23:19:49,898 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.274e+02 2.540e+02 2.740e+02 3.993e+02, threshold=5.080e+02, percent-clipped=0.0 2024-09-14 23:20:07,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=199118.16666666666, ans=0.09899494936611666 2024-09-14 23:20:12,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=199146.5, ans=0.0 2024-09-14 23:20:26,581 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-11.pt 2024-09-14 23:21:14,916 INFO [train.py:1198] (0/2) Epoch 12, batch 0, loss[loss=0.2663, ctc_loss=0.1826, cr_loss=0.4182, over 20830.00 frames. ], tot_loss[loss=0.2663, ctc_loss=0.1826, cr_loss=0.4182, over 20830.00 frames. ], batch size: 59, lr: 7.17e-03, grad_scale: 32.0 2024-09-14 23:21:14,917 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-14 23:21:28,175 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.4159, 5.3991, 5.3634, 4.9417], device='cuda:0') 2024-09-14 23:21:33,497 INFO [train.py:1230] (0/2) Epoch 12, validation: loss=0.0521, ctc_loss=0.0521, cr_loss=1e-14, over 944034.00 frames. 
2024-09-14 23:21:33,497 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-14 23:21:58,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=199206.0, ans=0.2 2024-09-14 23:22:11,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=199234.33333333334, ans=0.2 2024-09-14 23:22:35,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=199291.0, ans=0.1 2024-09-14 23:22:41,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=199291.0, ans=0.125 2024-09-14 23:22:51,658 INFO [train.py:1198] (0/2) Epoch 12, batch 50, loss[loss=0.3177, ctc_loss=0.2242, cr_loss=0.4672, over 18175.00 frames. ], tot_loss[loss=0.2625, ctc_loss=0.183, cr_loss=0.3976, over 912728.70 frames. ], batch size: 108, lr: 7.17e-03, grad_scale: 32.0 2024-09-14 23:23:01,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. limit=10.0 2024-09-14 23:23:11,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=199347.66666666666, ans=0.0 2024-09-14 23:23:13,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2024-09-14 23:23:16,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=199347.66666666666, ans=0.0 2024-09-14 23:23:16,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.57 vs. 
limit=15.0 2024-09-14 23:23:16,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0 2024-09-14 23:23:27,778 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.734e+02 2.054e+02 2.268e+02 2.482e+02 4.845e+02, threshold=4.536e+02, percent-clipped=0.0 2024-09-14 23:23:28,587 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.48 vs. limit=22.5 2024-09-14 23:24:02,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=199432.66666666666, ans=0.0 2024-09-14 23:24:06,555 INFO [train.py:1198] (0/2) Epoch 12, batch 100, loss[loss=0.2994, ctc_loss=0.2107, cr_loss=0.4431, over 20969.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.183, cr_loss=0.397, over 1616298.00 frames. ], batch size: 64, lr: 7.17e-03, grad_scale: 32.0 2024-09-14 23:24:08,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=199461.0, ans=0.0 2024-09-14 23:24:14,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=199461.0, ans=0.1 2024-09-14 23:24:21,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=199489.33333333334, ans=0.125 2024-09-14 23:24:35,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=199517.66666666666, ans=0.0 2024-09-14 23:24:58,215 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. 
limit=10.0 2024-09-14 23:25:24,605 INFO [train.py:1198] (0/2) Epoch 12, batch 150, loss[loss=0.284, ctc_loss=0.1999, cr_loss=0.4206, over 20972.00 frames. ], tot_loss[loss=0.2642, ctc_loss=0.1845, cr_loss=0.3985, over 2158648.75 frames. ], batch size: 64, lr: 7.17e-03, grad_scale: 32.0 2024-09-14 23:26:00,581 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.665e+02 2.051e+02 2.164e+02 2.318e+02 3.095e+02, threshold=4.327e+02, percent-clipped=0.0 2024-09-14 23:26:13,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=199687.66666666666, ans=0.2 2024-09-14 23:26:39,418 INFO [train.py:1198] (0/2) Epoch 12, batch 200, loss[loss=0.3018, ctc_loss=0.2099, cr_loss=0.4595, over 21007.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1824, cr_loss=0.3965, over 2586144.51 frames. ], batch size: 63, lr: 7.16e-03, grad_scale: 32.0 2024-09-14 23:26:53,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=199772.66666666666, ans=0.0 2024-09-14 23:27:01,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=15.0 2024-09-14 23:27:08,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=199801.0, ans=0.125 2024-09-14 23:27:58,189 INFO [train.py:1198] (0/2) Epoch 12, batch 250, loss[loss=0.2733, ctc_loss=0.1912, cr_loss=0.4108, over 20852.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1815, cr_loss=0.3958, over 2937500.68 frames. 
], batch size: 65, lr: 7.16e-03, grad_scale: 32.0 2024-09-14 23:28:01,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=199886.0, ans=0.025 2024-09-14 23:28:34,831 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.766e+02 2.052e+02 2.123e+02 2.278e+02 3.665e+02, threshold=4.245e+02, percent-clipped=0.0 2024-09-14 23:29:01,787 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-09-14 23:29:14,130 INFO [train.py:1198] (0/2) Epoch 12, batch 300, loss[loss=0.2507, ctc_loss=0.1726, cr_loss=0.3907, over 21047.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1811, cr_loss=0.3944, over 3193093.74 frames. ], batch size: 63, lr: 7.16e-03, grad_scale: 32.0 2024-09-14 23:29:19,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=200027.66666666666, ans=0.125 2024-09-14 23:29:19,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=200027.66666666666, ans=0.0 2024-09-14 23:29:37,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=200056.0, ans=0.0 2024-09-14 23:29:40,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=22.5 2024-09-14 23:29:48,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.09 vs. 
limit=22.5 2024-09-14 23:30:15,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=200141.0, ans=0.07 2024-09-14 23:30:31,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=22.5 2024-09-14 23:30:33,246 INFO [train.py:1198] (0/2) Epoch 12, batch 350, loss[loss=0.2779, ctc_loss=0.1918, cr_loss=0.4304, over 20696.00 frames. ], tot_loss[loss=0.2598, ctc_loss=0.181, cr_loss=0.394, over 3398078.02 frames. ], batch size: 68, lr: 7.16e-03, grad_scale: 32.0 2024-09-14 23:30:33,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=200169.33333333334, ans=0.125 2024-09-14 23:30:46,051 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=15.0 2024-09-14 23:31:08,885 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.048e+02 2.277e+02 2.512e+02 3.323e+02, threshold=4.553e+02, percent-clipped=0.0 2024-09-14 23:31:11,494 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2024-09-14 23:31:30,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2024-09-14 23:31:45,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=200282.66666666666, ans=0.09899494936611666 2024-09-14 23:31:47,888 INFO [train.py:1198] (0/2) Epoch 12, batch 400, loss[loss=0.2345, ctc_loss=0.1644, cr_loss=0.3501, over 20965.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.1804, cr_loss=0.393, over 3553990.26 frames. 
], batch size: 51, lr: 7.15e-03, grad_scale: 32.0 2024-09-14 23:33:02,960 INFO [train.py:1198] (0/2) Epoch 12, batch 450, loss[loss=0.2812, ctc_loss=0.1966, cr_loss=0.423, over 21031.00 frames. ], tot_loss[loss=0.2599, ctc_loss=0.1811, cr_loss=0.3939, over 3674780.47 frames. ], batch size: 63, lr: 7.15e-03, grad_scale: 32.0 2024-09-14 23:33:30,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=200481.0, ans=0.2 2024-09-14 23:33:32,320 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.67 vs. limit=10.0 2024-09-14 23:33:42,258 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.085e+02 2.237e+02 2.396e+02 3.420e+02, threshold=4.475e+02, percent-clipped=0.0 2024-09-14 23:33:46,340 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=15.0 2024-09-14 23:34:21,389 INFO [train.py:1198] (0/2) Epoch 12, batch 500, loss[loss=0.2504, ctc_loss=0.1744, cr_loss=0.3796, over 21069.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1817, cr_loss=0.3949, over 3772243.48 frames. ], batch size: 56, lr: 7.15e-03, grad_scale: 32.0 2024-09-14 23:34:36,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=200622.66666666666, ans=0.1 2024-09-14 23:34:50,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.98 vs. limit=10.0 2024-09-14 23:35:36,074 INFO [train.py:1198] (0/2) Epoch 12, batch 550, loss[loss=0.2523, ctc_loss=0.1704, cr_loss=0.4096, over 21056.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1816, cr_loss=0.3954, over 3843977.11 frames. 
], batch size: 56, lr: 7.15e-03, grad_scale: 32.0 2024-09-14 23:36:15,276 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.011e+02 2.141e+02 2.400e+02 3.199e+02, threshold=4.282e+02, percent-clipped=0.0 2024-09-14 23:36:30,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=200821.0, ans=0.0 2024-09-14 23:36:31,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=200821.0, ans=0.125 2024-09-14 23:36:54,619 INFO [train.py:1198] (0/2) Epoch 12, batch 600, loss[loss=0.2573, ctc_loss=0.1799, cr_loss=0.3871, over 21046.00 frames. ], tot_loss[loss=0.2604, ctc_loss=0.1814, cr_loss=0.3952, over 3901384.29 frames. ], batch size: 62, lr: 7.14e-03, grad_scale: 32.0 2024-09-14 23:37:06,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0 2024-09-14 23:37:29,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=200934.33333333334, ans=0.5 2024-09-14 23:37:58,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=200991.0, ans=0.0 2024-09-14 23:38:01,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=200991.0, ans=0.0 2024-09-14 23:38:10,064 INFO [train.py:1198] (0/2) Epoch 12, batch 650, loss[loss=0.2782, ctc_loss=0.1909, cr_loss=0.4366, over 20870.00 frames. ], tot_loss[loss=0.2594, ctc_loss=0.1805, cr_loss=0.3945, over 3953898.09 frames. ], batch size: 57, lr: 7.14e-03, grad_scale: 32.0 2024-09-14 23:38:33,531 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.28 vs. 
limit=15.0 2024-09-14 23:38:46,439 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.718e+02 2.028e+02 2.221e+02 2.579e+02 3.998e+02, threshold=4.442e+02, percent-clipped=0.0 2024-09-14 23:38:54,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=201104.33333333334, ans=0.125 2024-09-14 23:39:00,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=201104.33333333334, ans=0.125 2024-09-14 23:39:02,014 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 23:39:04,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=201104.33333333334, ans=0.125 2024-09-14 23:39:07,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=201104.33333333334, ans=0.0 2024-09-14 23:39:28,730 INFO [train.py:1198] (0/2) Epoch 12, batch 700, loss[loss=0.2565, ctc_loss=0.1811, cr_loss=0.3772, over 20923.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1805, cr_loss=0.3939, over 3978288.87 frames. ], batch size: 60, lr: 7.14e-03, grad_scale: 32.0 2024-09-14 23:39:32,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=201161.0, ans=0.0 2024-09-14 23:39:40,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.30 vs. 
limit=15.0 2024-09-14 23:39:42,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=201189.33333333334, ans=0.125 2024-09-14 23:39:45,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=201189.33333333334, ans=0.2 2024-09-14 23:39:54,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=201189.33333333334, ans=0.125 2024-09-14 23:40:08,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=201217.66666666666, ans=0.125 2024-09-14 23:40:09,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=201217.66666666666, ans=0.05 2024-09-14 23:40:34,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=201274.33333333334, ans=0.125 2024-09-14 23:40:44,471 INFO [train.py:1198] (0/2) Epoch 12, batch 750, loss[loss=0.3066, ctc_loss=0.211, cr_loss=0.4779, over 20679.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1812, cr_loss=0.395, over 4012291.82 frames. ], batch size: 66, lr: 7.14e-03, grad_scale: 32.0 2024-09-14 23:41:20,638 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.117e+02 2.254e+02 2.486e+02 3.486e+02, threshold=4.508e+02, percent-clipped=0.0 2024-09-14 23:41:25,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=201359.33333333334, ans=0.125 2024-09-14 23:41:51,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=201416.0, ans=0.2 2024-09-14 23:42:02,821 INFO [train.py:1198] (0/2) Epoch 12, batch 800, loss[loss=0.2372, ctc_loss=0.1621, cr_loss=0.3755, over 20874.00 frames. 
], tot_loss[loss=0.2611, ctc_loss=0.1819, cr_loss=0.3964, over 4031271.30 frames. ], batch size: 54, lr: 7.13e-03, grad_scale: 32.0 2024-09-14 23:42:12,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=201444.33333333334, ans=0.0 2024-09-14 23:42:19,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=201472.66666666666, ans=0.0 2024-09-14 23:42:22,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=201472.66666666666, ans=0.125 2024-09-14 23:42:24,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=15.0 2024-09-14 23:42:53,173 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0 2024-09-14 23:43:03,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=201557.66666666666, ans=0.125 2024-09-14 23:43:03,548 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-14 23:43:17,831 INFO [train.py:1198] (0/2) Epoch 12, batch 850, loss[loss=0.26, ctc_loss=0.18, cr_loss=0.3999, over 21035.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1819, cr_loss=0.3965, over 4043600.85 frames. 
], batch size: 61, lr: 7.13e-03, grad_scale: 32.0 2024-09-14 23:43:21,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=201586.0, ans=0.125 2024-09-14 23:43:55,538 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.727e+02 2.028e+02 2.126e+02 2.326e+02 4.010e+02, threshold=4.252e+02, percent-clipped=0.0 2024-09-14 23:44:08,101 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-14 23:44:33,266 INFO [train.py:1198] (0/2) Epoch 12, batch 900, loss[loss=0.208, ctc_loss=0.1427, cr_loss=0.3265, over 20970.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.1816, cr_loss=0.3962, over 4052256.11 frames. ], batch size: 48, lr: 7.13e-03, grad_scale: 16.0 2024-09-14 23:44:47,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=201727.66666666666, ans=0.125 2024-09-14 23:44:50,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=201756.0, ans=0.125 2024-09-14 23:44:59,601 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-09-14 23:45:14,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=201784.33333333334, ans=0.125 2024-09-14 23:45:22,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.62 vs. 
limit=15.0
2024-09-14 23:45:49,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=201841.0, ans=0.025
2024-09-14 23:45:51,687 INFO [train.py:1198] (0/2) Epoch 12, batch 950, loss[loss=0.2696, ctc_loss=0.1892, cr_loss=0.4023, over 20770.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1812, cr_loss=0.3953, over 4064181.43 frames. ], batch size: 56, lr: 7.13e-03, grad_scale: 16.0
2024-09-14 23:46:06,431 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.35 vs. limit=15.0
2024-09-14 23:46:29,252 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.059e+02 2.168e+02 2.366e+02 3.474e+02, threshold=4.336e+02, percent-clipped=0.0
2024-09-14 23:46:34,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0
2024-09-14 23:46:52,520 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0
2024-09-14 23:46:58,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=201982.66666666666, ans=0.125
2024-09-14 23:46:59,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=201982.66666666666, ans=10.0
2024-09-14 23:47:06,714 INFO [train.py:1198] (0/2) Epoch 12, batch 1000, loss[loss=0.2933, ctc_loss=0.2082, cr_loss=0.4257, over 18209.00 frames. ], tot_loss[loss=0.2599, ctc_loss=0.181, cr_loss=0.3945, over 4068485.50 frames. ], batch size: 108, lr: 7.12e-03, grad_scale: 16.0
2024-09-14 23:47:08,942 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0
2024-09-14 23:47:17,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=202011.0, ans=0.2
2024-09-14 23:47:44,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=202067.66666666666, ans=0.0
2024-09-14 23:48:00,431 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0
2024-09-14 23:48:19,419 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-14 23:48:22,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=202124.33333333334, ans=0.125
2024-09-14 23:48:24,971 INFO [train.py:1198] (0/2) Epoch 12, batch 1050, loss[loss=0.2778, ctc_loss=0.1979, cr_loss=0.3996, over 19482.00 frames. ], tot_loss[loss=0.2594, ctc_loss=0.1806, cr_loss=0.3937, over 4071416.24 frames. ], batch size: 90, lr: 7.12e-03, grad_scale: 16.0
2024-09-14 23:49:02,330 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.727e+02 2.055e+02 2.178e+02 2.434e+02 4.992e+02, threshold=4.356e+02, percent-clipped=1.0
2024-09-14 23:49:38,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=202294.33333333334, ans=0.125
2024-09-14 23:49:39,785 INFO [train.py:1198] (0/2) Epoch 12, batch 1100, loss[loss=0.2562, ctc_loss=0.1765, cr_loss=0.3985, over 20775.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1807, cr_loss=0.3938, over 4074001.95 frames. ], batch size: 56, lr: 7.12e-03, grad_scale: 16.0
2024-09-14 23:50:12,173 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0
2024-09-14 23:50:46,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=202407.66666666666, ans=0.125
2024-09-14 23:50:51,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0
2024-09-14 23:50:57,992 INFO [train.py:1198] (0/2) Epoch 12, batch 1150, loss[loss=0.2525, ctc_loss=0.1731, cr_loss=0.397, over 20887.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1815, cr_loss=0.3943, over 4075615.05 frames. ], batch size: 54, lr: 7.12e-03, grad_scale: 16.0
2024-09-14 23:51:01,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=202436.0, ans=0.0
2024-09-14 23:51:12,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=22.5
2024-09-14 23:51:35,470 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.071e+02 2.230e+02 2.568e+02 3.703e+02, threshold=4.459e+02, percent-clipped=0.0
2024-09-14 23:51:37,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.69 vs. limit=10.0
2024-09-14 23:51:39,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=202492.66666666666, ans=0.2
2024-09-14 23:51:46,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=202521.0, ans=0.025
2024-09-14 23:51:51,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=202521.0, ans=0.125
2024-09-14 23:51:54,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202521.0, ans=0.1
2024-09-14 23:51:54,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=202521.0, ans=0.125
2024-09-14 23:52:13,615 INFO [train.py:1198] (0/2) Epoch 12, batch 1200, loss[loss=0.2765, ctc_loss=0.1903, cr_loss=0.4309, over 21062.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.1818, cr_loss=0.3949, over 4084069.97 frames. ], batch size: 62, lr: 7.11e-03, grad_scale: 32.0
2024-09-14 23:52:15,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=202577.66666666666, ans=0.0
2024-09-14 23:52:18,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=202577.66666666666, ans=0.125
2024-09-14 23:52:39,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=202606.0, ans=0.0
2024-09-14 23:52:45,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=202634.33333333334, ans=0.125
2024-09-14 23:52:46,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=202634.33333333334, ans=0.0
2024-09-14 23:52:55,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.37 vs. limit=10.0
2024-09-14 23:53:15,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0
2024-09-14 23:53:23,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202691.0, ans=0.1
2024-09-14 23:53:31,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=15.0
2024-09-14 23:53:32,070 INFO [train.py:1198] (0/2) Epoch 12, batch 1250, loss[loss=0.256, ctc_loss=0.177, cr_loss=0.3947, over 20787.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1811, cr_loss=0.3946, over 4090574.23 frames. ], batch size: 53, lr: 7.11e-03, grad_scale: 32.0
2024-09-14 23:53:39,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=202719.33333333334, ans=0.125
2024-09-14 23:53:44,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202719.33333333334, ans=0.1
2024-09-14 23:54:10,035 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.054e+02 2.172e+02 2.318e+02 4.479e+02, threshold=4.343e+02, percent-clipped=1.0
2024-09-14 23:54:15,753 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.27 vs. limit=15.0
2024-09-14 23:54:19,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0
2024-09-14 23:54:27,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=202804.33333333334, ans=0.025
2024-09-14 23:54:42,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=202832.66666666666, ans=0.1
2024-09-14 23:54:47,842 INFO [train.py:1198] (0/2) Epoch 12, batch 1300, loss[loss=0.2799, ctc_loss=0.1932, cr_loss=0.4334, over 20663.00 frames. ], tot_loss[loss=0.2587, ctc_loss=0.1801, cr_loss=0.3931, over 4101765.12 frames. ], batch size: 68, lr: 7.11e-03, grad_scale: 32.0
2024-09-14 23:54:49,953 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.765e-02
2024-09-14 23:56:03,991 INFO [train.py:1198] (0/2) Epoch 12, batch 1350, loss[loss=0.2737, ctc_loss=0.1919, cr_loss=0.4093, over 20728.00 frames. ], tot_loss[loss=0.2604, ctc_loss=0.1814, cr_loss=0.395, over 4098014.34 frames. ], batch size: 71, lr: 7.11e-03, grad_scale: 32.0
2024-09-14 23:56:44,804 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.112e+02 2.313e+02 2.522e+02 3.662e+02, threshold=4.625e+02, percent-clipped=0.0
2024-09-14 23:56:56,446 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=12.0
2024-09-14 23:57:22,837 INFO [train.py:1198] (0/2) Epoch 12, batch 1400, loss[loss=0.2732, ctc_loss=0.1941, cr_loss=0.3954, over 21052.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.181, cr_loss=0.3948, over 4107951.61 frames. ], batch size: 62, lr: 7.10e-03, grad_scale: 32.0
2024-09-14 23:57:41,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=203172.66666666666, ans=0.1
2024-09-14 23:58:15,122 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=22.5
2024-09-14 23:58:38,412 INFO [train.py:1198] (0/2) Epoch 12, batch 1450, loss[loss=0.2921, ctc_loss=0.2058, cr_loss=0.4315, over 20662.00 frames. ], tot_loss[loss=0.2604, ctc_loss=0.1813, cr_loss=0.3951, over 4100479.05 frames. ], batch size: 68, lr: 7.10e-03, grad_scale: 32.0
2024-09-14 23:59:05,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=203314.33333333334, ans=0.125
2024-09-14 23:59:13,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=203342.66666666666, ans=0.2
2024-09-14 23:59:19,277 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.029e+02 2.186e+02 2.387e+02 3.305e+02, threshold=4.372e+02, percent-clipped=0.0
2024-09-14 23:59:54,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=203399.33333333334, ans=0.0
2024-09-14 23:59:57,360 INFO [train.py:1198] (0/2) Epoch 12, batch 1500, loss[loss=0.28, ctc_loss=0.1949, cr_loss=0.4253, over 20727.00 frames. ], tot_loss[loss=0.2599, ctc_loss=0.181, cr_loss=0.3941, over 4089459.52 frames. ], batch size: 71, lr: 7.10e-03, grad_scale: 32.0
2024-09-15 00:01:06,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=203541.0, ans=0.125
2024-09-15 00:01:12,180 INFO [train.py:1198] (0/2) Epoch 12, batch 1550, loss[loss=0.2494, ctc_loss=0.1729, cr_loss=0.3823, over 20880.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1808, cr_loss=0.3942, over 4088494.11 frames. ], batch size: 57, lr: 7.10e-03, grad_scale: 32.0
2024-09-15 00:01:15,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=203569.33333333334, ans=0.025
2024-09-15 00:01:40,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=203626.0, ans=0.5
2024-09-15 00:01:49,468 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.133e+02 2.264e+02 2.612e+02 3.617e+02, threshold=4.528e+02, percent-clipped=0.0
2024-09-15 00:01:59,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=203654.33333333334, ans=10.0
2024-09-15 00:01:59,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.40 vs. limit=15.0
2024-09-15 00:02:02,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=203654.33333333334, ans=0.2
2024-09-15 00:02:04,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=203654.33333333334, ans=0.125
2024-09-15 00:02:09,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=203654.33333333334, ans=0.125
2024-09-15 00:02:26,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=203682.66666666666, ans=0.125
2024-09-15 00:02:28,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.09 vs. limit=10.0
2024-09-15 00:02:30,961 INFO [train.py:1198] (0/2) Epoch 12, batch 1600, loss[loss=0.2564, ctc_loss=0.1755, cr_loss=0.4047, over 21032.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1817, cr_loss=0.3951, over 4084446.15 frames. ], batch size: 62, lr: 7.09e-03, grad_scale: 32.0
2024-09-15 00:02:34,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.23 vs. limit=10.0
2024-09-15 00:03:12,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0
2024-09-15 00:03:25,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=203796.0, ans=0.125
2024-09-15 00:03:34,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=203824.33333333334, ans=0.025
2024-09-15 00:03:34,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=203824.33333333334, ans=0.1
2024-09-15 00:03:43,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0
2024-09-15 00:03:45,915 INFO [train.py:1198] (0/2) Epoch 12, batch 1650, loss[loss=0.2276, ctc_loss=0.1524, cr_loss=0.3757, over 20965.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1823, cr_loss=0.397, over 4089964.03 frames. ], batch size: 48, lr: 7.09e-03, grad_scale: 32.0
2024-09-15 00:04:14,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=203909.33333333334, ans=0.5
2024-09-15 00:04:23,526 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.121e+02 2.258e+02 2.449e+02 4.454e+02, threshold=4.517e+02, percent-clipped=0.0
2024-09-15 00:04:50,116 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=15.0
2024-09-15 00:05:04,406 INFO [train.py:1198] (0/2) Epoch 12, batch 1700, loss[loss=0.3302, ctc_loss=0.2371, cr_loss=0.4653, over 18274.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1822, cr_loss=0.396, over 4069681.60 frames. ], batch size: 108, lr: 7.09e-03, grad_scale: 32.0
2024-09-15 00:05:06,107 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-72000.pt
2024-09-15 00:05:32,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=204022.66666666666, ans=0.125
2024-09-15 00:05:32,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=204022.66666666666, ans=0.125
2024-09-15 00:05:34,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=204051.0, ans=0.125
2024-09-15 00:05:36,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=204051.0, ans=0.1
2024-09-15 00:06:20,318 INFO [train.py:1198] (0/2) Epoch 12, batch 1750, loss[loss=0.25, ctc_loss=0.1758, cr_loss=0.3708, over 20937.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.182, cr_loss=0.3958, over 4074609.84 frames. ], batch size: 60, lr: 7.09e-03, grad_scale: 32.0
2024-09-15 00:06:22,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=204136.0, ans=0.025
2024-09-15 00:06:33,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=204136.0, ans=0.125
2024-09-15 00:06:34,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=204164.33333333334, ans=0.0
2024-09-15 00:06:57,908 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.094e+02 2.312e+02 2.586e+02 3.910e+02, threshold=4.625e+02, percent-clipped=0.0
2024-09-15 00:07:17,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=204221.0, ans=0.0
2024-09-15 00:07:38,291 INFO [train.py:1198] (0/2) Epoch 12, batch 1800, loss[loss=0.2723, ctc_loss=0.1944, cr_loss=0.3892, over 20968.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1823, cr_loss=0.3965, over 4081080.82 frames. ], batch size: 58, lr: 7.08e-03, grad_scale: 32.0
2024-09-15 00:07:42,110 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0
2024-09-15 00:08:13,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=204334.33333333334, ans=0.125
2024-09-15 00:08:22,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=204362.66666666666, ans=0.125
2024-09-15 00:08:53,807 INFO [train.py:1198] (0/2) Epoch 12, batch 1850, loss[loss=0.2983, ctc_loss=0.213, cr_loss=0.4262, over 19359.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1824, cr_loss=0.3965, over 4087095.78 frames. ], batch size: 90, lr: 7.08e-03, grad_scale: 32.0
2024-09-15 00:09:27,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=204476.0, ans=0.2
2024-09-15 00:09:31,925 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.050e+02 2.188e+02 2.385e+02 3.212e+02, threshold=4.377e+02, percent-clipped=0.0
2024-09-15 00:09:32,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=15.0
2024-09-15 00:09:52,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=204504.33333333334, ans=0.125
2024-09-15 00:10:07,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=204532.66666666666, ans=0.125
2024-09-15 00:10:10,089 INFO [train.py:1198] (0/2) Epoch 12, batch 1900, loss[loss=0.258, ctc_loss=0.1803, cr_loss=0.3886, over 20345.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1807, cr_loss=0.3944, over 4096926.10 frames. ], batch size: 74, lr: 7.08e-03, grad_scale: 32.0
2024-09-15 00:10:42,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=204617.66666666666, ans=0.125
2024-09-15 00:11:02,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=204646.0, ans=0.125
2024-09-15 00:11:20,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=204674.33333333334, ans=0.125
2024-09-15 00:11:29,586 INFO [train.py:1198] (0/2) Epoch 12, batch 1950, loss[loss=0.2925, ctc_loss=0.2043, cr_loss=0.4412, over 20704.00 frames. ], tot_loss[loss=0.2598, ctc_loss=0.1807, cr_loss=0.3956, over 4107609.99 frames. ], batch size: 71, lr: 7.08e-03, grad_scale: 32.0
2024-09-15 00:11:43,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=204731.0, ans=0.025
2024-09-15 00:12:07,529 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.713e+02 2.027e+02 2.133e+02 2.345e+02 3.140e+02, threshold=4.266e+02, percent-clipped=0.0
2024-09-15 00:12:09,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=204759.33333333334, ans=0.2
2024-09-15 00:12:26,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=204787.66666666666, ans=0.025
2024-09-15 00:12:44,724 INFO [train.py:1198] (0/2) Epoch 12, batch 2000, loss[loss=0.2418, ctc_loss=0.1685, cr_loss=0.3664, over 20955.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1799, cr_loss=0.3934, over 4094907.72 frames. ], batch size: 49, lr: 7.08e-03, grad_scale: 32.0
2024-09-15 00:12:49,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=204844.33333333334, ans=0.1
2024-09-15 00:13:08,328 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.87 vs. limit=12.0
2024-09-15 00:13:47,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=204957.66666666666, ans=10.0
2024-09-15 00:14:02,599 INFO [train.py:1198] (0/2) Epoch 12, batch 2050, loss[loss=0.2781, ctc_loss=0.1948, cr_loss=0.4165, over 20997.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1797, cr_loss=0.3919, over 4076422.13 frames. ], batch size: 63, lr: 7.07e-03, grad_scale: 32.0
2024-09-15 00:14:02,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=204986.0, ans=0.125
2024-09-15 00:14:33,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=205042.66666666666, ans=0.125
2024-09-15 00:14:39,886 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.109e+02 2.263e+02 2.516e+02 3.982e+02, threshold=4.525e+02, percent-clipped=0.0
2024-09-15 00:15:09,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.56 vs. limit=15.0
2024-09-15 00:15:11,795 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 00:15:17,500 INFO [train.py:1198] (0/2) Epoch 12, batch 2100, loss[loss=0.2331, ctc_loss=0.1611, cr_loss=0.3601, over 20974.00 frames. ], tot_loss[loss=0.2594, ctc_loss=0.1807, cr_loss=0.3933, over 4087576.08 frames. ], batch size: 50, lr: 7.07e-03, grad_scale: 32.0
2024-09-15 00:15:31,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0
2024-09-15 00:16:05,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=205212.66666666666, ans=0.125
2024-09-15 00:16:35,124 INFO [train.py:1198] (0/2) Epoch 12, batch 2150, loss[loss=0.251, ctc_loss=0.1711, cr_loss=0.3993, over 21085.00 frames. ], tot_loss[loss=0.2599, ctc_loss=0.1811, cr_loss=0.3942, over 4084893.80 frames. ], batch size: 59, lr: 7.07e-03, grad_scale: 32.0
2024-09-15 00:16:57,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0
2024-09-15 00:17:04,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=205326.0, ans=0.0
2024-09-15 00:17:12,275 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.124e+02 2.316e+02 2.764e+02 4.368e+02, threshold=4.631e+02, percent-clipped=0.0
2024-09-15 00:17:21,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=205354.33333333334, ans=0.0
2024-09-15 00:17:27,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=205354.33333333334, ans=0.0
2024-09-15 00:17:38,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=205382.66666666666, ans=0.125
2024-09-15 00:17:42,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=205382.66666666666, ans=0.125
2024-09-15 00:17:44,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=205382.66666666666, ans=0.1
2024-09-15 00:17:49,759 INFO [train.py:1198] (0/2) Epoch 12, batch 2200, loss[loss=0.2594, ctc_loss=0.1845, cr_loss=0.3745, over 21062.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1818, cr_loss=0.396, over 4080891.18 frames. ], batch size: 59, lr: 7.07e-03, grad_scale: 32.0
2024-09-15 00:18:02,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=205411.0, ans=0.0
2024-09-15 00:18:46,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=205496.0, ans=0.125
2024-09-15 00:19:08,735 INFO [train.py:1198] (0/2) Epoch 12, batch 2250, loss[loss=0.3393, ctc_loss=0.2503, cr_loss=0.4451, over 13938.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1807, cr_loss=0.3944, over 4085979.63 frames. ], batch size: 150, lr: 7.06e-03, grad_scale: 32.0
2024-09-15 00:19:21,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=205552.66666666666, ans=0.125
2024-09-15 00:19:46,626 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.721e+02 2.049e+02 2.205e+02 2.394e+02 3.210e+02, threshold=4.410e+02, percent-clipped=0.0
2024-09-15 00:19:48,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=205609.33333333334, ans=0.0
2024-09-15 00:20:09,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=205666.0, ans=0.125
2024-09-15 00:20:10,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=15.0
2024-09-15 00:20:24,623 INFO [train.py:1198] (0/2) Epoch 12, batch 2300, loss[loss=0.2811, ctc_loss=0.2003, cr_loss=0.4042, over 20845.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1806, cr_loss=0.3944, over 4085004.75 frames. ], batch size: 65, lr: 7.06e-03, grad_scale: 32.0
2024-09-15 00:20:34,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=205694.33333333334, ans=0.125
2024-09-15 00:20:49,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=205722.66666666666, ans=0.025
2024-09-15 00:20:53,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=205751.0, ans=0.125
2024-09-15 00:20:56,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=205751.0, ans=0.1
2024-09-15 00:21:22,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=205779.33333333334, ans=0.125
2024-09-15 00:21:25,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=205807.66666666666, ans=0.1
2024-09-15 00:21:33,172 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 00:21:33,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=205807.66666666666, ans=0.125
2024-09-15 00:21:43,053 INFO [train.py:1198] (0/2) Epoch 12, batch 2350, loss[loss=0.2906, ctc_loss=0.2046, cr_loss=0.4299, over 20874.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.181, cr_loss=0.3948, over 4075667.26 frames. ], batch size: 65, lr: 7.06e-03, grad_scale: 32.0
2024-09-15 00:22:07,996 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0
2024-09-15 00:22:08,304 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0
2024-09-15 00:22:20,846 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.085e+02 2.225e+02 2.542e+02 4.079e+02, threshold=4.450e+02, percent-clipped=0.0
2024-09-15 00:22:42,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=205949.33333333334, ans=0.125
2024-09-15 00:22:58,597 INFO [train.py:1198] (0/2) Epoch 12, batch 2400, loss[loss=0.2241, ctc_loss=0.1511, cr_loss=0.3653, over 20950.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1803, cr_loss=0.3943, over 4096648.96 frames. ], batch size: 48, lr: 7.06e-03, grad_scale: 32.0
2024-09-15 00:23:02,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=205977.66666666666, ans=0.2
2024-09-15 00:24:04,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=206091.0, ans=0.0
2024-09-15 00:24:05,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=206091.0, ans=0.2
2024-09-15 00:24:14,626 INFO [train.py:1198] (0/2) Epoch 12, batch 2450, loss[loss=0.2555, ctc_loss=0.1781, cr_loss=0.387, over 21084.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1807, cr_loss=0.3948, over 4097698.50 frames. ], batch size: 59, lr: 7.05e-03, grad_scale: 32.0
2024-09-15 00:24:39,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=22.5
2024-09-15 00:24:49,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=206176.0, ans=0.0
2024-09-15 00:24:55,290 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.068e+02 2.363e+02 2.661e+02 5.748e+02, threshold=4.727e+02, percent-clipped=1.0
2024-09-15 00:25:15,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=12.0
2024-09-15 00:25:24,531 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0
2024-09-15 00:25:32,785 INFO [train.py:1198] (0/2) Epoch 12, batch 2500, loss[loss=0.2898, ctc_loss=0.2061, cr_loss=0.4187, over 20973.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1802, cr_loss=0.3936, over 4091507.37 frames. ], batch size: 58, lr: 7.05e-03, grad_scale: 32.0
2024-09-15 00:25:46,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206289.33333333334, ans=0.1
2024-09-15 00:26:02,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.91 vs. limit=15.0
2024-09-15 00:26:23,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=206346.0, ans=0.0
2024-09-15 00:26:36,142 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.05 vs. limit=10.0
2024-09-15 00:26:46,098 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=15.0
2024-09-15 00:26:47,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=206402.66666666666, ans=0.0
2024-09-15 00:26:48,438 INFO [train.py:1198] (0/2) Epoch 12, batch 2550, loss[loss=0.2971, ctc_loss=0.2069, cr_loss=0.4507, over 20965.00 frames. ], tot_loss[loss=0.2587, ctc_loss=0.1799, cr_loss=0.3937, over 4098689.67 frames. ], batch size: 64, lr: 7.05e-03, grad_scale: 32.0
2024-09-15 00:26:50,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=206402.66666666666, ans=0.2
2024-09-15 00:27:00,723 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 00:27:26,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=206459.33333333334, ans=0.2
2024-09-15 00:27:29,057 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.718e+02 2.032e+02 2.192e+02 2.408e+02 3.885e+02, threshold=4.384e+02, percent-clipped=0.0
2024-09-15 00:27:45,205 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0
2024-09-15 00:28:07,139 INFO [train.py:1198] (0/2) Epoch 12, batch 2600, loss[loss=0.267, ctc_loss=0.185, cr_loss=0.41, over 20238.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1796, cr_loss=0.3927, over 4103527.96 frames. ], batch size: 74, lr: 7.05e-03, grad_scale: 32.0
2024-09-15 00:28:18,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=206544.33333333334, ans=0.95
2024-09-15 00:28:20,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0
2024-09-15 00:28:45,528 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 00:29:14,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=206657.66666666666, ans=0.0
2024-09-15 00:29:23,315 INFO [train.py:1198] (0/2) Epoch 12, batch 2650, loss[loss=0.2647, ctc_loss=0.1849, cr_loss=0.3987, over 20908.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.1786, cr_loss=0.3914, over 4106424.06 frames. ], batch size: 54, lr: 7.04e-03, grad_scale: 32.0
2024-09-15 00:29:34,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=206686.0, ans=0.0
2024-09-15 00:30:01,255 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.717e+02 1.967e+02 2.078e+02 2.242e+02 3.678e+02, threshold=4.157e+02, percent-clipped=0.0
2024-09-15 00:30:14,158 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.38 vs. limit=10.0
2024-09-15 00:30:25,813 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=22.5
2024-09-15 00:30:41,652 INFO [train.py:1198] (0/2) Epoch 12, batch 2700, loss[loss=0.2733, ctc_loss=0.1905, cr_loss=0.4142, over 19355.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.179, cr_loss=0.3924, over 4106277.10 frames. ], batch size: 90, lr: 7.04e-03, grad_scale: 32.0
2024-09-15 00:30:52,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=206827.66666666666, ans=0.125
2024-09-15 00:30:55,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=206856.0, ans=0.2
2024-09-15 00:31:07,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=206856.0, ans=15.0
2024-09-15 00:31:08,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206856.0, ans=0.1
2024-09-15 00:31:37,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=206912.66666666666, ans=0.0
2024-09-15 00:31:56,749 INFO [train.py:1198] (0/2) Epoch 12, batch 2750, loss[loss=0.2139, ctc_loss=0.1458, cr_loss=0.3401, over 20991.00 frames. ], tot_loss[loss=0.2587, ctc_loss=0.18, cr_loss=0.3937, over 4104229.67 frames.
], batch size: 52, lr: 7.04e-03, grad_scale: 32.0 2024-09-15 00:32:19,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=206997.66666666666, ans=0.09899494936611666 2024-09-15 00:32:34,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.050e+02 2.149e+02 2.309e+02 3.718e+02, threshold=4.299e+02, percent-clipped=0.0 2024-09-15 00:32:41,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=207054.33333333334, ans=0.0 2024-09-15 00:32:51,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=207054.33333333334, ans=0.125 2024-09-15 00:33:15,746 INFO [train.py:1198] (0/2) Epoch 12, batch 2800, loss[loss=0.2837, ctc_loss=0.1981, cr_loss=0.4279, over 20670.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1808, cr_loss=0.3947, over 4090129.58 frames. ], batch size: 66, lr: 7.04e-03, grad_scale: 32.0 2024-09-15 00:33:16,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=207111.0, ans=0.035 2024-09-15 00:33:16,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.37 vs. 
limit=10.0 2024-09-15 00:33:35,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=207139.33333333334, ans=0.5 2024-09-15 00:34:10,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=207196.0, ans=0.125 2024-09-15 00:34:15,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=207224.33333333334, ans=0.0 2024-09-15 00:34:22,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=207224.33333333334, ans=0.125 2024-09-15 00:34:31,458 INFO [train.py:1198] (0/2) Epoch 12, batch 2850, loss[loss=0.2629, ctc_loss=0.1834, cr_loss=0.3979, over 21078.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.181, cr_loss=0.3952, over 4088670.27 frames. ], batch size: 59, lr: 7.03e-03, grad_scale: 32.0 2024-09-15 00:34:37,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=207252.66666666666, ans=0.2 2024-09-15 00:34:48,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=207281.0, ans=0.0 2024-09-15 00:35:08,979 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.052e+02 2.191e+02 2.360e+02 5.826e+02, threshold=4.382e+02, percent-clipped=1.0 2024-09-15 00:35:21,471 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.10 vs. 
limit=10.0 2024-09-15 00:35:38,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=207366.0, ans=0.125 2024-09-15 00:35:41,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=207366.0, ans=0.0 2024-09-15 00:35:45,881 INFO [train.py:1198] (0/2) Epoch 12, batch 2900, loss[loss=0.2764, ctc_loss=0.1937, cr_loss=0.4139, over 20879.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1813, cr_loss=0.3958, over 4088316.67 frames. ], batch size: 57, lr: 7.03e-03, grad_scale: 64.0 2024-09-15 00:37:04,498 INFO [train.py:1198] (0/2) Epoch 12, batch 2950, loss[loss=0.2225, ctc_loss=0.1494, cr_loss=0.3654, over 20989.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1805, cr_loss=0.3956, over 4103494.51 frames. ], batch size: 52, lr: 7.03e-03, grad_scale: 64.0 2024-09-15 00:37:42,346 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.060e+02 2.201e+02 2.426e+02 3.999e+02, threshold=4.403e+02, percent-clipped=0.0 2024-09-15 00:37:45,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=207592.66666666666, ans=0.125 2024-09-15 00:38:19,893 INFO [train.py:1198] (0/2) Epoch 12, batch 3000, loss[loss=0.2372, ctc_loss=0.1618, cr_loss=0.3775, over 21020.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1809, cr_loss=0.396, over 4098909.09 frames. ], batch size: 63, lr: 7.03e-03, grad_scale: 64.0 2024-09-15 00:38:19,893 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 00:38:50,662 INFO [train.py:1230] (0/2) Epoch 12, validation: loss=0.04992, ctc_loss=0.04992, cr_loss=1.002e-14, over 944034.00 frames. 
2024-09-15 00:38:50,663 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 00:39:12,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=207706.0, ans=0.125 2024-09-15 00:39:54,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=207791.0, ans=0.2 2024-09-15 00:40:05,821 INFO [train.py:1198] (0/2) Epoch 12, batch 3050, loss[loss=0.3187, ctc_loss=0.2264, cr_loss=0.4613, over 19370.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1809, cr_loss=0.3969, over 4097045.20 frames. ], batch size: 90, lr: 7.02e-03, grad_scale: 32.0 2024-09-15 00:40:09,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.55 vs. limit=15.0 2024-09-15 00:40:25,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=207847.66666666666, ans=0.125 2024-09-15 00:40:31,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=207847.66666666666, ans=0.125 2024-09-15 00:40:43,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=207876.0, ans=0.2 2024-09-15 00:40:44,810 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.734e+02 2.018e+02 2.174e+02 2.362e+02 3.569e+02, threshold=4.348e+02, percent-clipped=0.0 2024-09-15 00:40:45,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=207876.0, ans=0.0 2024-09-15 00:41:20,720 INFO [train.py:1198] (0/2) Epoch 12, batch 3100, loss[loss=0.2312, ctc_loss=0.1598, cr_loss=0.3571, over 20949.00 frames. ], tot_loss[loss=0.2604, ctc_loss=0.1811, cr_loss=0.3965, over 4089989.28 frames. 
], batch size: 49, lr: 7.02e-03, grad_scale: 32.0 2024-09-15 00:41:53,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=208017.66666666666, ans=0.125 2024-09-15 00:42:05,707 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=12.0 2024-09-15 00:42:39,349 INFO [train.py:1198] (0/2) Epoch 12, batch 3150, loss[loss=0.2486, ctc_loss=0.1701, cr_loss=0.3925, over 20996.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1813, cr_loss=0.3966, over 4087170.79 frames. ], batch size: 55, lr: 7.02e-03, grad_scale: 32.0 2024-09-15 00:42:39,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=208102.66666666666, ans=0.125 2024-09-15 00:43:17,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=208159.33333333334, ans=0.2 2024-09-15 00:43:18,476 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.077e+02 2.209e+02 2.364e+02 5.147e+02, threshold=4.419e+02, percent-clipped=2.0 2024-09-15 00:43:29,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=208187.66666666666, ans=0.1 2024-09-15 00:43:32,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=208187.66666666666, ans=0.125 2024-09-15 00:43:34,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=208187.66666666666, ans=0.0 2024-09-15 00:43:54,941 INFO [train.py:1198] (0/2) Epoch 12, batch 3200, loss[loss=0.2616, ctc_loss=0.1838, cr_loss=0.3892, over 20245.00 frames. ], tot_loss[loss=0.2598, ctc_loss=0.1808, cr_loss=0.3948, over 4079466.97 frames. 
], batch size: 74, lr: 7.02e-03, grad_scale: 32.0 2024-09-15 00:43:59,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=208244.33333333334, ans=0.04949747468305833 2024-09-15 00:44:17,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=208272.66666666666, ans=0.125 2024-09-15 00:44:33,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=208301.0, ans=0.1 2024-09-15 00:44:52,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=208329.33333333334, ans=0.0 2024-09-15 00:45:01,582 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 00:45:05,953 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 00:45:12,997 INFO [train.py:1198] (0/2) Epoch 12, batch 3250, loss[loss=0.3007, ctc_loss=0.2101, cr_loss=0.4529, over 19971.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1806, cr_loss=0.395, over 4075099.49 frames. ], batch size: 80, lr: 7.02e-03, grad_scale: 32.0 2024-09-15 00:45:38,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0 2024-09-15 00:45:52,316 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.024e+02 2.214e+02 2.411e+02 3.904e+02, threshold=4.428e+02, percent-clipped=0.0 2024-09-15 00:46:27,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.00 vs. 
limit=15.0 2024-09-15 00:46:28,703 INFO [train.py:1198] (0/2) Epoch 12, batch 3300, loss[loss=0.2513, ctc_loss=0.1766, cr_loss=0.3733, over 19819.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.1802, cr_loss=0.3939, over 4078309.00 frames. ], batch size: 44, lr: 7.01e-03, grad_scale: 32.0 2024-09-15 00:46:43,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=208556.0, ans=0.125 2024-09-15 00:46:44,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=208556.0, ans=0.125 2024-09-15 00:46:55,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2024-09-15 00:47:05,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=208584.33333333334, ans=0.0 2024-09-15 00:47:07,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=208584.33333333334, ans=0.125 2024-09-15 00:47:47,253 INFO [train.py:1198] (0/2) Epoch 12, batch 3350, loss[loss=0.2692, ctc_loss=0.1923, cr_loss=0.3847, over 19563.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1804, cr_loss=0.3937, over 4073325.29 frames. ], batch size: 90, lr: 7.01e-03, grad_scale: 32.0 2024-09-15 00:47:59,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=208669.33333333334, ans=0.125 2024-09-15 00:48:26,487 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.089e+02 2.360e+02 2.670e+02 4.104e+02, threshold=4.720e+02, percent-clipped=0.0 2024-09-15 00:49:02,684 INFO [train.py:1198] (0/2) Epoch 12, batch 3400, loss[loss=0.2148, ctc_loss=0.1469, cr_loss=0.3392, over 20970.00 frames. 
], tot_loss[loss=0.2599, ctc_loss=0.1808, cr_loss=0.3954, over 4080432.42 frames. ], batch size: 50, lr: 7.01e-03, grad_scale: 32.0 2024-09-15 00:49:15,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=22.5 2024-09-15 00:49:24,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=208839.33333333334, ans=0.125 2024-09-15 00:49:33,778 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-09-15 00:49:45,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=208867.66666666666, ans=0.125 2024-09-15 00:49:45,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=208867.66666666666, ans=0.0 2024-09-15 00:49:49,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=208896.0, ans=0.2 2024-09-15 00:50:20,382 INFO [train.py:1198] (0/2) Epoch 12, batch 3450, loss[loss=0.2801, ctc_loss=0.1979, cr_loss=0.4112, over 20944.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1812, cr_loss=0.3962, over 4080314.32 frames. ], batch size: 60, lr: 7.01e-03, grad_scale: 32.0 2024-09-15 00:50:28,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=208952.66666666666, ans=0.125 2024-09-15 00:50:57,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.95 vs. 
limit=22.5 2024-09-15 00:50:59,506 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 2.051e+02 2.199e+02 2.429e+02 3.668e+02, threshold=4.398e+02, percent-clipped=0.0 2024-09-15 00:51:11,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=209037.66666666666, ans=0.2 2024-09-15 00:51:35,719 INFO [train.py:1198] (0/2) Epoch 12, batch 3500, loss[loss=0.2938, ctc_loss=0.2059, cr_loss=0.4397, over 20642.00 frames. ], tot_loss[loss=0.2599, ctc_loss=0.1807, cr_loss=0.3958, over 4074133.61 frames. ], batch size: 68, lr: 7.00e-03, grad_scale: 32.0 2024-09-15 00:51:45,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=209094.33333333334, ans=0.0 2024-09-15 00:51:56,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=209122.66666666666, ans=0.125 2024-09-15 00:51:56,342 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.02 vs. limit=10.0 2024-09-15 00:52:51,409 INFO [train.py:1198] (0/2) Epoch 12, batch 3550, loss[loss=0.2754, ctc_loss=0.1941, cr_loss=0.4063, over 20660.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1797, cr_loss=0.3944, over 4088995.37 frames. 
], batch size: 68, lr: 7.00e-03, grad_scale: 32.0 2024-09-15 00:53:03,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=209236.0, ans=0.1 2024-09-15 00:53:03,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=209236.0, ans=0.0 2024-09-15 00:53:32,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=209292.66666666666, ans=0.125 2024-09-15 00:53:33,417 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.060e+02 2.201e+02 2.397e+02 3.812e+02, threshold=4.402e+02, percent-clipped=0.0 2024-09-15 00:53:38,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=209321.0, ans=0.025 2024-09-15 00:54:08,885 INFO [train.py:1198] (0/2) Epoch 12, batch 3600, loss[loss=0.2594, ctc_loss=0.1773, cr_loss=0.4105, over 20938.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1799, cr_loss=0.3949, over 4095939.82 frames. ], batch size: 60, lr: 7.00e-03, grad_scale: 32.0 2024-09-15 00:54:34,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=209406.0, ans=0.125 2024-09-15 00:55:24,050 INFO [train.py:1198] (0/2) Epoch 12, batch 3650, loss[loss=0.2759, ctc_loss=0.1938, cr_loss=0.4107, over 20923.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1807, cr_loss=0.3963, over 4101190.33 frames. ], batch size: 60, lr: 7.00e-03, grad_scale: 32.0 2024-09-15 00:55:55,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.88 vs. 
limit=22.5 2024-09-15 00:56:06,625 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.703e+02 2.062e+02 2.222e+02 2.439e+02 4.103e+02, threshold=4.445e+02, percent-clipped=0.0 2024-09-15 00:56:14,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=209604.33333333334, ans=0.125 2024-09-15 00:56:26,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=209632.66666666666, ans=0.09899494936611666 2024-09-15 00:56:32,880 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=22.5 2024-09-15 00:56:38,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=209632.66666666666, ans=0.0 2024-09-15 00:56:42,768 INFO [train.py:1198] (0/2) Epoch 12, batch 3700, loss[loss=0.2593, ctc_loss=0.1796, cr_loss=0.3987, over 21015.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1804, cr_loss=0.3961, over 4104473.19 frames. ], batch size: 61, lr: 6.99e-03, grad_scale: 32.0 2024-09-15 00:56:52,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.75 vs. limit=15.0 2024-09-15 00:57:09,625 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.67 vs. limit=15.0 2024-09-15 00:57:21,315 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.34 vs. 
limit=10.0 2024-09-15 00:57:25,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=209717.66666666666, ans=0.0 2024-09-15 00:57:36,099 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0 2024-09-15 00:57:57,963 INFO [train.py:1198] (0/2) Epoch 12, batch 3750, loss[loss=0.2718, ctc_loss=0.1858, cr_loss=0.4302, over 21058.00 frames. ], tot_loss[loss=0.2594, ctc_loss=0.1804, cr_loss=0.3953, over 4096509.41 frames. ], batch size: 59, lr: 6.99e-03, grad_scale: 32.0 2024-09-15 00:57:58,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=22.5 2024-09-15 00:58:36,888 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.054e+02 2.173e+02 2.361e+02 3.984e+02, threshold=4.345e+02, percent-clipped=0.0 2024-09-15 00:58:40,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0 2024-09-15 00:58:50,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=209887.66666666666, ans=0.1 2024-09-15 00:59:15,879 INFO [train.py:1198] (0/2) Epoch 12, batch 3800, loss[loss=0.296, ctc_loss=0.2108, cr_loss=0.4257, over 20169.00 frames. ], tot_loss[loss=0.2587, ctc_loss=0.1798, cr_loss=0.3945, over 4104734.51 frames. 
], batch size: 80, lr: 6.99e-03, grad_scale: 32.0 2024-09-15 00:59:29,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=209972.66666666666, ans=0.2 2024-09-15 00:59:58,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=210001.0, ans=0.2 2024-09-15 01:00:07,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=210029.33333333334, ans=0.0 2024-09-15 01:00:17,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=210057.66666666666, ans=0.125 2024-09-15 01:00:21,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2024-09-15 01:00:31,090 INFO [train.py:1198] (0/2) Epoch 12, batch 3850, loss[loss=0.2394, ctc_loss=0.1663, cr_loss=0.3655, over 20943.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1802, cr_loss=0.3945, over 4088565.20 frames. 
], batch size: 50, lr: 6.99e-03, grad_scale: 32.0 2024-09-15 01:00:58,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=210114.33333333334, ans=0.1 2024-09-15 01:00:58,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=210114.33333333334, ans=0.0 2024-09-15 01:01:00,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=210142.66666666666, ans=0.2 2024-09-15 01:01:11,786 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.136e+02 2.330e+02 2.610e+02 4.569e+02, threshold=4.661e+02, percent-clipped=3.0 2024-09-15 01:01:37,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=210199.33333333334, ans=0.2 2024-09-15 01:01:49,199 INFO [train.py:1198] (0/2) Epoch 12, batch 3900, loss[loss=0.3274, ctc_loss=0.2412, cr_loss=0.4308, over 14240.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.18, cr_loss=0.3941, over 4088210.66 frames. 
], batch size: 150, lr: 6.98e-03, grad_scale: 16.0 2024-09-15 01:02:25,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=210284.33333333334, ans=0.2 2024-09-15 01:02:27,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=210284.33333333334, ans=0.125 2024-09-15 01:02:46,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=210312.66666666666, ans=0.0 2024-09-15 01:02:50,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=210341.0, ans=0.125 2024-09-15 01:02:51,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=210341.0, ans=0.125 2024-09-15 01:03:03,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=210369.33333333334, ans=0.125 2024-09-15 01:03:04,602 INFO [train.py:1198] (0/2) Epoch 12, batch 3950, loss[loss=0.2473, ctc_loss=0.1711, cr_loss=0.3809, over 19888.00 frames. ], tot_loss[loss=0.2579, ctc_loss=0.1792, cr_loss=0.3937, over 4097549.35 frames. 
], batch size: 44, lr: 6.98e-03, grad_scale: 16.0 2024-09-15 01:03:36,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=210426.0, ans=10.0 2024-09-15 01:03:45,538 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.013e+02 2.157e+02 2.298e+02 3.651e+02, threshold=4.313e+02, percent-clipped=0.0 2024-09-15 01:04:05,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=210482.66666666666, ans=0.2 2024-09-15 01:04:20,210 INFO [train.py:1198] (0/2) Epoch 12, batch 4000, loss[loss=0.2602, ctc_loss=0.1816, cr_loss=0.393, over 21000.00 frames. ], tot_loss[loss=0.2587, ctc_loss=0.1798, cr_loss=0.3946, over 4099391.22 frames. ], batch size: 61, lr: 6.98e-03, grad_scale: 32.0 2024-09-15 01:04:42,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=210539.33333333334, ans=0.125 2024-09-15 01:05:20,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=210596.0, ans=0.04949747468305833 2024-09-15 01:05:23,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=210624.33333333334, ans=0.125 2024-09-15 01:05:31,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=210624.33333333334, ans=0.0 2024-09-15 01:05:38,393 INFO [train.py:1198] (0/2) Epoch 12, batch 4050, loss[loss=0.263, ctc_loss=0.1814, cr_loss=0.4077, over 21035.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1801, cr_loss=0.3951, over 4107185.97 frames. 
], batch size: 63, lr: 6.98e-03, grad_scale: 32.0 2024-09-15 01:05:43,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=210652.66666666666, ans=0.04949747468305833 2024-09-15 01:05:58,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=210681.0, ans=0.025 2024-09-15 01:06:19,062 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.721e+02 2.126e+02 2.320e+02 2.555e+02 4.216e+02, threshold=4.639e+02, percent-clipped=0.0 2024-09-15 01:06:46,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=210766.0, ans=0.125 2024-09-15 01:06:53,711 INFO [train.py:1198] (0/2) Epoch 12, batch 4100, loss[loss=0.2183, ctc_loss=0.1474, cr_loss=0.3545, over 20956.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1796, cr_loss=0.3941, over 4112102.25 frames. ], batch size: 50, lr: 6.98e-03, grad_scale: 32.0 2024-09-15 01:07:12,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=210822.66666666666, ans=0.0 2024-09-15 01:07:32,439 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=22.5 2024-09-15 01:08:02,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=210907.66666666666, ans=0.0 2024-09-15 01:08:05,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=210907.66666666666, ans=0.125 2024-09-15 01:08:12,257 INFO [train.py:1198] (0/2) Epoch 12, batch 4150, loss[loss=0.2403, ctc_loss=0.1676, cr_loss=0.3637, over 21041.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1772, cr_loss=0.3908, over 4111310.15 frames. 
], batch size: 63, lr: 6.97e-03, grad_scale: 32.0 2024-09-15 01:08:35,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.89 vs. limit=15.0 2024-09-15 01:08:52,744 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.742e+02 2.058e+02 2.176e+02 2.478e+02 4.157e+02, threshold=4.351e+02, percent-clipped=0.0 2024-09-15 01:09:23,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0 2024-09-15 01:09:27,782 INFO [train.py:1198] (0/2) Epoch 12, batch 4200, loss[loss=0.2756, ctc_loss=0.1868, cr_loss=0.4441, over 21024.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1774, cr_loss=0.3908, over 4107078.15 frames. ], batch size: 63, lr: 6.97e-03, grad_scale: 32.0 2024-09-15 01:09:46,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=211106.0, ans=0.0 2024-09-15 01:10:22,983 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2024-09-15 01:10:37,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=211191.0, ans=0.125 2024-09-15 01:10:46,260 INFO [train.py:1198] (0/2) Epoch 12, batch 4250, loss[loss=0.2485, ctc_loss=0.1686, cr_loss=0.3995, over 20775.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1777, cr_loss=0.3905, over 4087863.78 frames. 
], batch size: 53, lr: 6.97e-03, grad_scale: 32.0 2024-09-15 01:11:26,964 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.071e+02 2.226e+02 2.559e+02 4.154e+02, threshold=4.451e+02, percent-clipped=0.0 2024-09-15 01:11:28,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=211276.0, ans=0.125 2024-09-15 01:12:01,421 INFO [train.py:1198] (0/2) Epoch 12, batch 4300, loss[loss=0.2901, ctc_loss=0.2024, cr_loss=0.4381, over 20107.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1792, cr_loss=0.3931, over 4079647.07 frames. ], batch size: 80, lr: 6.97e-03, grad_scale: 32.0 2024-09-15 01:12:03,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=211361.0, ans=0.125 2024-09-15 01:12:14,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.25 vs. limit=15.0 2024-09-15 01:12:17,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2024-09-15 01:13:19,453 INFO [train.py:1198] (0/2) Epoch 12, batch 4350, loss[loss=0.2601, ctc_loss=0.1783, cr_loss=0.4087, over 20799.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.1799, cr_loss=0.3942, over 4082069.52 frames. ], batch size: 53, lr: 6.96e-03, grad_scale: 32.0 2024-09-15 01:13:30,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.10 vs. 
limit=15.0 2024-09-15 01:13:56,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=211559.33333333334, ans=0.125 2024-09-15 01:14:00,289 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.718e+02 2.054e+02 2.156e+02 2.364e+02 4.364e+02, threshold=4.311e+02, percent-clipped=0.0 2024-09-15 01:14:08,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=15.0 2024-09-15 01:14:09,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=211587.66666666666, ans=0.0 2024-09-15 01:14:12,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=211587.66666666666, ans=0.125 2024-09-15 01:14:14,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=211587.66666666666, ans=0.04949747468305833 2024-09-15 01:14:35,008 INFO [train.py:1198] (0/2) Epoch 12, batch 4400, loss[loss=0.2745, ctc_loss=0.1966, cr_loss=0.3896, over 20300.00 frames. ], tot_loss[loss=0.2594, ctc_loss=0.1804, cr_loss=0.3949, over 4082527.42 frames. ], batch size: 74, lr: 6.96e-03, grad_scale: 32.0 2024-09-15 01:14:37,342 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.93 vs. 
limit=22.5 2024-09-15 01:14:42,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=211644.33333333334, ans=0.125 2024-09-15 01:14:45,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=211644.33333333334, ans=0.125 2024-09-15 01:14:52,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=211672.66666666666, ans=0.0 2024-09-15 01:14:57,088 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2024-09-15 01:15:20,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=211729.33333333334, ans=0.1 2024-09-15 01:15:28,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=211729.33333333334, ans=0.125 2024-09-15 01:15:37,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=211757.66666666666, ans=0.125 2024-09-15 01:15:53,461 INFO [train.py:1198] (0/2) Epoch 12, batch 4450, loss[loss=0.2468, ctc_loss=0.1704, cr_loss=0.3819, over 20792.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1797, cr_loss=0.3932, over 4089746.51 frames. 
], batch size: 53, lr: 6.96e-03, grad_scale: 32.0 2024-09-15 01:16:19,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=211814.33333333334, ans=0.125 2024-09-15 01:16:34,064 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.005e+02 2.145e+02 2.285e+02 3.180e+02, threshold=4.291e+02, percent-clipped=0.0 2024-09-15 01:16:46,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211871.0, ans=0.1 2024-09-15 01:17:08,038 INFO [train.py:1198] (0/2) Epoch 12, batch 4500, loss[loss=0.2828, ctc_loss=0.1975, cr_loss=0.4266, over 20690.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1803, cr_loss=0.3945, over 4094337.73 frames. ], batch size: 68, lr: 6.96e-03, grad_scale: 32.0 2024-09-15 01:17:19,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=211927.66666666666, ans=0.125 2024-09-15 01:17:23,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=211956.0, ans=0.125 2024-09-15 01:17:25,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=211956.0, ans=0.125 2024-09-15 01:17:26,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=211956.0, ans=0.125 2024-09-15 01:18:07,674 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.48 vs. limit=15.0 2024-09-15 01:18:23,279 INFO [train.py:1198] (0/2) Epoch 12, batch 4550, loss[loss=0.2484, ctc_loss=0.1713, cr_loss=0.3855, over 20845.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.1801, cr_loss=0.3937, over 4085496.71 frames. 
], batch size: 65, lr: 6.95e-03, grad_scale: 32.0 2024-09-15 01:18:23,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=212069.33333333334, ans=0.125 2024-09-15 01:18:57,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=212126.0, ans=0.0 2024-09-15 01:18:57,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=212126.0, ans=0.125 2024-09-15 01:19:06,083 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.143e+02 2.292e+02 2.537e+02 7.495e+02, threshold=4.584e+02, percent-clipped=1.0 2024-09-15 01:19:21,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=212154.33333333334, ans=0.07 2024-09-15 01:19:27,697 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.95 vs. limit=15.0 2024-09-15 01:19:31,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=212182.66666666666, ans=0.1 2024-09-15 01:19:40,389 INFO [train.py:1198] (0/2) Epoch 12, batch 4600, loss[loss=0.2272, ctc_loss=0.1541, cr_loss=0.3655, over 20984.00 frames. ], tot_loss[loss=0.2598, ctc_loss=0.1811, cr_loss=0.3934, over 4049568.61 frames. 
], batch size: 51, lr: 6.95e-03, grad_scale: 32.0 2024-09-15 01:19:45,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=212211.0, ans=0.125 2024-09-15 01:20:03,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=212239.33333333334, ans=0.125 2024-09-15 01:20:15,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=212267.66666666666, ans=0.125 2024-09-15 01:20:56,194 INFO [train.py:1198] (0/2) Epoch 12, batch 4650, loss[loss=0.2677, ctc_loss=0.1853, cr_loss=0.4117, over 20223.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.18, cr_loss=0.3922, over 4064799.83 frames. ], batch size: 74, lr: 6.95e-03, grad_scale: 32.0 2024-09-15 01:20:57,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.03 vs. limit=12.0 2024-09-15 01:20:59,522 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 01:21:13,521 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0 2024-09-15 01:21:26,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=212381.0, ans=0.0 2024-09-15 01:21:31,507 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.50 vs. 
limit=15.0 2024-09-15 01:21:39,666 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.028e+02 2.193e+02 2.342e+02 4.357e+02, threshold=4.386e+02, percent-clipped=0.0 2024-09-15 01:22:14,366 INFO [train.py:1198] (0/2) Epoch 12, batch 4700, loss[loss=0.248, ctc_loss=0.1728, cr_loss=0.3759, over 21007.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1793, cr_loss=0.3919, over 4075109.38 frames. ], batch size: 51, lr: 6.95e-03, grad_scale: 32.0 2024-09-15 01:22:35,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=212522.66666666666, ans=0.125 2024-09-15 01:22:42,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=212522.66666666666, ans=0.125 2024-09-15 01:22:56,467 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.88 vs. limit=10.0 2024-09-15 01:23:03,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=212579.33333333334, ans=0.0 2024-09-15 01:23:11,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=212579.33333333334, ans=0.1 2024-09-15 01:23:11,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=212579.33333333334, ans=0.0 2024-09-15 01:23:30,214 INFO [train.py:1198] (0/2) Epoch 12, batch 4750, loss[loss=0.3067, ctc_loss=0.2183, cr_loss=0.4418, over 19996.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1793, cr_loss=0.3925, over 4080723.86 frames. 
], batch size: 80, lr: 6.95e-03, grad_scale: 32.0 2024-09-15 01:23:44,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=212664.33333333334, ans=0.125 2024-09-15 01:23:57,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=212664.33333333334, ans=0.95 2024-09-15 01:24:13,604 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.749e+02 2.026e+02 2.151e+02 2.358e+02 3.126e+02, threshold=4.302e+02, percent-clipped=0.0 2024-09-15 01:24:31,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=212749.33333333334, ans=0.015 2024-09-15 01:24:48,180 INFO [train.py:1198] (0/2) Epoch 12, batch 4800, loss[loss=0.2295, ctc_loss=0.1566, cr_loss=0.3649, over 20995.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1797, cr_loss=0.3923, over 4075245.28 frames. ], batch size: 52, lr: 6.94e-03, grad_scale: 32.0 2024-09-15 01:25:49,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=212891.0, ans=0.125 2024-09-15 01:25:58,798 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.48 vs. limit=22.5 2024-09-15 01:26:03,864 INFO [train.py:1198] (0/2) Epoch 12, batch 4850, loss[loss=0.2881, ctc_loss=0.2016, cr_loss=0.4324, over 18064.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1787, cr_loss=0.3915, over 4079393.84 frames. ], batch size: 108, lr: 6.94e-03, grad_scale: 32.0 2024-09-15 01:26:33,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.18 vs. 
limit=22.5 2024-09-15 01:26:44,725 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.032e+02 2.158e+02 2.366e+02 7.708e+02, threshold=4.315e+02, percent-clipped=1.0 2024-09-15 01:26:49,698 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=12.0 2024-09-15 01:27:22,417 INFO [train.py:1198] (0/2) Epoch 12, batch 4900, loss[loss=0.2229, ctc_loss=0.1525, cr_loss=0.3524, over 20945.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.178, cr_loss=0.3912, over 4085485.57 frames. ], batch size: 49, lr: 6.94e-03, grad_scale: 32.0 2024-09-15 01:28:14,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.92 vs. limit=10.0 2024-09-15 01:28:24,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=213174.33333333334, ans=0.05 2024-09-15 01:28:37,190 INFO [train.py:1198] (0/2) Epoch 12, batch 4950, loss[loss=0.2342, ctc_loss=0.16, cr_loss=0.3713, over 20995.00 frames. ], tot_loss[loss=0.2573, ctc_loss=0.1788, cr_loss=0.3925, over 4078112.67 frames. ], batch size: 55, lr: 6.94e-03, grad_scale: 32.0 2024-09-15 01:28:56,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=213231.0, ans=0.125 2024-09-15 01:29:11,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=213259.33333333334, ans=0.025 2024-09-15 01:29:17,643 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.095e+02 2.327e+02 2.579e+02 5.653e+02, threshold=4.653e+02, percent-clipped=2.0 2024-09-15 01:29:51,743 INFO [train.py:1198] (0/2) Epoch 12, batch 5000, loss[loss=0.2639, ctc_loss=0.1843, cr_loss=0.3979, over 20926.00 frames. 
], tot_loss[loss=0.2582, ctc_loss=0.1795, cr_loss=0.3935, over 4087322.75 frames. ], batch size: 60, lr: 6.93e-03, grad_scale: 32.0 2024-09-15 01:29:52,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=213344.33333333334, ans=0.125 2024-09-15 01:30:02,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=213344.33333333334, ans=0.1 2024-09-15 01:30:04,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=213372.66666666666, ans=0.2 2024-09-15 01:30:21,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=213401.0, ans=0.125 2024-09-15 01:30:40,845 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2024-09-15 01:30:54,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.03 vs. limit=22.5 2024-09-15 01:31:05,376 INFO [train.py:1198] (0/2) Epoch 12, batch 5050, loss[loss=0.2831, ctc_loss=0.1993, cr_loss=0.4188, over 20697.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1796, cr_loss=0.394, over 4098650.49 frames. 
], batch size: 71, lr: 6.93e-03, grad_scale: 32.0 2024-09-15 01:31:06,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=213486.0, ans=15.0 2024-09-15 01:31:11,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=213486.0, ans=0.125 2024-09-15 01:31:34,918 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 01:31:44,530 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.109e+02 2.283e+02 2.545e+02 3.226e+02, threshold=4.566e+02, percent-clipped=0.0 2024-09-15 01:31:58,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=213571.0, ans=0.2 2024-09-15 01:32:04,567 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.89 vs. limit=15.0 2024-09-15 01:32:21,524 INFO [train.py:1198] (0/2) Epoch 12, batch 5100, loss[loss=0.2162, ctc_loss=0.1501, cr_loss=0.3306, over 20962.00 frames. ], tot_loss[loss=0.2598, ctc_loss=0.1807, cr_loss=0.3956, over 4100793.84 frames. ], batch size: 51, lr: 6.93e-03, grad_scale: 32.0 2024-09-15 01:32:27,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=213627.66666666666, ans=0.2 2024-09-15 01:32:41,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=213656.0, ans=0.125 2024-09-15 01:32:46,089 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.07 vs. 
limit=12.0 2024-09-15 01:33:05,411 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2024-09-15 01:33:21,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=213741.0, ans=0.1 2024-09-15 01:33:35,653 INFO [train.py:1198] (0/2) Epoch 12, batch 5150, loss[loss=0.2501, ctc_loss=0.176, cr_loss=0.3702, over 21051.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1804, cr_loss=0.3955, over 4103035.09 frames. ], batch size: 56, lr: 6.93e-03, grad_scale: 32.0 2024-09-15 01:33:45,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=213769.33333333334, ans=0.2 2024-09-15 01:34:15,294 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.041e+02 2.271e+02 2.498e+02 3.657e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-15 01:34:18,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=213854.33333333334, ans=0.125 2024-09-15 01:34:34,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=213882.66666666666, ans=0.05 2024-09-15 01:34:47,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=213911.0, ans=0.025 2024-09-15 01:34:49,008 INFO [train.py:1198] (0/2) Epoch 12, batch 5200, loss[loss=0.2791, ctc_loss=0.1944, cr_loss=0.4235, over 20889.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1802, cr_loss=0.3956, over 4109828.59 frames. 
], batch size: 57, lr: 6.93e-03, grad_scale: 32.0 2024-09-15 01:35:30,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=213967.66666666666, ans=0.025 2024-09-15 01:35:51,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=214024.33333333334, ans=0.025 2024-09-15 01:35:51,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=214024.33333333334, ans=0.0 2024-09-15 01:36:02,986 INFO [train.py:1198] (0/2) Epoch 12, batch 5250, loss[loss=0.251, ctc_loss=0.1716, cr_loss=0.397, over 20903.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1792, cr_loss=0.3935, over 4095755.50 frames. ], batch size: 54, lr: 6.92e-03, grad_scale: 32.0 2024-09-15 01:36:07,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=214052.66666666666, ans=0.0 2024-09-15 01:36:39,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=214109.33333333334, ans=0.125 2024-09-15 01:36:43,287 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.036e+02 2.149e+02 2.371e+02 5.610e+02, threshold=4.298e+02, percent-clipped=1.0 2024-09-15 01:37:06,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=214166.0, ans=0.0 2024-09-15 01:37:15,138 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.25 vs. limit=15.0 2024-09-15 01:37:20,006 INFO [train.py:1198] (0/2) Epoch 12, batch 5300, loss[loss=0.2634, ctc_loss=0.1835, cr_loss=0.3995, over 20040.00 frames. ], tot_loss[loss=0.2573, ctc_loss=0.1787, cr_loss=0.3929, over 4097076.67 frames. 
], batch size: 80, lr: 6.92e-03, grad_scale: 32.0 2024-09-15 01:37:23,752 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=15.0 2024-09-15 01:37:33,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=214222.66666666666, ans=0.0 2024-09-15 01:37:36,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=214222.66666666666, ans=0.125 2024-09-15 01:37:45,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=214222.66666666666, ans=0.025 2024-09-15 01:37:48,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=214251.0, ans=0.025 2024-09-15 01:37:56,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=214251.0, ans=0.0 2024-09-15 01:38:32,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=214307.66666666666, ans=0.0 2024-09-15 01:38:34,844 INFO [train.py:1198] (0/2) Epoch 12, batch 5350, loss[loss=0.3409, ctc_loss=0.255, cr_loss=0.4293, over 14154.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.1782, cr_loss=0.3925, over 4097721.23 frames. ], batch size: 150, lr: 6.92e-03, grad_scale: 32.0 2024-09-15 01:38:35,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=214336.0, ans=0.0 2024-09-15 01:38:56,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.77 vs. 
limit=22.5 2024-09-15 01:39:15,271 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.055e+02 2.211e+02 2.434e+02 3.498e+02, threshold=4.423e+02, percent-clipped=0.0 2024-09-15 01:39:15,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214392.66666666666, ans=0.1 2024-09-15 01:39:17,471 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.00 vs. limit=10.0 2024-09-15 01:39:18,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=214421.0, ans=0.025 2024-09-15 01:39:32,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2024-09-15 01:39:41,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=214449.33333333334, ans=0.125 2024-09-15 01:39:49,612 INFO [train.py:1198] (0/2) Epoch 12, batch 5400, loss[loss=0.2518, ctc_loss=0.1767, cr_loss=0.3756, over 20990.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.1789, cr_loss=0.3932, over 4091302.53 frames. 
], batch size: 63, lr: 6.92e-03, grad_scale: 32.0 2024-09-15 01:40:30,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=214534.33333333334, ans=0.125 2024-09-15 01:40:52,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=214591.0, ans=0.125 2024-09-15 01:40:54,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=214591.0, ans=0.125 2024-09-15 01:41:03,581 INFO [train.py:1198] (0/2) Epoch 12, batch 5450, loss[loss=0.2097, ctc_loss=0.143, cr_loss=0.3338, over 19968.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1787, cr_loss=0.3935, over 4102416.20 frames. ], batch size: 44, lr: 6.91e-03, grad_scale: 32.0 2024-09-15 01:41:24,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=214647.66666666666, ans=0.0 2024-09-15 01:41:25,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=214647.66666666666, ans=0.0 2024-09-15 01:41:45,829 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.039e+02 2.124e+02 2.309e+02 4.463e+02, threshold=4.249e+02, percent-clipped=1.0 2024-09-15 01:42:19,657 INFO [train.py:1198] (0/2) Epoch 12, batch 5500, loss[loss=0.2163, ctc_loss=0.1462, cr_loss=0.3504, over 20914.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1788, cr_loss=0.3934, over 4097211.50 frames. 
], batch size: 49, lr: 6.91e-03, grad_scale: 32.0 2024-09-15 01:42:34,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214789.33333333334, ans=0.1 2024-09-15 01:42:35,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=214789.33333333334, ans=0.125 2024-09-15 01:42:39,593 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 01:42:45,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=12.0 2024-09-15 01:43:22,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=214874.33333333334, ans=0.125 2024-09-15 01:43:22,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=214874.33333333334, ans=0.125 2024-09-15 01:43:31,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=214874.33333333334, ans=0.125 2024-09-15 01:43:33,843 INFO [train.py:1198] (0/2) Epoch 12, batch 5550, loss[loss=0.2561, ctc_loss=0.1758, cr_loss=0.4015, over 20655.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1783, cr_loss=0.3923, over 4098368.97 frames. 
], batch size: 68, lr: 6.91e-03, grad_scale: 32.0 2024-09-15 01:44:13,409 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.034e+02 2.125e+02 2.381e+02 4.036e+02, threshold=4.249e+02, percent-clipped=0.0 2024-09-15 01:44:32,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=215016.0, ans=0.2 2024-09-15 01:44:45,211 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 01:44:47,887 INFO [train.py:1198] (0/2) Epoch 12, batch 5600, loss[loss=0.2619, ctc_loss=0.1847, cr_loss=0.3862, over 20896.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1791, cr_loss=0.3931, over 4083005.90 frames. ], batch size: 57, lr: 6.91e-03, grad_scale: 32.0 2024-09-15 01:44:55,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=215044.33333333334, ans=0.2 2024-09-15 01:45:25,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=215101.0, ans=0.125 2024-09-15 01:46:04,710 INFO [train.py:1198] (0/2) Epoch 12, batch 5650, loss[loss=0.2376, ctc_loss=0.1611, cr_loss=0.3822, over 20722.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1785, cr_loss=0.3934, over 4092063.88 frames. ], batch size: 71, lr: 6.90e-03, grad_scale: 32.0 2024-09-15 01:46:32,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=215242.66666666666, ans=0.125 2024-09-15 01:46:44,279 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.090e+02 2.337e+02 2.625e+02 3.498e+02, threshold=4.675e+02, percent-clipped=0.0 2024-09-15 01:47:18,395 INFO [train.py:1198] (0/2) Epoch 12, batch 5700, loss[loss=0.2566, ctc_loss=0.1784, cr_loss=0.3915, over 20807.00 frames. 
], tot_loss[loss=0.2582, ctc_loss=0.1794, cr_loss=0.3939, over 4084297.08 frames. ], batch size: 53, lr: 6.90e-03, grad_scale: 32.0 2024-09-15 01:47:20,032 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-76000.pt 2024-09-15 01:48:27,534 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.77 vs. limit=12.0 2024-09-15 01:48:33,668 INFO [train.py:1198] (0/2) Epoch 12, batch 5750, loss[loss=0.2955, ctc_loss=0.2065, cr_loss=0.4448, over 20651.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1796, cr_loss=0.394, over 4075303.52 frames. ], batch size: 68, lr: 6.90e-03, grad_scale: 32.0 2024-09-15 01:48:33,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=215469.33333333334, ans=0.125 2024-09-15 01:48:48,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=215497.66666666666, ans=0.2 2024-09-15 01:48:56,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=215497.66666666666, ans=0.0 2024-09-15 01:49:14,026 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.077e+02 2.189e+02 2.401e+02 3.549e+02, threshold=4.378e+02, percent-clipped=0.0 2024-09-15 01:49:38,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=22.5 2024-09-15 01:49:39,790 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 01:49:50,551 INFO [train.py:1198] (0/2) Epoch 12, batch 5800, loss[loss=0.2581, ctc_loss=0.1762, cr_loss=0.4096, over 20828.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.179, cr_loss=0.3924, over 4065876.59 frames. 
], batch size: 65, lr: 6.90e-03, grad_scale: 32.0 2024-09-15 01:49:58,555 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=15.0 2024-09-15 01:50:04,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=215639.33333333334, ans=0.125 2024-09-15 01:50:07,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=215639.33333333334, ans=0.125 2024-09-15 01:50:14,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=215639.33333333334, ans=0.125 2024-09-15 01:51:04,342 INFO [train.py:1198] (0/2) Epoch 12, batch 5850, loss[loss=0.2429, ctc_loss=0.1659, cr_loss=0.3851, over 20862.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1795, cr_loss=0.3932, over 4082302.85 frames. ], batch size: 57, lr: 6.90e-03, grad_scale: 32.0 2024-09-15 01:51:06,340 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 01:51:30,999 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 01:51:43,614 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.132e+02 2.269e+02 2.517e+02 4.802e+02, threshold=4.538e+02, percent-clipped=1.0 2024-09-15 01:52:04,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=215866.0, ans=0.125 2024-09-15 01:52:17,220 INFO [train.py:1198] (0/2) Epoch 12, batch 5900, loss[loss=0.2541, ctc_loss=0.1749, cr_loss=0.3957, over 20958.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.1783, cr_loss=0.3915, over 4091646.78 frames. 
], batch size: 64, lr: 6.89e-03, grad_scale: 64.0 2024-09-15 01:52:33,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=215922.66666666666, ans=0.125 2024-09-15 01:52:55,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=215951.0, ans=0.025 2024-09-15 01:53:11,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=215979.33333333334, ans=0.125 2024-09-15 01:53:30,861 INFO [train.py:1198] (0/2) Epoch 12, batch 5950, loss[loss=0.232, ctc_loss=0.1572, cr_loss=0.3736, over 21007.00 frames. ], tot_loss[loss=0.2579, ctc_loss=0.1793, cr_loss=0.3928, over 4085504.22 frames. ], batch size: 52, lr: 6.89e-03, grad_scale: 64.0 2024-09-15 01:53:49,563 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=15.0 2024-09-15 01:53:57,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=216064.33333333334, ans=0.125 2024-09-15 01:54:13,177 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.691e+02 2.098e+02 2.217e+02 2.366e+02 3.588e+02, threshold=4.434e+02, percent-clipped=0.0 2024-09-15 01:54:20,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=216121.0, ans=0.0 2024-09-15 01:54:33,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=216149.33333333334, ans=0.0 2024-09-15 01:54:46,933 INFO [train.py:1198] (0/2) Epoch 12, batch 6000, loss[loss=0.3111, ctc_loss=0.2288, cr_loss=0.4114, over 14148.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1806, cr_loss=0.3948, over 4075055.12 frames. 
], batch size: 150, lr: 6.89e-03, grad_scale: 64.0 2024-09-15 01:54:46,934 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 01:55:14,061 INFO [train.py:1230] (0/2) Epoch 12, validation: loss=0.04862, ctc_loss=0.04862, cr_loss=9.819e-15, over 944034.00 frames. 2024-09-15 01:55:14,062 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 01:55:27,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=216206.0, ans=0.0 2024-09-15 01:55:51,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=12.0 2024-09-15 01:56:00,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=216262.66666666666, ans=0.025 2024-09-15 01:56:22,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=216291.0, ans=0.2 2024-09-15 01:56:29,448 INFO [train.py:1198] (0/2) Epoch 12, batch 6050, loss[loss=0.268, ctc_loss=0.1897, cr_loss=0.3914, over 21060.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1799, cr_loss=0.3932, over 4082242.86 frames. 
], batch size: 59, lr: 6.89e-03, grad_scale: 64.0 2024-09-15 01:56:29,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216319.33333333334, ans=0.1 2024-09-15 01:56:52,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=216347.66666666666, ans=0.0 2024-09-15 01:57:08,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=216376.0, ans=0.025 2024-09-15 01:57:09,787 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.728e+02 2.018e+02 2.157e+02 2.427e+02 3.919e+02, threshold=4.313e+02, percent-clipped=0.0 2024-09-15 01:57:45,240 INFO [train.py:1198] (0/2) Epoch 12, batch 6100, loss[loss=0.2518, ctc_loss=0.1702, cr_loss=0.408, over 20889.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1804, cr_loss=0.3943, over 4079467.96 frames. ], batch size: 54, lr: 6.88e-03, grad_scale: 64.0 2024-09-15 01:58:00,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216489.33333333334, ans=0.1 2024-09-15 01:58:59,363 INFO [train.py:1198] (0/2) Epoch 12, batch 6150, loss[loss=0.2954, ctc_loss=0.2117, cr_loss=0.4187, over 18499.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1804, cr_loss=0.3941, over 4064671.67 frames. 
], batch size: 108, lr: 6.88e-03, grad_scale: 64.0 2024-09-15 01:58:59,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=216602.66666666666, ans=0.0 2024-09-15 01:59:08,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=216602.66666666666, ans=0.2 2024-09-15 01:59:08,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=216602.66666666666, ans=0.1 2024-09-15 01:59:35,563 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=22.5 2024-09-15 01:59:39,109 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.020e+02 2.125e+02 2.318e+02 3.120e+02, threshold=4.251e+02, percent-clipped=0.0 2024-09-15 01:59:39,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=216659.33333333334, ans=0.025 2024-09-15 01:59:54,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=216687.66666666666, ans=0.025 2024-09-15 02:00:12,959 INFO [train.py:1198] (0/2) Epoch 12, batch 6200, loss[loss=0.237, ctc_loss=0.1641, cr_loss=0.3646, over 21002.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1789, cr_loss=0.3913, over 4061349.57 frames. ], batch size: 52, lr: 6.88e-03, grad_scale: 64.0 2024-09-15 02:01:25,668 INFO [train.py:1198] (0/2) Epoch 12, batch 6250, loss[loss=0.2976, ctc_loss=0.2193, cr_loss=0.3915, over 14532.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1783, cr_loss=0.3905, over 4047184.49 frames. 
], batch size: 150, lr: 6.88e-03, grad_scale: 32.0 2024-09-15 02:02:07,924 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.049e+02 2.202e+02 2.453e+02 4.543e+02, threshold=4.404e+02, percent-clipped=2.0 2024-09-15 02:02:24,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216999.33333333334, ans=0.1 2024-09-15 02:02:39,728 INFO [train.py:1198] (0/2) Epoch 12, batch 6300, loss[loss=0.2772, ctc_loss=0.1949, cr_loss=0.4116, over 18050.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1804, cr_loss=0.3923, over 4019130.60 frames. ], batch size: 108, lr: 6.88e-03, grad_scale: 32.0 2024-09-15 02:02:41,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=217027.66666666666, ans=0.125 2024-09-15 02:03:38,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=217141.0, ans=0.125 2024-09-15 02:03:50,271 INFO [train.py:1198] (0/2) Epoch 12, batch 6350, loss[loss=0.3217, ctc_loss=0.2348, cr_loss=0.4346, over 13650.00 frames. ], tot_loss[loss=0.2651, ctc_loss=0.186, cr_loss=0.3955, over 3851677.62 frames. ], batch size: 150, lr: 6.87e-03, grad_scale: 32.0 2024-09-15 02:03:56,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.64 vs. 
limit=5.0 2024-09-15 02:04:00,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=217169.33333333334, ans=0.0 2024-09-15 02:04:03,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=217197.66666666666, ans=0.0 2024-09-15 02:04:05,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=217197.66666666666, ans=0.125 2024-09-15 02:04:18,458 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0 2024-09-15 02:04:27,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=217226.0, ans=0.125 2024-09-15 02:04:30,112 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.369e+02 2.580e+02 2.709e+02 3.659e+02, threshold=5.161e+02, percent-clipped=0.0 2024-09-15 02:04:48,345 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-12.pt 2024-09-15 02:05:36,638 INFO [train.py:1198] (0/2) Epoch 13, batch 0, loss[loss=0.2518, ctc_loss=0.1739, cr_loss=0.3894, over 21076.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1739, cr_loss=0.3894, over 21076.00 frames. ], batch size: 59, lr: 6.60e-03, grad_scale: 32.0 2024-09-15 02:05:36,639 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 02:05:52,574 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.5265, 2.3027, 2.5046, 2.5130, 2.2771, 2.4381, 1.7064, 1.6697], device='cuda:0') 2024-09-15 02:05:55,016 INFO [train.py:1230] (0/2) Epoch 13, validation: loss=0.05013, ctc_loss=0.05013, cr_loss=9.52e-15, over 944034.00 frames. 
2024-09-15 02:05:55,017 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 02:06:13,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=217313.83333333334, ans=0.125 2024-09-15 02:06:22,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217313.83333333334, ans=0.1 2024-09-15 02:06:32,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=217342.16666666666, ans=0.125 2024-09-15 02:06:46,242 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0 2024-09-15 02:06:47,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=217370.5, ans=0.2 2024-09-15 02:06:56,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=217398.83333333334, ans=0.025 2024-09-15 02:07:09,689 INFO [train.py:1198] (0/2) Epoch 13, batch 50, loss[loss=0.2547, ctc_loss=0.1797, cr_loss=0.3747, over 20634.00 frames. ], tot_loss[loss=0.2598, ctc_loss=0.1811, cr_loss=0.3937, over 914830.71 frames. ], batch size: 66, lr: 6.60e-03, grad_scale: 32.0 2024-09-15 02:07:19,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. 
limit=22.5 2024-09-15 02:07:35,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217455.5, ans=0.1 2024-09-15 02:07:41,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=217483.83333333334, ans=0.125 2024-09-15 02:07:44,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217483.83333333334, ans=0.1 2024-09-15 02:07:58,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2024-09-15 02:08:05,091 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.694e+02 2.120e+02 2.301e+02 2.578e+02 3.916e+02, threshold=4.602e+02, percent-clipped=0.0 2024-09-15 02:08:11,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=217540.5, ans=0.05 2024-09-15 02:08:21,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=217540.5, ans=0.0 2024-09-15 02:08:24,524 INFO [train.py:1198] (0/2) Epoch 13, batch 100, loss[loss=0.2719, ctc_loss=0.1871, cr_loss=0.4239, over 21063.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1819, cr_loss=0.3976, over 1614248.61 frames. 
], batch size: 59, lr: 6.60e-03, grad_scale: 32.0 2024-09-15 02:08:27,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=217568.83333333334, ans=0.025 2024-09-15 02:08:38,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=217597.16666666666, ans=0.125 2024-09-15 02:08:52,907 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0 2024-09-15 02:08:58,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=217625.5, ans=0.2 2024-09-15 02:08:59,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2024-09-15 02:09:13,544 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:09:15,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=217653.83333333334, ans=0.125 2024-09-15 02:09:24,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=217653.83333333334, ans=10.0 2024-09-15 02:09:28,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217682.16666666666, ans=0.1 2024-09-15 02:09:45,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217710.5, ans=0.1 2024-09-15 02:09:46,712 INFO [train.py:1198] (0/2) Epoch 13, batch 150, loss[loss=0.2606, ctc_loss=0.1813, cr_loss=0.3965, over 21070.00 frames. 
], tot_loss[loss=0.2597, ctc_loss=0.1805, cr_loss=0.3961, over 2165067.16 frames. ], batch size: 59, lr: 6.60e-03, grad_scale: 32.0 2024-09-15 02:10:41,696 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.724e+02 1.973e+02 2.130e+02 2.293e+02 4.231e+02, threshold=4.260e+02, percent-clipped=0.0 2024-09-15 02:10:42,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=217795.5, ans=0.125 2024-09-15 02:11:01,230 INFO [train.py:1198] (0/2) Epoch 13, batch 200, loss[loss=0.2493, ctc_loss=0.1698, cr_loss=0.3979, over 20886.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1798, cr_loss=0.3963, over 2595414.32 frames. ], batch size: 54, lr: 6.59e-03, grad_scale: 32.0 2024-09-15 02:11:18,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=217880.5, ans=0.125 2024-09-15 02:11:28,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=217880.5, ans=0.05 2024-09-15 02:11:36,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=217908.83333333334, ans=0.025 2024-09-15 02:11:37,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=217908.83333333334, ans=0.125 2024-09-15 02:12:16,877 INFO [train.py:1198] (0/2) Epoch 13, batch 250, loss[loss=0.3004, ctc_loss=0.2152, cr_loss=0.4262, over 18174.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.1782, cr_loss=0.3939, over 2938129.59 frames. 
], batch size: 108, lr: 6.59e-03, grad_scale: 32.0 2024-09-15 02:12:39,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=218022.16666666666, ans=0.0 2024-09-15 02:12:41,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=218022.16666666666, ans=0.0 2024-09-15 02:12:53,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=218050.5, ans=0.125 2024-09-15 02:12:55,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.63 vs. limit=22.5 2024-09-15 02:13:13,008 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.703e+02 2.014e+02 2.202e+02 2.449e+02 3.252e+02, threshold=4.405e+02, percent-clipped=0.0 2024-09-15 02:13:19,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=218107.16666666666, ans=0.125 2024-09-15 02:13:28,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=218107.16666666666, ans=0.1 2024-09-15 02:13:32,330 INFO [train.py:1198] (0/2) Epoch 13, batch 300, loss[loss=0.2285, ctc_loss=0.1596, cr_loss=0.3444, over 19389.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1776, cr_loss=0.392, over 3189816.35 frames. 
], batch size: 43, lr: 6.59e-03, grad_scale: 32.0 2024-09-15 02:13:52,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=218163.83333333334, ans=0.125 2024-09-15 02:13:53,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=218163.83333333334, ans=0.035 2024-09-15 02:14:50,745 INFO [train.py:1198] (0/2) Epoch 13, batch 350, loss[loss=0.2125, ctc_loss=0.1452, cr_loss=0.3367, over 20261.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1781, cr_loss=0.3934, over 3402518.51 frames. ], batch size: 45, lr: 6.59e-03, grad_scale: 32.0 2024-09-15 02:15:22,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=218333.83333333334, ans=0.125 2024-09-15 02:15:30,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=218333.83333333334, ans=0.125 2024-09-15 02:15:51,060 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.038e+02 2.174e+02 2.334e+02 3.174e+02, threshold=4.349e+02, percent-clipped=0.0 2024-09-15 02:16:09,281 INFO [train.py:1198] (0/2) Epoch 13, batch 400, loss[loss=0.2738, ctc_loss=0.1892, cr_loss=0.4231, over 20997.00 frames. ], tot_loss[loss=0.2571, ctc_loss=0.1782, cr_loss=0.3947, over 3570239.21 frames. 
], batch size: 64, lr: 6.59e-03, grad_scale: 32.0 2024-09-15 02:16:36,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218447.16666666666, ans=0.1 2024-09-15 02:16:39,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=218475.5, ans=0.125 2024-09-15 02:16:45,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=218475.5, ans=0.0 2024-09-15 02:16:48,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=218475.5, ans=0.0 2024-09-15 02:16:54,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=218503.83333333334, ans=0.125 2024-09-15 02:17:24,425 INFO [train.py:1198] (0/2) Epoch 13, batch 450, loss[loss=0.2546, ctc_loss=0.1789, cr_loss=0.3784, over 20877.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.1779, cr_loss=0.3933, over 3684030.98 frames. ], batch size: 57, lr: 6.58e-03, grad_scale: 32.0 2024-09-15 02:17:29,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=218560.5, ans=0.1 2024-09-15 02:17:41,609 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.75 vs. 
limit=12.0 2024-09-15 02:17:42,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=218588.83333333334, ans=0.025 2024-09-15 02:18:14,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=218645.5, ans=0.2 2024-09-15 02:18:21,449 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.023e+02 2.176e+02 2.401e+02 3.872e+02, threshold=4.352e+02, percent-clipped=0.0 2024-09-15 02:18:26,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=218673.83333333334, ans=0.125 2024-09-15 02:18:39,156 INFO [train.py:1198] (0/2) Epoch 13, batch 500, loss[loss=0.2899, ctc_loss=0.2027, cr_loss=0.4361, over 18215.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1782, cr_loss=0.394, over 3777033.31 frames. ], batch size: 108, lr: 6.58e-03, grad_scale: 32.0 2024-09-15 02:19:13,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=218758.83333333334, ans=0.125 2024-09-15 02:19:25,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=218787.16666666666, ans=0.125 2024-09-15 02:19:32,430 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.25 vs. limit=22.5 2024-09-15 02:19:43,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=218815.5, ans=0.0 2024-09-15 02:19:51,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=218815.5, ans=0.07 2024-09-15 02:19:54,060 INFO [train.py:1198] (0/2) Epoch 13, batch 550, loss[loss=0.2993, ctc_loss=0.205, cr_loss=0.4718, over 20852.00 frames. 
], tot_loss[loss=0.2564, ctc_loss=0.1776, cr_loss=0.3938, over 3857552.52 frames. ], batch size: 65, lr: 6.58e-03, grad_scale: 32.0 2024-09-15 02:19:59,537 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2024-09-15 02:20:03,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=218843.83333333334, ans=0.125 2024-09-15 02:20:27,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=218900.5, ans=0.125 2024-09-15 02:20:54,955 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.048e+02 2.205e+02 2.451e+02 4.095e+02, threshold=4.409e+02, percent-clipped=0.0 2024-09-15 02:20:56,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=218957.16666666666, ans=0.125 2024-09-15 02:21:15,693 INFO [train.py:1198] (0/2) Epoch 13, batch 600, loss[loss=0.3216, ctc_loss=0.2324, cr_loss=0.4462, over 14151.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.178, cr_loss=0.3936, over 3902076.73 frames. ], batch size: 149, lr: 6.58e-03, grad_scale: 32.0 2024-09-15 02:21:44,738 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:21:46,450 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.63 vs. 
limit=10.0 2024-09-15 02:21:47,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=219042.16666666666, ans=0.025 2024-09-15 02:21:50,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=219042.16666666666, ans=0.125 2024-09-15 02:22:07,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=219070.5, ans=0.07 2024-09-15 02:22:07,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.20 vs. limit=15.0 2024-09-15 02:22:16,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=219098.83333333334, ans=0.02 2024-09-15 02:22:27,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=219098.83333333334, ans=0.125 2024-09-15 02:22:31,253 INFO [train.py:1198] (0/2) Epoch 13, batch 650, loss[loss=0.2474, ctc_loss=0.1701, cr_loss=0.3864, over 20807.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1777, cr_loss=0.3926, over 3936227.98 frames. 
], batch size: 53, lr: 6.58e-03, grad_scale: 32.0 2024-09-15 02:22:31,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=219127.16666666666, ans=0.0 2024-09-15 02:22:45,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=219155.5, ans=0.125 2024-09-15 02:22:54,617 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 02:23:00,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219183.83333333334, ans=0.1 2024-09-15 02:23:28,492 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.137e+02 2.272e+02 2.536e+02 3.616e+02, threshold=4.544e+02, percent-clipped=0.0 2024-09-15 02:23:40,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=219240.5, ans=0.07 2024-09-15 02:23:42,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=219240.5, ans=0.125 2024-09-15 02:23:46,391 INFO [train.py:1198] (0/2) Epoch 13, batch 700, loss[loss=0.2541, ctc_loss=0.1748, cr_loss=0.3965, over 20642.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.1782, cr_loss=0.3926, over 3962638.59 frames. ], batch size: 71, lr: 6.57e-03, grad_scale: 32.0 2024-09-15 02:24:09,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=219297.16666666666, ans=6.0 2024-09-15 02:24:38,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. 
limit=6.0
2024-09-15 02:24:51,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=219382.16666666666, ans=0.0
2024-09-15 02:24:56,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=219382.16666666666, ans=0.125
2024-09-15 02:25:01,830 INFO [train.py:1198] (0/2) Epoch 13, batch 750, loss[loss=0.2821, ctc_loss=0.1941, cr_loss=0.4402, over 20684.00 frames. ], tot_loss[loss=0.2585, ctc_loss=0.1796, cr_loss=0.3944, over 3980757.32 frames. ], batch size: 66, lr: 6.57e-03, grad_scale: 32.0
2024-09-15 02:25:45,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=219495.5, ans=0.0
2024-09-15 02:25:53,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=219495.5, ans=0.1
2024-09-15 02:25:57,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219495.5, ans=0.1
2024-09-15 02:25:59,067 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.047e+02 2.137e+02 2.327e+02 3.350e+02, threshold=4.275e+02, percent-clipped=0.0
2024-09-15 02:26:19,807 INFO [train.py:1198] (0/2) Epoch 13, batch 800, loss[loss=0.2749, ctc_loss=0.1862, cr_loss=0.4433, over 20717.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.1797, cr_loss=0.3951, over 4000028.18 frames. ], batch size: 71, lr: 6.57e-03, grad_scale: 32.0
2024-09-15 02:26:23,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=219552.16666666666, ans=0.125
2024-09-15 02:26:35,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=219580.5, ans=0.1
2024-09-15 02:27:04,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=219608.83333333334, ans=0.125
2024-09-15 02:27:09,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=219637.16666666666, ans=0.125
2024-09-15 02:27:37,813 INFO [train.py:1198] (0/2) Epoch 13, batch 850, loss[loss=0.2118, ctc_loss=0.1442, cr_loss=0.3382, over 20970.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1787, cr_loss=0.3942, over 4016559.55 frames. ], batch size: 50, lr: 6.57e-03, grad_scale: 32.0
2024-09-15 02:28:02,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=219722.16666666666, ans=0.0
2024-09-15 02:28:35,324 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.051e+02 2.169e+02 2.415e+02 3.935e+02, threshold=4.339e+02, percent-clipped=0.0
2024-09-15 02:28:53,240 INFO [train.py:1198] (0/2) Epoch 13, batch 900, loss[loss=0.203, ctc_loss=0.1374, cr_loss=0.3283, over 20965.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1778, cr_loss=0.3931, over 4043628.01 frames. ], batch size: 48, lr: 6.57e-03, grad_scale: 32.0
2024-09-15 02:29:28,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=219892.16666666666, ans=0.125
2024-09-15 02:30:09,299 INFO [train.py:1198] (0/2) Epoch 13, batch 950, loss[loss=0.2806, ctc_loss=0.1974, cr_loss=0.4162, over 20964.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1776, cr_loss=0.3926, over 4062058.39 frames. ], batch size: 58, lr: 6.56e-03, grad_scale: 32.0
2024-09-15 02:30:12,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=219977.16666666666, ans=0.125
2024-09-15 02:30:17,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=219977.16666666666, ans=0.0
2024-09-15 02:30:29,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=220005.5, ans=0.125
2024-09-15 02:31:07,050 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.053e+02 2.160e+02 2.321e+02 3.157e+02, threshold=4.319e+02, percent-clipped=0.0
2024-09-15 02:31:22,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=220090.5, ans=0.0
2024-09-15 02:31:25,288 INFO [train.py:1198] (0/2) Epoch 13, batch 1000, loss[loss=0.2199, ctc_loss=0.155, cr_loss=0.3245, over 20998.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.1768, cr_loss=0.3918, over 4074888.19 frames. ], batch size: 48, lr: 6.56e-03, grad_scale: 32.0
2024-09-15 02:31:27,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=220118.83333333334, ans=0.125
2024-09-15 02:31:42,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220147.16666666666, ans=0.1
2024-09-15 02:32:11,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220175.5, ans=0.1
2024-09-15 02:32:24,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=220203.83333333334, ans=0.0
2024-09-15 02:32:29,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=220232.16666666666, ans=0.125
2024-09-15 02:32:46,956 INFO [train.py:1198] (0/2) Epoch 13, batch 1050, loss[loss=0.2469, ctc_loss=0.1717, cr_loss=0.3761, over 21069.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1773, cr_loss=0.3922, over 4068244.62 frames. ], batch size: 56, lr: 6.56e-03, grad_scale: 32.0
2024-09-15 02:33:29,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=220317.16666666666, ans=0.125
2024-09-15 02:33:43,600 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.681e+02 2.044e+02 2.167e+02 2.351e+02 4.531e+02, threshold=4.334e+02, percent-clipped=1.0
2024-09-15 02:34:01,644 INFO [train.py:1198] (0/2) Epoch 13, batch 1100, loss[loss=0.2359, ctc_loss=0.1599, cr_loss=0.3798, over 20876.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1775, cr_loss=0.3931, over 4082100.12 frames. ], batch size: 57, lr: 6.56e-03, grad_scale: 32.0
2024-09-15 02:34:01,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220402.16666666666, ans=0.1
2024-09-15 02:34:02,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220402.16666666666, ans=0.1
2024-09-15 02:34:03,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=15.0
2024-09-15 02:34:15,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=220430.5, ans=0.125
2024-09-15 02:35:11,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=220515.5, ans=0.025
2024-09-15 02:35:16,917 INFO [train.py:1198] (0/2) Epoch 13, batch 1150, loss[loss=0.2494, ctc_loss=0.1726, cr_loss=0.3838, over 21047.00 frames. ], tot_loss[loss=0.2566, ctc_loss=0.1779, cr_loss=0.3932, over 4081871.89 frames. ], batch size: 62, lr: 6.55e-03, grad_scale: 32.0
2024-09-15 02:35:30,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=220572.16666666666, ans=0.125
2024-09-15 02:35:52,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=220600.5, ans=0.025
2024-09-15 02:36:00,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=220628.83333333334, ans=0.125
2024-09-15 02:36:14,104 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.085e+02 2.221e+02 2.392e+02 4.086e+02, threshold=4.443e+02, percent-clipped=0.0
2024-09-15 02:36:18,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220657.16666666666, ans=0.1
2024-09-15 02:36:32,137 INFO [train.py:1198] (0/2) Epoch 13, batch 1200, loss[loss=0.2333, ctc_loss=0.1613, cr_loss=0.3602, over 20955.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1776, cr_loss=0.3925, over 4092790.38 frames. ], batch size: 50, lr: 6.55e-03, grad_scale: 32.0
2024-09-15 02:36:34,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0
2024-09-15 02:36:38,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2024-09-15 02:37:23,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=12.0
2024-09-15 02:37:25,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=220770.5, ans=10.0
2024-09-15 02:37:49,114 INFO [train.py:1198] (0/2) Epoch 13, batch 1250, loss[loss=0.2235, ctc_loss=0.1554, cr_loss=0.3409, over 20992.00 frames. ], tot_loss[loss=0.2566, ctc_loss=0.178, cr_loss=0.3927, over 4098043.86 frames. ], batch size: 52, lr: 6.55e-03, grad_scale: 32.0
2024-09-15 02:38:47,048 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0
2024-09-15 02:38:48,894 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.061e+02 2.280e+02 2.497e+02 2.985e+02, threshold=4.559e+02, percent-clipped=0.0
2024-09-15 02:39:07,299 INFO [train.py:1198] (0/2) Epoch 13, batch 1300, loss[loss=0.2461, ctc_loss=0.1718, cr_loss=0.3715, over 19932.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1772, cr_loss=0.3915, over 4095940.37 frames. ], batch size: 44, lr: 6.55e-03, grad_scale: 32.0
2024-09-15 02:40:22,367 INFO [train.py:1198] (0/2) Epoch 13, batch 1350, loss[loss=0.2158, ctc_loss=0.1481, cr_loss=0.3387, over 20961.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.176, cr_loss=0.3902, over 4110928.39 frames. ], batch size: 48, lr: 6.55e-03, grad_scale: 32.0
2024-09-15 02:40:26,216 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0
2024-09-15 02:40:54,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=221167.16666666666, ans=0.0
2024-09-15 02:41:19,071 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.070e+02 2.240e+02 2.532e+02 4.939e+02, threshold=4.479e+02, percent-clipped=1.0
2024-09-15 02:41:37,041 INFO [train.py:1198] (0/2) Epoch 13, batch 1400, loss[loss=0.2499, ctc_loss=0.1719, cr_loss=0.39, over 20964.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.176, cr_loss=0.3901, over 4102985.24 frames. ], batch size: 50, lr: 6.54e-03, grad_scale: 32.0
2024-09-15 02:41:38,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=221252.16666666666, ans=0.2
2024-09-15 02:41:41,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=221252.16666666666, ans=0.0
2024-09-15 02:41:49,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=221252.16666666666, ans=0.0
2024-09-15 02:42:27,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=22.5
2024-09-15 02:42:37,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=221365.5, ans=0.125
2024-09-15 02:42:42,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0
2024-09-15 02:42:48,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.53 vs. limit=15.0
2024-09-15 02:42:52,128 INFO [train.py:1198] (0/2) Epoch 13, batch 1450, loss[loss=0.2399, ctc_loss=0.1654, cr_loss=0.3725, over 20884.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1758, cr_loss=0.3905, over 4103430.76 frames. ], batch size: 54, lr: 6.54e-03, grad_scale: 32.0
2024-09-15 02:43:44,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.60 vs. limit=6.0
2024-09-15 02:43:52,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0
2024-09-15 02:43:52,909 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 1.976e+02 2.115e+02 2.299e+02 3.468e+02, threshold=4.231e+02, percent-clipped=0.0
2024-09-15 02:43:56,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=221507.16666666666, ans=0.2
2024-09-15 02:44:05,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0
2024-09-15 02:44:06,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=221507.16666666666, ans=0.125
2024-09-15 02:44:13,248 INFO [train.py:1198] (0/2) Epoch 13, batch 1500, loss[loss=0.2214, ctc_loss=0.1519, cr_loss=0.3472, over 20945.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1748, cr_loss=0.3895, over 4114210.22 frames. ], batch size: 49, lr: 6.54e-03, grad_scale: 32.0
2024-09-15 02:45:03,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=221620.5, ans=0.1
2024-09-15 02:45:15,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=221648.83333333334, ans=0.1
2024-09-15 02:45:28,835 INFO [train.py:1198] (0/2) Epoch 13, batch 1550, loss[loss=0.2652, ctc_loss=0.1873, cr_loss=0.3893, over 21040.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1745, cr_loss=0.3884, over 4117480.93 frames. ], batch size: 62, lr: 6.54e-03, grad_scale: 32.0
2024-09-15 02:45:58,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.60 vs. limit=12.0
2024-09-15 02:46:07,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=221733.83333333334, ans=0.125
2024-09-15 02:46:20,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=221762.16666666666, ans=0.2
2024-09-15 02:46:25,536 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.750e+02 2.025e+02 2.167e+02 2.353e+02 3.428e+02, threshold=4.333e+02, percent-clipped=0.0
2024-09-15 02:46:43,859 INFO [train.py:1198] (0/2) Epoch 13, batch 1600, loss[loss=0.294, ctc_loss=0.213, cr_loss=0.4051, over 14995.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1758, cr_loss=0.3904, over 4098719.73 frames. ], batch size: 149, lr: 6.54e-03, grad_scale: 32.0
2024-09-15 02:46:50,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=221818.83333333334, ans=0.125
2024-09-15 02:47:23,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=221875.5, ans=0.0
2024-09-15 02:47:34,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=221903.83333333334, ans=0.125
2024-09-15 02:47:59,146 INFO [train.py:1198] (0/2) Epoch 13, batch 1650, loss[loss=0.2497, ctc_loss=0.1671, cr_loss=0.413, over 20853.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1767, cr_loss=0.3922, over 4094988.07 frames. ], batch size: 57, lr: 6.53e-03, grad_scale: 32.0
2024-09-15 02:48:17,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=221988.83333333334, ans=0.0
2024-09-15 02:48:20,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=221988.83333333334, ans=0.125
2024-09-15 02:48:34,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=222017.16666666666, ans=0.2
2024-09-15 02:48:59,197 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.061e+02 2.203e+02 2.439e+02 3.566e+02, threshold=4.407e+02, percent-clipped=0.0
2024-09-15 02:49:11,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222073.83333333334, ans=0.1
2024-09-15 02:49:17,381 INFO [train.py:1198] (0/2) Epoch 13, batch 1700, loss[loss=0.2658, ctc_loss=0.1839, cr_loss=0.4095, over 20838.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1773, cr_loss=0.3929, over 4095181.14 frames. ], batch size: 65, lr: 6.53e-03, grad_scale: 32.0
2024-09-15 02:49:35,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=222130.5, ans=0.5
2024-09-15 02:49:53,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=222158.83333333334, ans=0.0
2024-09-15 02:50:01,531 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.51 vs. limit=15.0
2024-09-15 02:50:21,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=222215.5, ans=0.025
2024-09-15 02:50:34,697 INFO [train.py:1198] (0/2) Epoch 13, batch 1750, loss[loss=0.2938, ctc_loss=0.2089, cr_loss=0.4247, over 19622.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.1782, cr_loss=0.3929, over 4076425.08 frames. ], batch size: 90, lr: 6.53e-03, grad_scale: 16.0
2024-09-15 02:50:48,201 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.794e-02
2024-09-15 02:50:51,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=222272.16666666666, ans=0.125
2024-09-15 02:50:58,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=222272.16666666666, ans=0.125
2024-09-15 02:51:31,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=222328.83333333334, ans=0.2
2024-09-15 02:51:32,885 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.109e+02 2.307e+02 2.537e+02 5.137e+02, threshold=4.614e+02, percent-clipped=2.0
2024-09-15 02:51:43,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=222357.16666666666, ans=0.1
2024-09-15 02:51:49,574 INFO [train.py:1198] (0/2) Epoch 13, batch 1800, loss[loss=0.227, ctc_loss=0.1545, cr_loss=0.3625, over 21002.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1771, cr_loss=0.3916, over 4085456.23 frames. ], batch size: 52, lr: 6.53e-03, grad_scale: 16.0
2024-09-15 02:52:38,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=222470.5, ans=0.0
2024-09-15 02:52:39,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=222470.5, ans=0.0
2024-09-15 02:52:44,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=222470.5, ans=0.025
2024-09-15 02:53:05,103 INFO [train.py:1198] (0/2) Epoch 13, batch 1850, loss[loss=0.2271, ctc_loss=0.1571, cr_loss=0.3499, over 20954.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1767, cr_loss=0.3909, over 4098172.18 frames. ], batch size: 55, lr: 6.53e-03, grad_scale: 16.0
2024-09-15 02:53:30,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=222555.5, ans=0.125
2024-09-15 02:53:42,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=222583.83333333334, ans=10.0
2024-09-15 02:53:50,058 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 02:54:01,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=222612.16666666666, ans=0.125
2024-09-15 02:54:03,253 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.071e+02 2.261e+02 2.540e+02 4.776e+02, threshold=4.521e+02, percent-clipped=1.0
2024-09-15 02:54:09,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=222640.5, ans=0.02
2024-09-15 02:54:19,955 INFO [train.py:1198] (0/2) Epoch 13, batch 1900, loss[loss=0.2685, ctc_loss=0.1855, cr_loss=0.4148, over 20035.00 frames. ], tot_loss[loss=0.2547, ctc_loss=0.1764, cr_loss=0.3913, over 4105886.17 frames. ], batch size: 80, lr: 6.52e-03, grad_scale: 16.0
2024-09-15 02:54:33,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=222697.16666666666, ans=0.1
2024-09-15 02:54:49,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=222697.16666666666, ans=0.125
2024-09-15 02:54:53,024 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 02:55:09,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=222753.83333333334, ans=0.125
2024-09-15 02:55:17,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.15 vs. limit=15.0
2024-09-15 02:55:20,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.84 vs. limit=12.0
2024-09-15 02:55:36,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=222782.16666666666, ans=0.125
2024-09-15 02:55:40,347 INFO [train.py:1198] (0/2) Epoch 13, batch 1950, loss[loss=0.2827, ctc_loss=0.194, cr_loss=0.4435, over 20064.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.1771, cr_loss=0.3923, over 4093219.38 frames. ], batch size: 80, lr: 6.52e-03, grad_scale: 16.0
2024-09-15 02:56:27,705 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=22.5
2024-09-15 02:56:37,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=222895.5, ans=0.025
2024-09-15 02:56:38,620 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.687e+02 2.128e+02 2.318e+02 2.625e+02 5.150e+02, threshold=4.636e+02, percent-clipped=1.0
2024-09-15 02:56:55,081 INFO [train.py:1198] (0/2) Epoch 13, batch 2000, loss[loss=0.2742, ctc_loss=0.1961, cr_loss=0.3906, over 20176.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1776, cr_loss=0.3924, over 4082859.20 frames. ], batch size: 80, lr: 6.52e-03, grad_scale: 32.0
2024-09-15 02:56:59,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=222952.16666666666, ans=0.0
2024-09-15 02:57:11,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=222980.5, ans=0.125
2024-09-15 02:57:31,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=223008.83333333334, ans=0.1
2024-09-15 02:57:31,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=223008.83333333334, ans=0.025
2024-09-15 02:57:53,702 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 02:58:09,923 INFO [train.py:1198] (0/2) Epoch 13, batch 2050, loss[loss=0.2459, ctc_loss=0.1668, cr_loss=0.3953, over 21050.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1762, cr_loss=0.3906, over 4094462.74 frames. ], batch size: 56, lr: 6.52e-03, grad_scale: 32.0
2024-09-15 02:58:17,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=223093.83333333334, ans=0.0
2024-09-15 02:58:44,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=223150.5, ans=0.0
2024-09-15 02:59:08,355 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.051e+02 2.212e+02 2.362e+02 4.328e+02, threshold=4.424e+02, percent-clipped=0.0
2024-09-15 02:59:24,848 INFO [train.py:1198] (0/2) Epoch 13, batch 2100, loss[loss=0.3041, ctc_loss=0.216, cr_loss=0.4408, over 18272.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.1762, cr_loss=0.3909, over 4089542.89 frames. ], batch size: 109, lr: 6.52e-03, grad_scale: 32.0
2024-09-15 02:59:50,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=223263.83333333334, ans=0.1
2024-09-15 03:00:09,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=22.5
2024-09-15 03:00:31,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=223348.83333333334, ans=0.2
2024-09-15 03:00:32,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=223348.83333333334, ans=0.0
2024-09-15 03:00:34,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=223348.83333333334, ans=0.2
2024-09-15 03:00:42,944 INFO [train.py:1198] (0/2) Epoch 13, batch 2150, loss[loss=0.2058, ctc_loss=0.1376, cr_loss=0.341, over 19956.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1755, cr_loss=0.3899, over 4098837.59 frames. ], batch size: 44, lr: 6.51e-03, grad_scale: 32.0
2024-09-15 03:01:43,990 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.743e+02 2.118e+02 2.303e+02 2.599e+02 3.957e+02, threshold=4.607e+02, percent-clipped=0.0
2024-09-15 03:01:50,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=223490.5, ans=0.1
2024-09-15 03:01:56,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=223490.5, ans=10.0
2024-09-15 03:02:00,590 INFO [train.py:1198] (0/2) Epoch 13, batch 2200, loss[loss=0.2155, ctc_loss=0.1478, cr_loss=0.3386, over 20964.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1762, cr_loss=0.3905, over 4096713.88 frames. ], batch size: 50, lr: 6.51e-03, grad_scale: 32.0
2024-09-15 03:02:09,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=223518.83333333334, ans=0.1
2024-09-15 03:02:11,808 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0
2024-09-15 03:02:42,251 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.59 vs. limit=15.0
2024-09-15 03:02:46,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=223603.83333333334, ans=0.0
2024-09-15 03:02:49,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=223603.83333333334, ans=0.0
2024-09-15 03:03:16,140 INFO [train.py:1198] (0/2) Epoch 13, batch 2250, loss[loss=0.2676, ctc_loss=0.1864, cr_loss=0.406, over 20983.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1757, cr_loss=0.3896, over 4098418.20 frames. ], batch size: 58, lr: 6.51e-03, grad_scale: 32.0
2024-09-15 03:03:16,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=223660.5, ans=0.125
2024-09-15 03:04:14,711 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.694e+02 2.093e+02 2.283e+02 2.553e+02 6.758e+02, threshold=4.565e+02, percent-clipped=1.0
2024-09-15 03:04:18,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=223773.83333333334, ans=0.125
2024-09-15 03:04:27,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=223773.83333333334, ans=0.025
2024-09-15 03:04:31,380 INFO [train.py:1198] (0/2) Epoch 13, batch 2300, loss[loss=0.2469, ctc_loss=0.1727, cr_loss=0.3713, over 21076.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1751, cr_loss=0.3889, over 4103813.28 frames. ], batch size: 59, lr: 6.51e-03, grad_scale: 32.0
2024-09-15 03:04:33,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=223802.16666666666, ans=0.5
2024-09-15 03:04:54,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=223830.5, ans=0.04949747468305833
2024-09-15 03:05:25,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0
2024-09-15 03:05:29,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=223887.16666666666, ans=0.0
2024-09-15 03:05:47,215 INFO [train.py:1198] (0/2) Epoch 13, batch 2350, loss[loss=0.2901, ctc_loss=0.208, cr_loss=0.4102, over 19410.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1754, cr_loss=0.3897, over 4109624.04 frames. ], batch size: 90, lr: 6.51e-03, grad_scale: 32.0
2024-09-15 03:06:50,764 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.070e+02 2.265e+02 2.565e+02 3.309e+02, threshold=4.531e+02, percent-clipped=0.0
2024-09-15 03:07:07,012 INFO [train.py:1198] (0/2) Epoch 13, batch 2400, loss[loss=0.2924, ctc_loss=0.2058, cr_loss=0.4331, over 20253.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1765, cr_loss=0.3914, over 4093660.64 frames. ], batch size: 74, lr: 6.50e-03, grad_scale: 32.0
2024-09-15 03:07:32,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=224113.83333333334, ans=0.125
2024-09-15 03:07:33,822 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.91 vs. limit=15.0
2024-09-15 03:07:52,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=224170.5, ans=0.125
2024-09-15 03:08:10,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=224198.83333333334, ans=0.0
2024-09-15 03:08:16,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=224198.83333333334, ans=0.2
2024-09-15 03:08:22,773 INFO [train.py:1198] (0/2) Epoch 13, batch 2450, loss[loss=0.2305, ctc_loss=0.1576, cr_loss=0.3647, over 20983.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.177, cr_loss=0.3905, over 4072268.12 frames. ], batch size: 52, lr: 6.50e-03, grad_scale: 32.0
2024-09-15 03:08:55,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=15.0
2024-09-15 03:09:06,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=224312.16666666666, ans=0.0
2024-09-15 03:09:20,935 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 1.968e+02 2.112e+02 2.233e+02 3.264e+02, threshold=4.224e+02, percent-clipped=0.0
2024-09-15 03:09:33,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=224340.5, ans=0.125
2024-09-15 03:09:37,173 INFO [train.py:1198] (0/2) Epoch 13, batch 2500, loss[loss=0.2734, ctc_loss=0.1909, cr_loss=0.4126, over 20642.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1772, cr_loss=0.3907, over 4062499.30 frames. ], batch size: 66, lr: 6.50e-03, grad_scale: 32.0
2024-09-15 03:09:59,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=224397.16666666666, ans=0.125
2024-09-15 03:10:07,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=224425.5, ans=0.1
2024-09-15 03:10:15,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0
2024-09-15 03:10:37,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=224482.16666666666, ans=0.0
2024-09-15 03:10:37,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=224482.16666666666, ans=0.125
2024-09-15 03:10:51,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=224510.5, ans=0.125
2024-09-15 03:10:52,322 INFO [train.py:1198] (0/2) Epoch 13, batch 2550, loss[loss=0.2396, ctc_loss=0.1637, cr_loss=0.3794, over 20987.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.1772, cr_loss=0.3917, over 4079255.95 frames. ], batch size: 55, lr: 6.50e-03, grad_scale: 32.0
2024-09-15 03:11:22,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=224567.16666666666, ans=0.125
2024-09-15 03:11:41,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0
2024-09-15 03:11:49,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=224595.5, ans=0.1
2024-09-15 03:11:49,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=224595.5, ans=0.1
2024-09-15 03:11:52,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=224595.5, ans=0.025
2024-09-15 03:11:53,648 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.084e+02 2.228e+02 2.537e+02 4.090e+02, threshold=4.457e+02, percent-clipped=0.0
2024-09-15 03:11:58,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=224623.83333333334, ans=0.1
2024-09-15 03:12:10,206 INFO [train.py:1198] (0/2) Epoch 13, batch 2600, loss[loss=0.2421, ctc_loss=0.1664, cr_loss=0.3785, over 21012.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1774, cr_loss=0.3924, over 4075285.19 frames. ], batch size: 63, lr: 6.50e-03, grad_scale: 32.0
2024-09-15 03:12:24,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=224652.16666666666, ans=0.2
2024-09-15 03:12:44,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=224708.83333333334, ans=0.125
2024-09-15 03:12:54,066 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.95 vs. limit=15.0
2024-09-15 03:13:08,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=224737.16666666666, ans=0.1
2024-09-15 03:13:11,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=224765.5, ans=0.125
2024-09-15 03:13:27,651 INFO [train.py:1198] (0/2) Epoch 13, batch 2650, loss[loss=0.265, ctc_loss=0.1882, cr_loss=0.3839, over 20888.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.1778, cr_loss=0.3933, over 4091932.41 frames. ], batch size: 57, lr: 6.49e-03, grad_scale: 32.0
2024-09-15 03:13:36,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=224793.83333333334, ans=0.125
2024-09-15 03:13:41,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=224822.16666666666, ans=0.125
2024-09-15 03:14:08,904 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 03:14:18,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=224878.83333333334, ans=0.0
2024-09-15 03:14:26,944 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.696e+02 2.046e+02 2.182e+02 2.356e+02 3.113e+02, threshold=4.364e+02, percent-clipped=0.0
2024-09-15 03:14:43,291 INFO [train.py:1198] (0/2) Epoch 13, batch 2700, loss[loss=0.2373, ctc_loss=0.161, cr_loss=0.3814, over 20927.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1766, cr_loss=0.3918, over 4097120.99 frames.
], batch size: 60, lr: 6.49e-03, grad_scale: 32.0 2024-09-15 03:15:37,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=225020.5, ans=0.0 2024-09-15 03:15:50,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=225048.83333333334, ans=0.125 2024-09-15 03:15:51,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=225048.83333333334, ans=0.2 2024-09-15 03:15:58,641 INFO [train.py:1198] (0/2) Epoch 13, batch 2750, loss[loss=0.2521, ctc_loss=0.1773, cr_loss=0.3744, over 20282.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1768, cr_loss=0.3927, over 4095555.36 frames. ], batch size: 74, lr: 6.49e-03, grad_scale: 32.0 2024-09-15 03:16:01,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=225077.16666666666, ans=0.125 2024-09-15 03:16:51,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=225162.16666666666, ans=0.125 2024-09-15 03:16:56,814 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.702e+02 2.036e+02 2.256e+02 2.486e+02 3.603e+02, threshold=4.512e+02, percent-clipped=0.0 2024-09-15 03:17:16,037 INFO [train.py:1198] (0/2) Epoch 13, batch 2800, loss[loss=0.2387, ctc_loss=0.1671, cr_loss=0.3579, over 21001.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.177, cr_loss=0.3931, over 4099571.54 frames. 
], batch size: 52, lr: 6.49e-03, grad_scale: 32.0 2024-09-15 03:17:22,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225218.83333333334, ans=0.1 2024-09-15 03:17:40,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=225247.16666666666, ans=0.125 2024-09-15 03:17:53,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=225275.5, ans=0.0 2024-09-15 03:18:07,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=225303.83333333334, ans=0.125 2024-09-15 03:18:07,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=225303.83333333334, ans=0.035 2024-09-15 03:18:11,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=225303.83333333334, ans=0.0 2024-09-15 03:18:34,285 INFO [train.py:1198] (0/2) Epoch 13, batch 2850, loss[loss=0.255, ctc_loss=0.1796, cr_loss=0.3771, over 20647.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1773, cr_loss=0.393, over 4093686.47 frames. ], batch size: 68, lr: 6.49e-03, grad_scale: 32.0 2024-09-15 03:18:42,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=225360.5, ans=0.125 2024-09-15 03:18:51,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.08 vs. 
limit=22.5 2024-09-15 03:19:01,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=225388.83333333334, ans=0.025 2024-09-15 03:19:07,862 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0 2024-09-15 03:19:32,674 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.707e+02 2.056e+02 2.193e+02 2.407e+02 3.132e+02, threshold=4.386e+02, percent-clipped=0.0 2024-09-15 03:19:33,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=225473.83333333334, ans=0.2 2024-09-15 03:19:49,189 INFO [train.py:1198] (0/2) Epoch 13, batch 2900, loss[loss=0.2451, ctc_loss=0.1678, cr_loss=0.3869, over 20929.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.177, cr_loss=0.3921, over 4090641.55 frames. ], batch size: 60, lr: 6.48e-03, grad_scale: 32.0 2024-09-15 03:20:28,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=225558.83333333334, ans=0.0 2024-09-15 03:20:45,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=225587.16666666666, ans=0.0 2024-09-15 03:20:46,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225587.16666666666, ans=0.1 2024-09-15 03:21:04,641 INFO [train.py:1198] (0/2) Epoch 13, batch 2950, loss[loss=0.2653, ctc_loss=0.1844, cr_loss=0.4044, over 21006.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1774, cr_loss=0.3925, over 4098145.64 frames. ], batch size: 63, lr: 6.48e-03, grad_scale: 32.0 2024-09-15 03:21:07,363 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.02 vs. 
limit=15.0 2024-09-15 03:21:16,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=225643.83333333334, ans=0.125 2024-09-15 03:21:16,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=225643.83333333334, ans=0.125 2024-09-15 03:22:03,238 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.072e+02 2.214e+02 2.416e+02 4.093e+02, threshold=4.428e+02, percent-clipped=0.0 2024-09-15 03:22:20,051 INFO [train.py:1198] (0/2) Epoch 13, batch 3000, loss[loss=0.2311, ctc_loss=0.1613, cr_loss=0.3494, over 21080.00 frames. ], tot_loss[loss=0.2566, ctc_loss=0.178, cr_loss=0.393, over 4099856.33 frames. ], batch size: 53, lr: 6.48e-03, grad_scale: 32.0 2024-09-15 03:22:20,052 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 03:22:38,725 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6756, 3.2803, 3.4727, 3.3733], device='cuda:0') 2024-09-15 03:22:43,006 INFO [train.py:1230] (0/2) Epoch 13, validation: loss=0.0496, ctc_loss=0.0496, cr_loss=9.603e-15, over 944034.00 frames. 2024-09-15 03:22:43,007 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 03:22:54,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=22.5 2024-09-15 03:23:16,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=225842.16666666666, ans=0.2 2024-09-15 03:23:37,684 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. 
limit=6.0 2024-09-15 03:23:44,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=225898.83333333334, ans=0.025 2024-09-15 03:23:55,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=225898.83333333334, ans=0.125 2024-09-15 03:24:01,453 INFO [train.py:1198] (0/2) Epoch 13, batch 3050, loss[loss=0.2276, ctc_loss=0.1541, cr_loss=0.3671, over 21047.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1768, cr_loss=0.3911, over 4100807.06 frames. ], batch size: 56, lr: 6.48e-03, grad_scale: 32.0 2024-09-15 03:24:10,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=225927.16666666666, ans=0.07 2024-09-15 03:24:27,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=225955.5, ans=0.0 2024-09-15 03:24:35,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=225983.83333333334, ans=0.09899494936611666 2024-09-15 03:24:48,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=226012.16666666666, ans=0.0 2024-09-15 03:24:53,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=226012.16666666666, ans=0.125 2024-09-15 03:25:00,747 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.130e+02 2.273e+02 2.517e+02 7.134e+02, threshold=4.545e+02, percent-clipped=1.0 2024-09-15 03:25:17,353 INFO [train.py:1198] (0/2) Epoch 13, batch 3100, loss[loss=0.2942, ctc_loss=0.208, cr_loss=0.4308, over 20324.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1765, cr_loss=0.3905, over 4100987.72 frames. 
], batch size: 74, lr: 6.48e-03, grad_scale: 32.0 2024-09-15 03:25:25,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.29 vs. limit=22.5 2024-09-15 03:26:08,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=226153.83333333334, ans=0.1 2024-09-15 03:26:16,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=226182.16666666666, ans=0.125 2024-09-15 03:26:17,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=226182.16666666666, ans=0.025 2024-09-15 03:26:25,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-09-15 03:26:32,575 INFO [train.py:1198] (0/2) Epoch 13, batch 3150, loss[loss=0.245, ctc_loss=0.1691, cr_loss=0.3796, over 20972.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1757, cr_loss=0.3896, over 4104497.05 frames. ], batch size: 58, lr: 6.47e-03, grad_scale: 32.0 2024-09-15 03:27:16,064 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.28 vs. 
limit=10.0 2024-09-15 03:27:24,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=226295.5, ans=0.125 2024-09-15 03:27:31,727 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.107e+02 2.272e+02 2.467e+02 3.174e+02, threshold=4.545e+02, percent-clipped=0.0 2024-09-15 03:27:37,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=226323.83333333334, ans=0.125 2024-09-15 03:27:48,141 INFO [train.py:1198] (0/2) Epoch 13, batch 3200, loss[loss=0.2909, ctc_loss=0.202, cr_loss=0.4445, over 19538.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1754, cr_loss=0.3889, over 4103463.51 frames. ], batch size: 90, lr: 6.47e-03, grad_scale: 32.0 2024-09-15 03:27:56,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=226352.16666666666, ans=0.2 2024-09-15 03:28:00,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=226352.16666666666, ans=0.2 2024-09-15 03:28:18,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=226408.83333333334, ans=0.125 2024-09-15 03:28:35,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=226437.16666666666, ans=0.1 2024-09-15 03:28:44,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.71 vs. 
limit=15.0 2024-09-15 03:28:48,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=226437.16666666666, ans=0.125 2024-09-15 03:29:09,376 INFO [train.py:1198] (0/2) Epoch 13, batch 3250, loss[loss=0.2464, ctc_loss=0.1677, cr_loss=0.3934, over 20987.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1755, cr_loss=0.3891, over 4094737.30 frames. ], batch size: 58, lr: 6.47e-03, grad_scale: 32.0 2024-09-15 03:29:17,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=226493.83333333334, ans=0.125 2024-09-15 03:30:08,196 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.068e+02 2.228e+02 2.357e+02 4.229e+02, threshold=4.456e+02, percent-clipped=0.0 2024-09-15 03:30:08,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=226607.16666666666, ans=0.2 2024-09-15 03:30:21,288 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0 2024-09-15 03:30:25,070 INFO [train.py:1198] (0/2) Epoch 13, batch 3300, loss[loss=0.2237, ctc_loss=0.1516, cr_loss=0.3607, over 20956.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1753, cr_loss=0.3891, over 4095496.02 frames. 
], batch size: 50, lr: 6.47e-03, grad_scale: 32.0 2024-09-15 03:30:40,454 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-80000.pt 2024-09-15 03:30:55,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=226692.16666666666, ans=0.0 2024-09-15 03:31:22,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=226720.5, ans=0.035 2024-09-15 03:31:38,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=226748.83333333334, ans=22.5 2024-09-15 03:31:42,155 INFO [train.py:1198] (0/2) Epoch 13, batch 3350, loss[loss=0.2268, ctc_loss=0.1528, cr_loss=0.3704, over 19878.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1748, cr_loss=0.3884, over 4096123.37 frames. ], batch size: 44, lr: 6.46e-03, grad_scale: 32.0 2024-09-15 03:31:49,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=226777.16666666666, ans=0.025 2024-09-15 03:31:57,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=22.5 2024-09-15 03:32:27,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=226862.16666666666, ans=0.125 2024-09-15 03:32:40,487 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.020e+02 2.116e+02 2.354e+02 3.758e+02, threshold=4.231e+02, percent-clipped=0.0 2024-09-15 03:32:56,779 INFO [train.py:1198] (0/2) Epoch 13, batch 3400, loss[loss=0.2617, ctc_loss=0.1813, cr_loss=0.4018, over 19991.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1753, cr_loss=0.3891, over 4096431.20 frames. 
], batch size: 80, lr: 6.46e-03, grad_scale: 32.0 2024-09-15 03:33:07,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=226918.83333333334, ans=0.125 2024-09-15 03:33:43,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=227003.83333333334, ans=0.125 2024-09-15 03:33:43,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=227003.83333333334, ans=0.0 2024-09-15 03:34:00,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.38 vs. limit=22.5 2024-09-15 03:34:15,207 INFO [train.py:1198] (0/2) Epoch 13, batch 3450, loss[loss=0.2809, ctc_loss=0.1998, cr_loss=0.4055, over 19311.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1752, cr_loss=0.3884, over 4093923.24 frames. ], batch size: 90, lr: 6.46e-03, grad_scale: 32.0 2024-09-15 03:34:42,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=227088.83333333334, ans=15.0 2024-09-15 03:34:51,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=227117.16666666666, ans=0.1 2024-09-15 03:35:15,780 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.054e+02 2.143e+02 2.366e+02 3.326e+02, threshold=4.286e+02, percent-clipped=0.0 2024-09-15 03:35:25,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=227173.83333333334, ans=0.05 2024-09-15 03:35:25,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=227173.83333333334, ans=0.125 2024-09-15 03:35:32,300 INFO 
[train.py:1198] (0/2) Epoch 13, batch 3500, loss[loss=0.2618, ctc_loss=0.1805, cr_loss=0.4066, over 21069.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1751, cr_loss=0.3888, over 4093746.84 frames. ], batch size: 59, lr: 6.46e-03, grad_scale: 32.0 2024-09-15 03:35:40,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=227202.16666666666, ans=0.2 2024-09-15 03:36:24,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=227287.16666666666, ans=0.125 2024-09-15 03:36:31,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=227315.5, ans=0.0 2024-09-15 03:36:48,056 INFO [train.py:1198] (0/2) Epoch 13, batch 3550, loss[loss=0.2411, ctc_loss=0.1641, cr_loss=0.3848, over 20892.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1754, cr_loss=0.3896, over 4096865.41 frames. ], batch size: 54, lr: 6.46e-03, grad_scale: 32.0 2024-09-15 03:37:27,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227400.5, ans=0.1 2024-09-15 03:37:47,349 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.715e+02 1.995e+02 2.164e+02 2.282e+02 3.713e+02, threshold=4.329e+02, percent-clipped=0.0 2024-09-15 03:38:03,620 INFO [train.py:1198] (0/2) Epoch 13, batch 3600, loss[loss=0.2732, ctc_loss=0.1906, cr_loss=0.4126, over 20046.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1754, cr_loss=0.3892, over 4084561.66 frames. 
], batch size: 80, lr: 6.45e-03, grad_scale: 32.0 2024-09-15 03:38:23,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=227513.83333333334, ans=0.0 2024-09-15 03:38:23,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=227513.83333333334, ans=0.125 2024-09-15 03:39:16,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=227627.16666666666, ans=0.025 2024-09-15 03:39:17,838 INFO [train.py:1198] (0/2) Epoch 13, batch 3650, loss[loss=0.2903, ctc_loss=0.2034, cr_loss=0.4347, over 20670.00 frames. ], tot_loss[loss=0.2547, ctc_loss=0.1765, cr_loss=0.391, over 4082288.47 frames. ], batch size: 68, lr: 6.45e-03, grad_scale: 32.0 2024-09-15 03:39:29,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=227627.16666666666, ans=22.5 2024-09-15 03:39:59,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=227683.83333333334, ans=0.0 2024-09-15 03:40:18,850 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.048e+02 2.204e+02 2.376e+02 6.268e+02, threshold=4.409e+02, percent-clipped=1.0 2024-09-15 03:40:28,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=227740.5, ans=0.125 2024-09-15 03:40:37,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=227768.83333333334, ans=0.2 2024-09-15 03:40:38,053 INFO [train.py:1198] (0/2) Epoch 13, batch 3700, loss[loss=0.2361, ctc_loss=0.1635, cr_loss=0.3628, over 20885.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1766, cr_loss=0.3916, over 4083610.85 frames. 
], batch size: 54, lr: 6.45e-03, grad_scale: 32.0 2024-09-15 03:40:54,461 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.35 vs. limit=15.0 2024-09-15 03:41:25,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227853.83333333334, ans=0.1 2024-09-15 03:41:40,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2024-09-15 03:41:53,024 INFO [train.py:1198] (0/2) Epoch 13, batch 3750, loss[loss=0.22, ctc_loss=0.1521, cr_loss=0.3395, over 20964.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1759, cr_loss=0.3904, over 4084206.71 frames. ], batch size: 50, lr: 6.45e-03, grad_scale: 64.0 2024-09-15 03:41:56,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=227910.5, ans=0.125 2024-09-15 03:42:15,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=227938.83333333334, ans=0.125 2024-09-15 03:42:51,727 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.751e+02 2.063e+02 2.193e+02 2.473e+02 8.023e+02, threshold=4.386e+02, percent-clipped=1.0 2024-09-15 03:43:08,143 INFO [train.py:1198] (0/2) Epoch 13, batch 3800, loss[loss=0.2777, ctc_loss=0.1953, cr_loss=0.4123, over 20958.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1758, cr_loss=0.3911, over 4093793.05 frames. 
], batch size: 60, lr: 6.45e-03, grad_scale: 64.0 2024-09-15 03:43:26,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=228080.5, ans=0.1 2024-09-15 03:43:50,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=228108.83333333334, ans=0.125 2024-09-15 03:44:02,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=228137.16666666666, ans=0.015 2024-09-15 03:44:03,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=228137.16666666666, ans=0.2 2024-09-15 03:44:03,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=228137.16666666666, ans=0.0 2024-09-15 03:44:23,056 INFO [train.py:1198] (0/2) Epoch 13, batch 3850, loss[loss=0.2812, ctc_loss=0.1974, cr_loss=0.419, over 20847.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1763, cr_loss=0.391, over 4078021.32 frames. ], batch size: 65, lr: 6.45e-03, grad_scale: 32.0 2024-09-15 03:44:38,423 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 03:44:43,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=228222.16666666666, ans=0.0 2024-09-15 03:45:26,160 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.056e+02 2.219e+02 2.407e+02 4.578e+02, threshold=4.439e+02, percent-clipped=1.0 2024-09-15 03:45:27,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=228307.16666666666, ans=0.0 2024-09-15 03:45:41,257 INFO [train.py:1198] (0/2) Epoch 13, batch 3900, loss[loss=0.2324, ctc_loss=0.1612, cr_loss=0.3557, over 20971.00 frames. 
], tot_loss[loss=0.2541, ctc_loss=0.1759, cr_loss=0.391, over 4087423.56 frames. ], batch size: 49, lr: 6.44e-03, grad_scale: 32.0 2024-09-15 03:46:28,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=228420.5, ans=0.125 2024-09-15 03:46:42,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=228448.83333333334, ans=0.1 2024-09-15 03:46:42,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=228448.83333333334, ans=0.125 2024-09-15 03:46:50,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=228448.83333333334, ans=0.125 2024-09-15 03:46:58,769 INFO [train.py:1198] (0/2) Epoch 13, batch 3950, loss[loss=0.2536, ctc_loss=0.178, cr_loss=0.3784, over 20967.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.1761, cr_loss=0.3915, over 4096025.32 frames. 
], batch size: 52, lr: 6.44e-03, grad_scale: 32.0 2024-09-15 03:47:03,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=228477.16666666666, ans=0.1 2024-09-15 03:47:06,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=228477.16666666666, ans=0.0 2024-09-15 03:47:06,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=228477.16666666666, ans=0.025 2024-09-15 03:47:33,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=228533.83333333334, ans=0.125 2024-09-15 03:47:35,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-15 03:47:58,005 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.016e+02 2.161e+02 2.409e+02 5.821e+02, threshold=4.323e+02, percent-clipped=1.0 2024-09-15 03:48:13,144 INFO [train.py:1198] (0/2) Epoch 13, batch 4000, loss[loss=0.2668, ctc_loss=0.1809, cr_loss=0.4292, over 20644.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.176, cr_loss=0.3915, over 4101912.02 frames. 
], batch size: 66, lr: 6.44e-03, grad_scale: 32.0 2024-09-15 03:48:19,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=228618.83333333334, ans=0.125 2024-09-15 03:48:39,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=228647.16666666666, ans=0.0 2024-09-15 03:48:42,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=228675.5, ans=0.125 2024-09-15 03:48:55,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=228675.5, ans=0.1 2024-09-15 03:49:03,880 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0 2024-09-15 03:49:28,335 INFO [train.py:1198] (0/2) Epoch 13, batch 4050, loss[loss=0.2784, ctc_loss=0.1925, cr_loss=0.4297, over 20973.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1755, cr_loss=0.3907, over 4107365.36 frames. 
], batch size: 58, lr: 6.44e-03, grad_scale: 32.0 2024-09-15 03:49:37,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=228760.5, ans=0.125 2024-09-15 03:49:49,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=228788.83333333334, ans=0.1 2024-09-15 03:50:03,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=228817.16666666666, ans=10.0 2024-09-15 03:50:03,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=228817.16666666666, ans=0.0 2024-09-15 03:50:16,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=228845.5, ans=0.125 2024-09-15 03:50:28,434 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.700e+02 2.064e+02 2.269e+02 2.553e+02 4.076e+02, threshold=4.538e+02, percent-clipped=0.0 2024-09-15 03:50:43,597 INFO [train.py:1198] (0/2) Epoch 13, batch 4100, loss[loss=0.2478, ctc_loss=0.1716, cr_loss=0.381, over 20778.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.175, cr_loss=0.3905, over 4117021.05 frames. ], batch size: 56, lr: 6.44e-03, grad_scale: 32.0 2024-09-15 03:50:53,389 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.49 vs. 
limit=10.0 2024-09-15 03:51:06,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=228930.5, ans=0.0 2024-09-15 03:51:07,528 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 03:51:18,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=228958.83333333334, ans=0.125 2024-09-15 03:51:32,959 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 03:52:02,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=229043.83333333334, ans=0.0 2024-09-15 03:52:03,889 INFO [train.py:1198] (0/2) Epoch 13, batch 4150, loss[loss=0.2412, ctc_loss=0.1682, cr_loss=0.3647, over 21036.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.1758, cr_loss=0.3918, over 4120006.78 frames. ], batch size: 62, lr: 6.43e-03, grad_scale: 32.0 2024-09-15 03:52:52,585 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 03:52:58,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=229128.83333333334, ans=0.125 2024-09-15 03:53:03,885 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 1.981e+02 2.107e+02 2.340e+02 3.732e+02, threshold=4.214e+02, percent-clipped=0.0 2024-09-15 03:53:11,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=229157.16666666666, ans=0.125 2024-09-15 03:53:19,033 INFO [train.py:1198] (0/2) Epoch 13, batch 4200, loss[loss=0.2356, ctc_loss=0.1638, cr_loss=0.3588, over 21075.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1744, cr_loss=0.389, over 4112761.71 frames. 
], batch size: 53, lr: 6.43e-03, grad_scale: 32.0 2024-09-15 03:53:36,520 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0 2024-09-15 03:54:17,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=229298.83333333334, ans=0.125 2024-09-15 03:54:18,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2024-09-15 03:54:23,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=229298.83333333334, ans=0.07 2024-09-15 03:54:34,192 INFO [train.py:1198] (0/2) Epoch 13, batch 4250, loss[loss=0.2611, ctc_loss=0.1801, cr_loss=0.4049, over 20908.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1748, cr_loss=0.3903, over 4118337.67 frames. ], batch size: 54, lr: 6.43e-03, grad_scale: 16.0 2024-09-15 03:54:42,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=229327.16666666666, ans=0.125 2024-09-15 03:55:15,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=229383.83333333334, ans=0.125 2024-09-15 03:55:36,918 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.023e+02 2.160e+02 2.370e+02 3.150e+02, threshold=4.320e+02, percent-clipped=0.0 2024-09-15 03:55:50,490 INFO [train.py:1198] (0/2) Epoch 13, batch 4300, loss[loss=0.2299, ctc_loss=0.1566, cr_loss=0.3662, over 19897.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1754, cr_loss=0.3912, over 4114662.95 frames. 
], batch size: 44, lr: 6.43e-03, grad_scale: 16.0 2024-09-15 03:55:58,598 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2024-09-15 03:56:15,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=229497.16666666666, ans=0.95 2024-09-15 03:56:21,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=229525.5, ans=0.125 2024-09-15 03:56:25,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=229525.5, ans=0.125 2024-09-15 03:56:25,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=229525.5, ans=0.125 2024-09-15 03:56:36,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=229553.83333333334, ans=0.125 2024-09-15 03:57:01,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=229582.16666666666, ans=0.125 2024-09-15 03:57:06,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-15 03:57:08,880 INFO [train.py:1198] (0/2) Epoch 13, batch 4350, loss[loss=0.2262, ctc_loss=0.1533, cr_loss=0.3647, over 20971.00 frames. ], tot_loss[loss=0.2538, ctc_loss=0.1754, cr_loss=0.3917, over 4121283.95 frames. ], batch size: 49, lr: 6.43e-03, grad_scale: 16.0 2024-09-15 03:57:09,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.38 vs. 
limit=15.0 2024-09-15 03:57:18,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=229610.5, ans=0.125 2024-09-15 03:57:42,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=229667.16666666666, ans=0.125 2024-09-15 03:57:58,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229695.5, ans=0.1 2024-09-15 03:58:00,078 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 03:58:02,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=229695.5, ans=0.125 2024-09-15 03:58:12,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=229723.83333333334, ans=0.04949747468305833 2024-09-15 03:58:13,179 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.054e+02 2.239e+02 2.432e+02 3.142e+02, threshold=4.477e+02, percent-clipped=0.0 2024-09-15 03:58:26,469 INFO [train.py:1198] (0/2) Epoch 13, batch 4400, loss[loss=0.2308, ctc_loss=0.1574, cr_loss=0.3669, over 20829.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1751, cr_loss=0.3909, over 4113098.62 frames. 
], batch size: 59, lr: 6.42e-03, grad_scale: 32.0 2024-09-15 03:58:32,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=229752.16666666666, ans=0.125 2024-09-15 03:59:19,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=229837.16666666666, ans=0.1 2024-09-15 03:59:28,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=229865.5, ans=0.05 2024-09-15 03:59:41,610 INFO [train.py:1198] (0/2) Epoch 13, batch 4450, loss[loss=0.2674, ctc_loss=0.1855, cr_loss=0.4091, over 20852.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1759, cr_loss=0.3917, over 4111950.14 frames. ], batch size: 65, lr: 6.42e-03, grad_scale: 32.0 2024-09-15 04:00:15,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.63 vs. limit=22.5 2024-09-15 04:00:32,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=229978.83333333334, ans=0.0 2024-09-15 04:00:43,867 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.099e+02 2.285e+02 2.613e+02 3.993e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-15 04:00:45,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=12.0 2024-09-15 04:00:53,644 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.39 vs. limit=15.0 2024-09-15 04:00:57,435 INFO [train.py:1198] (0/2) Epoch 13, batch 4500, loss[loss=0.2728, ctc_loss=0.1888, cr_loss=0.4196, over 19432.00 frames. 
], tot_loss[loss=0.2534, ctc_loss=0.1753, cr_loss=0.3907, over 4110396.42 frames. ], batch size: 90, lr: 6.42e-03, grad_scale: 32.0 2024-09-15 04:01:17,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=230063.83333333334, ans=0.0 2024-09-15 04:01:26,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230092.16666666666, ans=0.1 2024-09-15 04:01:47,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.04 vs. limit=15.0 2024-09-15 04:02:12,521 INFO [train.py:1198] (0/2) Epoch 13, batch 4550, loss[loss=0.1992, ctc_loss=0.1327, cr_loss=0.3322, over 20286.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1755, cr_loss=0.3906, over 4091541.80 frames. ], batch size: 45, lr: 6.42e-03, grad_scale: 32.0 2024-09-15 04:02:23,798 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0 2024-09-15 04:02:24,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=230177.16666666666, ans=0.0 2024-09-15 04:02:40,292 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0 2024-09-15 04:02:59,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=22.5 2024-09-15 04:03:05,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.62 vs. 
limit=15.0 2024-09-15 04:03:16,968 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.711e+02 2.085e+02 2.275e+02 2.544e+02 3.349e+02, threshold=4.549e+02, percent-clipped=0.0 2024-09-15 04:03:20,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=230290.5, ans=0.0 2024-09-15 04:03:29,821 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.20 vs. limit=6.0 2024-09-15 04:03:33,381 INFO [train.py:1198] (0/2) Epoch 13, batch 4600, loss[loss=0.2646, ctc_loss=0.1833, cr_loss=0.4062, over 20931.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1768, cr_loss=0.3916, over 4084235.58 frames. ], batch size: 60, lr: 6.42e-03, grad_scale: 32.0 2024-09-15 04:03:57,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=230347.16666666666, ans=0.2 2024-09-15 04:04:07,097 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.67 vs. limit=10.0 2024-09-15 04:04:09,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=230375.5, ans=0.125 2024-09-15 04:04:48,250 INFO [train.py:1198] (0/2) Epoch 13, batch 4650, loss[loss=0.3146, ctc_loss=0.2217, cr_loss=0.4641, over 18419.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1759, cr_loss=0.3906, over 4092356.07 frames. ], batch size: 108, lr: 6.41e-03, grad_scale: 32.0 2024-09-15 04:04:55,089 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.87 vs. 
limit=15.0 2024-09-15 04:05:23,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230517.16666666666, ans=0.1 2024-09-15 04:05:39,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=230545.5, ans=0.125 2024-09-15 04:05:41,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=230545.5, ans=0.0 2024-09-15 04:05:50,266 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.016e+02 2.167e+02 2.350e+02 3.389e+02, threshold=4.334e+02, percent-clipped=0.0 2024-09-15 04:06:03,805 INFO [train.py:1198] (0/2) Epoch 13, batch 4700, loss[loss=0.273, ctc_loss=0.1911, cr_loss=0.4095, over 20299.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1756, cr_loss=0.3905, over 4092709.36 frames. ], batch size: 74, lr: 6.41e-03, grad_scale: 32.0 2024-09-15 04:06:16,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.27 vs. limit=15.0 2024-09-15 04:06:37,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0 2024-09-15 04:06:48,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=230687.16666666666, ans=10.0 2024-09-15 04:07:05,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=230715.5, ans=0.0 2024-09-15 04:07:10,743 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. 
limit=6.0 2024-09-15 04:07:14,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=230715.5, ans=0.0 2024-09-15 04:07:18,812 INFO [train.py:1198] (0/2) Epoch 13, batch 4750, loss[loss=0.277, ctc_loss=0.193, cr_loss=0.4199, over 21055.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.1759, cr_loss=0.3911, over 4103052.25 frames. ], batch size: 62, lr: 6.41e-03, grad_scale: 32.0 2024-09-15 04:07:34,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2024-09-15 04:07:56,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=230800.5, ans=0.125 2024-09-15 04:08:12,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=230828.83333333334, ans=0.035 2024-09-15 04:08:22,793 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.050e+02 2.167e+02 2.373e+02 4.077e+02, threshold=4.334e+02, percent-clipped=0.0 2024-09-15 04:08:36,402 INFO [train.py:1198] (0/2) Epoch 13, batch 4800, loss[loss=0.3254, ctc_loss=0.2384, cr_loss=0.435, over 13629.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1765, cr_loss=0.3917, over 4092219.97 frames. 
], batch size: 149, lr: 6.41e-03, grad_scale: 32.0 2024-09-15 04:08:42,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=230885.5, ans=0.1 2024-09-15 04:09:11,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=230942.16666666666, ans=0.125 2024-09-15 04:09:18,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=230942.16666666666, ans=0.125 2024-09-15 04:09:33,984 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0 2024-09-15 04:09:54,125 INFO [train.py:1198] (0/2) Epoch 13, batch 4850, loss[loss=0.2896, ctc_loss=0.2035, cr_loss=0.4305, over 20959.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1766, cr_loss=0.392, over 4091777.72 frames. ], batch size: 67, lr: 6.41e-03, grad_scale: 32.0 2024-09-15 04:09:54,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231027.16666666666, ans=0.1 2024-09-15 04:10:18,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=231055.5, ans=0.0 2024-09-15 04:10:36,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=231083.83333333334, ans=0.0 2024-09-15 04:10:55,612 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.691e+02 2.070e+02 2.229e+02 2.492e+02 4.047e+02, threshold=4.459e+02, percent-clipped=0.0 2024-09-15 04:11:08,625 INFO [train.py:1198] (0/2) Epoch 13, batch 4900, loss[loss=0.2695, ctc_loss=0.1851, cr_loss=0.422, over 20687.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1762, cr_loss=0.3919, over 4099121.57 frames. 
], batch size: 68, lr: 6.40e-03, grad_scale: 32.0 2024-09-15 04:11:20,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=231168.83333333334, ans=0.0 2024-09-15 04:11:34,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=231197.16666666666, ans=0.0 2024-09-15 04:11:37,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231225.5, ans=0.1 2024-09-15 04:11:50,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=231225.5, ans=0.2 2024-09-15 04:12:22,468 INFO [train.py:1198] (0/2) Epoch 13, batch 4950, loss[loss=0.2092, ctc_loss=0.1403, cr_loss=0.3443, over 20956.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1764, cr_loss=0.3918, over 4091718.54 frames. ], batch size: 49, lr: 6.40e-03, grad_scale: 32.0 2024-09-15 04:12:45,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=231338.83333333334, ans=0.125 2024-09-15 04:13:01,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231367.16666666666, ans=0.1 2024-09-15 04:13:09,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=231395.5, ans=0.125 2024-09-15 04:13:15,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.03 vs. 
limit=15.0 2024-09-15 04:13:23,305 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.061e+02 2.191e+02 2.435e+02 4.036e+02, threshold=4.383e+02, percent-clipped=0.0 2024-09-15 04:13:28,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=231423.83333333334, ans=0.125 2024-09-15 04:13:36,782 INFO [train.py:1198] (0/2) Epoch 13, batch 5000, loss[loss=0.2662, ctc_loss=0.1857, cr_loss=0.4026, over 20816.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1766, cr_loss=0.3923, over 4088848.70 frames. ], batch size: 65, lr: 6.40e-03, grad_scale: 32.0 2024-09-15 04:13:58,729 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=15.0 2024-09-15 04:14:13,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=231508.83333333334, ans=0.0 2024-09-15 04:14:51,157 INFO [train.py:1198] (0/2) Epoch 13, batch 5050, loss[loss=0.2717, ctc_loss=0.1897, cr_loss=0.41, over 19439.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1763, cr_loss=0.3931, over 4106753.51 frames. ], batch size: 90, lr: 6.40e-03, grad_scale: 32.0 2024-09-15 04:14:53,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231593.83333333334, ans=0.1 2024-09-15 04:14:53,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=22.5 2024-09-15 04:14:57,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=12.0 2024-09-15 04:15:00,905 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.60 vs. 
limit=12.0 2024-09-15 04:15:35,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=231650.5, ans=0.125 2024-09-15 04:15:35,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=231650.5, ans=0.04949747468305833 2024-09-15 04:15:54,720 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.652e+02 2.020e+02 2.145e+02 2.341e+02 4.925e+02, threshold=4.290e+02, percent-clipped=1.0 2024-09-15 04:15:56,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231707.16666666666, ans=0.1 2024-09-15 04:15:58,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=231707.16666666666, ans=15.0 2024-09-15 04:16:08,608 INFO [train.py:1198] (0/2) Epoch 13, batch 5100, loss[loss=0.2589, ctc_loss=0.1755, cr_loss=0.4172, over 20861.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1745, cr_loss=0.3904, over 4111273.34 frames. ], batch size: 57, lr: 6.40e-03, grad_scale: 32.0 2024-09-15 04:16:30,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
limit=6.0 2024-09-15 04:16:34,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=231763.83333333334, ans=0.2 2024-09-15 04:16:35,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=231763.83333333334, ans=0.125 2024-09-15 04:17:06,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=231848.83333333334, ans=0.2 2024-09-15 04:17:11,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.20 vs. limit=10.0 2024-09-15 04:17:24,834 INFO [train.py:1198] (0/2) Epoch 13, batch 5150, loss[loss=0.2878, ctc_loss=0.2029, cr_loss=0.4248, over 19564.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1742, cr_loss=0.3892, over 4110457.80 frames. ], batch size: 90, lr: 6.39e-03, grad_scale: 32.0 2024-09-15 04:17:25,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=231877.16666666666, ans=0.125 2024-09-15 04:18:14,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=231962.16666666666, ans=0.125 2024-09-15 04:18:22,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=231990.5, ans=0.125 2024-09-15 04:18:25,533 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.103e+02 2.398e+02 2.600e+02 4.837e+02, threshold=4.796e+02, percent-clipped=2.0 2024-09-15 04:18:25,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=231990.5, ans=0.2 2024-09-15 04:18:38,899 INFO [train.py:1198] (0/2) Epoch 13, batch 5200, loss[loss=0.2122, ctc_loss=0.1448, cr_loss=0.3374, over 20984.00 frames. 
], tot_loss[loss=0.2513, ctc_loss=0.1736, cr_loss=0.3882, over 4112592.01 frames. ], batch size: 51, lr: 6.39e-03, grad_scale: 32.0 2024-09-15 04:18:39,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0 2024-09-15 04:18:40,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=232018.83333333334, ans=0.2 2024-09-15 04:19:52,381 INFO [train.py:1198] (0/2) Epoch 13, batch 5250, loss[loss=0.2839, ctc_loss=0.1963, cr_loss=0.4376, over 21034.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1735, cr_loss=0.3878, over 4106182.53 frames. ], batch size: 62, lr: 6.39e-03, grad_scale: 32.0 2024-09-15 04:19:54,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=232160.5, ans=0.125 2024-09-15 04:19:54,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.65 vs. limit=10.0 2024-09-15 04:19:55,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=232160.5, ans=0.125 2024-09-15 04:20:14,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=232188.83333333334, ans=0.125 2024-09-15 04:20:20,968 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. 
limit=15.0 2024-09-15 04:20:21,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=232217.16666666666, ans=0.2 2024-09-15 04:20:24,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=232217.16666666666, ans=0.1 2024-09-15 04:20:25,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=232217.16666666666, ans=0.2 2024-09-15 04:20:45,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=232245.5, ans=0.125 2024-09-15 04:20:52,483 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.743e+02 2.002e+02 2.168e+02 2.399e+02 3.136e+02, threshold=4.335e+02, percent-clipped=0.0 2024-09-15 04:20:52,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=232273.83333333334, ans=0.125 2024-09-15 04:20:54,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=232273.83333333334, ans=0.2 2024-09-15 04:21:05,950 INFO [train.py:1198] (0/2) Epoch 13, batch 5300, loss[loss=0.2218, ctc_loss=0.1502, cr_loss=0.358, over 19964.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1749, cr_loss=0.39, over 4094992.13 frames. ], batch size: 44, lr: 6.39e-03, grad_scale: 32.0 2024-09-15 04:21:06,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=232302.16666666666, ans=0.1 2024-09-15 04:21:11,241 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.44 vs. 
limit=15.0 2024-09-15 04:21:35,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=232358.83333333334, ans=0.1 2024-09-15 04:21:38,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=232358.83333333334, ans=0.125 2024-09-15 04:22:06,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=232415.5, ans=0.125 2024-09-15 04:22:19,834 INFO [train.py:1198] (0/2) Epoch 13, batch 5350, loss[loss=0.2387, ctc_loss=0.1647, cr_loss=0.3697, over 20870.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1747, cr_loss=0.3897, over 4087907.21 frames. ], batch size: 57, lr: 6.39e-03, grad_scale: 32.0 2024-09-15 04:22:30,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=232443.83333333334, ans=0.125 2024-09-15 04:22:39,247 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 04:23:13,140 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 04:23:19,919 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.073e+02 2.241e+02 2.493e+02 6.265e+02, threshold=4.482e+02, percent-clipped=2.0 2024-09-15 04:23:21,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=232557.16666666666, ans=0.0 2024-09-15 04:23:27,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=232557.16666666666, ans=0.125 2024-09-15 04:23:31,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=232585.5, ans=0.0 2024-09-15 04:23:33,066 INFO 
[train.py:1198] (0/2) Epoch 13, batch 5400, loss[loss=0.213, ctc_loss=0.145, cr_loss=0.3401, over 20985.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1748, cr_loss=0.3902, over 4095161.47 frames. ], batch size: 51, lr: 6.38e-03, grad_scale: 32.0 2024-09-15 04:24:47,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=232727.16666666666, ans=0.125 2024-09-15 04:24:48,893 INFO [train.py:1198] (0/2) Epoch 13, batch 5450, loss[loss=0.2924, ctc_loss=0.2108, cr_loss=0.4081, over 14842.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1753, cr_loss=0.3909, over 4091738.38 frames. ], batch size: 151, lr: 6.38e-03, grad_scale: 32.0 2024-09-15 04:25:43,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=232812.16666666666, ans=0.125 2024-09-15 04:25:47,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=232840.5, ans=0.0 2024-09-15 04:25:49,076 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.039e+02 2.194e+02 2.418e+02 3.248e+02, threshold=4.388e+02, percent-clipped=0.0 2024-09-15 04:25:52,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=232840.5, ans=0.125 2024-09-15 04:25:53,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=232840.5, ans=0.0 2024-09-15 04:26:02,164 INFO [train.py:1198] (0/2) Epoch 13, batch 5500, loss[loss=0.2227, ctc_loss=0.1523, cr_loss=0.3523, over 20786.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.1758, cr_loss=0.3911, over 4078477.17 frames. 
], batch size: 53, lr: 6.38e-03, grad_scale: 32.0 2024-09-15 04:26:47,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=232953.83333333334, ans=0.125 2024-09-15 04:26:58,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2024-09-15 04:27:05,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=232982.16666666666, ans=0.0 2024-09-15 04:27:07,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=232982.16666666666, ans=0.125 2024-09-15 04:27:07,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=232982.16666666666, ans=0.2 2024-09-15 04:27:10,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=232982.16666666666, ans=0.0 2024-09-15 04:27:18,465 INFO [train.py:1198] (0/2) Epoch 13, batch 5550, loss[loss=0.2566, ctc_loss=0.1784, cr_loss=0.3907, over 20970.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.1761, cr_loss=0.3912, over 4071474.08 frames. 
], batch size: 58, lr: 6.38e-03, grad_scale: 32.0 2024-09-15 04:27:42,473 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 04:27:58,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=233067.16666666666, ans=0.1 2024-09-15 04:28:01,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=233095.5, ans=0.1 2024-09-15 04:28:01,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=233095.5, ans=0.125 2024-09-15 04:28:12,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=233095.5, ans=0.025 2024-09-15 04:28:18,707 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.073e+02 2.221e+02 2.441e+02 3.591e+02, threshold=4.443e+02, percent-clipped=0.0 2024-09-15 04:28:24,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=233123.83333333334, ans=0.025 2024-09-15 04:28:29,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.62 vs. limit=10.0 2024-09-15 04:28:31,923 INFO [train.py:1198] (0/2) Epoch 13, batch 5600, loss[loss=0.2451, ctc_loss=0.1684, cr_loss=0.3836, over 20887.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1769, cr_loss=0.393, over 4066175.10 frames. 
], batch size: 57, lr: 6.38e-03, grad_scale: 32.0 2024-09-15 04:28:48,449 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 04:29:10,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=233208.83333333334, ans=0.04949747468305833 2024-09-15 04:29:45,344 INFO [train.py:1198] (0/2) Epoch 13, batch 5650, loss[loss=0.2705, ctc_loss=0.1883, cr_loss=0.4107, over 20930.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1773, cr_loss=0.393, over 4070791.56 frames. ], batch size: 60, lr: 6.37e-03, grad_scale: 32.0 2024-09-15 04:29:45,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=233293.83333333334, ans=0.0 2024-09-15 04:30:06,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=233322.16666666666, ans=0.125 2024-09-15 04:30:10,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=233322.16666666666, ans=0.125 2024-09-15 04:30:12,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=233322.16666666666, ans=0.0 2024-09-15 04:30:43,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=233407.16666666666, ans=0.125 2024-09-15 04:30:46,135 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 1.990e+02 2.120e+02 2.287e+02 3.734e+02, threshold=4.241e+02, percent-clipped=0.0 2024-09-15 04:30:59,526 INFO [train.py:1198] (0/2) Epoch 13, batch 5700, loss[loss=0.2948, ctc_loss=0.2102, cr_loss=0.4227, over 18413.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1772, cr_loss=0.3925, over 4067854.06 frames. 
], batch size: 108, lr: 6.37e-03, grad_scale: 32.0 2024-09-15 04:31:11,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=233435.5, ans=0.125 2024-09-15 04:31:32,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2024-09-15 04:31:47,602 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-09-15 04:32:13,579 INFO [train.py:1198] (0/2) Epoch 13, batch 5750, loss[loss=0.2435, ctc_loss=0.1694, cr_loss=0.3703, over 20960.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1766, cr_loss=0.3922, over 4069559.68 frames. ], batch size: 64, lr: 6.37e-03, grad_scale: 32.0 2024-09-15 04:33:14,737 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.064e+02 2.192e+02 2.396e+02 3.460e+02, threshold=4.385e+02, percent-clipped=0.0 2024-09-15 04:33:19,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=22.5 2024-09-15 04:33:20,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=233690.5, ans=0.125 2024-09-15 04:33:23,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=233690.5, ans=0.125 2024-09-15 04:33:30,313 INFO [train.py:1198] (0/2) Epoch 13, batch 5800, loss[loss=0.257, ctc_loss=0.1787, cr_loss=0.3917, over 20870.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1763, cr_loss=0.3928, over 4083565.43 frames. 
], batch size: 57, lr: 6.37e-03, grad_scale: 32.0 2024-09-15 04:33:33,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=233718.83333333334, ans=0.0 2024-09-15 04:33:52,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=233747.16666666666, ans=0.125 2024-09-15 04:33:57,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=233747.16666666666, ans=0.125 2024-09-15 04:34:05,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2024-09-15 04:34:30,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=233832.16666666666, ans=0.07 2024-09-15 04:34:31,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=233832.16666666666, ans=0.125 2024-09-15 04:34:39,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=233832.16666666666, ans=0.125 2024-09-15 04:34:44,699 INFO [train.py:1198] (0/2) Epoch 13, batch 5850, loss[loss=0.2511, ctc_loss=0.1728, cr_loss=0.3916, over 20980.00 frames. ], tot_loss[loss=0.2547, ctc_loss=0.1762, cr_loss=0.3926, over 4087655.64 frames. 
], batch size: 55, lr: 6.37e-03, grad_scale: 32.0 2024-09-15 04:35:14,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=233888.83333333334, ans=0.125 2024-09-15 04:35:25,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=233917.16666666666, ans=0.125 2024-09-15 04:35:34,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=233945.5, ans=0.1 2024-09-15 04:35:47,280 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.684e+02 2.052e+02 2.224e+02 2.433e+02 4.392e+02, threshold=4.449e+02, percent-clipped=1.0 2024-09-15 04:35:50,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=233973.83333333334, ans=0.125 2024-09-15 04:35:53,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=233973.83333333334, ans=0.2 2024-09-15 04:36:00,661 INFO [train.py:1198] (0/2) Epoch 13, batch 5900, loss[loss=0.2641, ctc_loss=0.1816, cr_loss=0.4129, over 20662.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1765, cr_loss=0.3926, over 4084046.56 frames. ], batch size: 66, lr: 6.37e-03, grad_scale: 32.0 2024-09-15 04:37:14,418 INFO [train.py:1198] (0/2) Epoch 13, batch 5950, loss[loss=0.2345, ctc_loss=0.1604, cr_loss=0.3707, over 20942.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1746, cr_loss=0.3896, over 4089096.52 frames. ], batch size: 60, lr: 6.36e-03, grad_scale: 32.0 2024-09-15 04:37:19,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.96 vs. 
limit=22.5 2024-09-15 04:37:57,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=234228.83333333334, ans=0.09899494936611666 2024-09-15 04:38:02,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=234228.83333333334, ans=0.0 2024-09-15 04:38:05,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=234228.83333333334, ans=0.125 2024-09-15 04:38:07,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=234228.83333333334, ans=0.5 2024-09-15 04:38:15,592 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.071e+02 2.203e+02 2.385e+02 5.164e+02, threshold=4.405e+02, percent-clipped=1.0 2024-09-15 04:38:16,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=234257.16666666666, ans=0.2 2024-09-15 04:38:23,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=234257.16666666666, ans=0.125 2024-09-15 04:38:28,665 INFO [train.py:1198] (0/2) Epoch 13, batch 6000, loss[loss=0.2381, ctc_loss=0.1607, cr_loss=0.3867, over 21039.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1738, cr_loss=0.3886, over 4100482.35 frames. ], batch size: 56, lr: 6.36e-03, grad_scale: 32.0 2024-09-15 04:38:28,666 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 04:38:50,144 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7927, 3.7723, 2.7591, 3.7016], device='cuda:0') 2024-09-15 04:38:50,993 INFO [train.py:1230] (0/2) Epoch 13, validation: loss=0.04815, ctc_loss=0.04815, cr_loss=9.906e-15, over 944034.00 frames. 
2024-09-15 04:38:50,993 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 04:39:10,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=234313.83333333334, ans=0.0 2024-09-15 04:39:28,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=234342.16666666666, ans=0.05 2024-09-15 04:39:37,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=234370.5, ans=0.5 2024-09-15 04:39:39,110 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.40 vs. limit=15.0 2024-09-15 04:39:50,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=234398.83333333334, ans=0.0 2024-09-15 04:40:05,225 INFO [train.py:1198] (0/2) Epoch 13, batch 6050, loss[loss=0.212, ctc_loss=0.1437, cr_loss=0.3415, over 19476.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1744, cr_loss=0.3896, over 4096957.46 frames. 
], batch size: 43, lr: 6.36e-03, grad_scale: 32.0 2024-09-15 04:40:05,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=234427.16666666666, ans=0.5 2024-09-15 04:40:33,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=234483.83333333334, ans=0.0 2024-09-15 04:40:48,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=234483.83333333334, ans=0.1 2024-09-15 04:40:49,488 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 04:41:06,758 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.090e+02 2.204e+02 2.375e+02 4.545e+02, threshold=4.407e+02, percent-clipped=1.0 2024-09-15 04:41:12,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=234540.5, ans=0.0 2024-09-15 04:41:18,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=234540.5, ans=0.125 2024-09-15 04:41:19,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=234568.83333333334, ans=0.125 2024-09-15 04:41:21,325 INFO [train.py:1198] (0/2) Epoch 13, batch 6100, loss[loss=0.2082, ctc_loss=0.1382, cr_loss=0.3496, over 20389.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1742, cr_loss=0.3891, over 4091376.11 frames. 
], batch size: 45, lr: 6.36e-03, grad_scale: 32.0 2024-09-15 04:41:25,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=234568.83333333334, ans=0.125 2024-09-15 04:41:31,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=234568.83333333334, ans=0.125 2024-09-15 04:41:42,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=234597.16666666666, ans=0.2 2024-09-15 04:42:21,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=234682.16666666666, ans=0.0 2024-09-15 04:42:33,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=234710.5, ans=0.125 2024-09-15 04:42:34,388 INFO [train.py:1198] (0/2) Epoch 13, batch 6150, loss[loss=0.2427, ctc_loss=0.1633, cr_loss=0.3969, over 20879.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1741, cr_loss=0.3895, over 4090896.17 frames. ], batch size: 57, lr: 6.36e-03, grad_scale: 32.0 2024-09-15 04:42:36,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=234710.5, ans=0.0 2024-09-15 04:43:35,985 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.073e+02 2.219e+02 2.414e+02 7.028e+02, threshold=4.438e+02, percent-clipped=2.0 2024-09-15 04:43:48,885 INFO [train.py:1198] (0/2) Epoch 13, batch 6200, loss[loss=0.2129, ctc_loss=0.1448, cr_loss=0.3406, over 20942.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1739, cr_loss=0.3894, over 4088139.08 frames. 
], batch size: 48, lr: 6.35e-03, grad_scale: 32.0 2024-09-15 04:43:56,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=234852.16666666666, ans=0.125 2024-09-15 04:44:05,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=234880.5, ans=0.0 2024-09-15 04:44:15,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=234880.5, ans=0.0 2024-09-15 04:44:58,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=234965.5, ans=0.125 2024-09-15 04:45:01,608 INFO [train.py:1198] (0/2) Epoch 13, batch 6250, loss[loss=0.2407, ctc_loss=0.1651, cr_loss=0.378, over 20982.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.176, cr_loss=0.3907, over 4031948.81 frames. ], batch size: 55, lr: 6.35e-03, grad_scale: 64.0 2024-09-15 04:45:03,404 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 04:46:00,746 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.192e+02 2.367e+02 2.698e+02 3.520e+02, threshold=4.734e+02, percent-clipped=0.0 2024-09-15 04:46:12,893 INFO [train.py:1198] (0/2) Epoch 13, batch 6300, loss[loss=0.2785, ctc_loss=0.1994, cr_loss=0.3954, over 18405.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1812, cr_loss=0.3953, over 3961553.17 frames. ], batch size: 108, lr: 6.35e-03, grad_scale: 32.0 2024-09-15 04:46:26,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=235163.83333333334, ans=0.0 2024-09-15 04:46:38,820 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.93 vs. 
limit=15.0 2024-09-15 04:47:03,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=235220.5, ans=0.1 2024-09-15 04:47:16,034 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.51 vs. limit=22.5 2024-09-15 04:47:22,424 INFO [train.py:1198] (0/2) Epoch 13, batch 6350, loss[loss=0.3139, ctc_loss=0.2298, cr_loss=0.4206, over 14231.00 frames. ], tot_loss[loss=0.2673, ctc_loss=0.1874, cr_loss=0.3993, over 3800617.37 frames. ], batch size: 149, lr: 6.35e-03, grad_scale: 32.0 2024-09-15 04:47:23,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0 2024-09-15 04:48:03,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=235362.16666666666, ans=0.125 2024-09-15 04:48:20,278 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-13.pt 2024-09-15 04:49:06,941 INFO [train.py:1198] (0/2) Epoch 14, batch 0, loss[loss=0.2805, ctc_loss=0.194, cr_loss=0.4321, over 20868.00 frames. ], tot_loss[loss=0.2805, ctc_loss=0.194, cr_loss=0.4321, over 20868.00 frames. ], batch size: 65, lr: 6.12e-03, grad_scale: 32.0 2024-09-15 04:49:06,942 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 04:49:25,003 INFO [train.py:1230] (0/2) Epoch 14, validation: loss=0.04984, ctc_loss=0.04984, cr_loss=9.807e-15, over 944034.00 frames. 
2024-09-15 04:49:25,003 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 04:49:26,461 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.389e+02 2.523e+02 2.822e+02 4.019e+02, threshold=5.045e+02, percent-clipped=0.0 2024-09-15 04:50:19,812 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=12.0 2024-09-15 04:50:40,379 INFO [train.py:1198] (0/2) Epoch 14, batch 50, loss[loss=0.2534, ctc_loss=0.1759, cr_loss=0.3872, over 21090.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1756, cr_loss=0.3949, over 939513.92 frames. ], batch size: 59, lr: 6.11e-03, grad_scale: 32.0 2024-09-15 04:51:04,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=235563.33333333334, ans=0.0 2024-09-15 04:51:08,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=235563.33333333334, ans=0.2 2024-09-15 04:51:14,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=235591.66666666666, ans=0.07 2024-09-15 04:51:21,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=235591.66666666666, ans=0.125 2024-09-15 04:51:31,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.60 vs. limit=10.0 2024-09-15 04:51:55,707 INFO [train.py:1198] (0/2) Epoch 14, batch 100, loss[loss=0.2679, ctc_loss=0.1875, cr_loss=0.4023, over 20988.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1781, cr_loss=0.3969, over 1646298.19 frames. 
], batch size: 64, lr: 6.11e-03, grad_scale: 32.0 2024-09-15 04:51:57,195 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.024e+02 2.145e+02 2.322e+02 3.133e+02, threshold=4.290e+02, percent-clipped=0.0 2024-09-15 04:52:20,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2024-09-15 04:52:59,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235790.0, ans=0.1 2024-09-15 04:53:12,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=235818.33333333334, ans=0.04949747468305833 2024-09-15 04:53:13,809 INFO [train.py:1198] (0/2) Epoch 14, batch 150, loss[loss=0.2923, ctc_loss=0.2028, cr_loss=0.4474, over 20828.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1795, cr_loss=0.3985, over 2185688.76 frames. ], batch size: 59, lr: 6.11e-03, grad_scale: 32.0 2024-09-15 04:53:18,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=235818.33333333334, ans=0.125 2024-09-15 04:53:32,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.12 vs. 
limit=12.0 2024-09-15 04:53:37,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=235846.66666666666, ans=0.0 2024-09-15 04:53:40,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235846.66666666666, ans=0.1 2024-09-15 04:54:04,866 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 04:54:13,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=235931.66666666666, ans=0.125 2024-09-15 04:54:18,350 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 04:54:18,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=235931.66666666666, ans=0.125 2024-09-15 04:54:28,534 INFO [train.py:1198] (0/2) Epoch 14, batch 200, loss[loss=0.2738, ctc_loss=0.1903, cr_loss=0.4176, over 20046.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.1776, cr_loss=0.3952, over 2608169.89 frames. 
], batch size: 80, lr: 6.11e-03, grad_scale: 32.0 2024-09-15 04:54:30,056 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.027e+02 2.156e+02 2.358e+02 4.767e+02, threshold=4.313e+02, percent-clipped=1.0 2024-09-15 04:54:42,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235988.33333333334, ans=0.1 2024-09-15 04:54:46,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=235988.33333333334, ans=0.2 2024-09-15 04:54:48,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=235988.33333333334, ans=0.0 2024-09-15 04:54:54,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=235988.33333333334, ans=0.125 2024-09-15 04:55:21,322 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 04:55:44,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=236073.33333333334, ans=0.0 2024-09-15 04:55:46,404 INFO [train.py:1198] (0/2) Epoch 14, batch 250, loss[loss=0.2812, ctc_loss=0.1957, cr_loss=0.4275, over 20055.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.1767, cr_loss=0.3949, over 2945973.36 frames. 
], batch size: 80, lr: 6.11e-03, grad_scale: 32.0 2024-09-15 04:56:00,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=236130.0, ans=0.125 2024-09-15 04:56:24,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=236158.33333333334, ans=0.0 2024-09-15 04:56:35,504 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.39 vs. limit=15.0 2024-09-15 04:56:36,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=236186.66666666666, ans=0.0 2024-09-15 04:57:01,674 INFO [train.py:1198] (0/2) Epoch 14, batch 300, loss[loss=0.2729, ctc_loss=0.19, cr_loss=0.4141, over 20940.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1744, cr_loss=0.3912, over 3204233.75 frames. ], batch size: 60, lr: 6.10e-03, grad_scale: 32.0 2024-09-15 04:57:03,091 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.107e+02 2.203e+02 2.405e+02 3.359e+02, threshold=4.406e+02, percent-clipped=0.0 2024-09-15 04:57:05,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=236243.33333333334, ans=0.125 2024-09-15 04:57:12,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=236243.33333333334, ans=0.09899494936611666 2024-09-15 04:57:15,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=236271.66666666666, ans=0.125 2024-09-15 04:57:47,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=236300.0, ans=0.125 2024-09-15 04:57:52,880 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=236328.33333333334, ans=0.125 2024-09-15 04:58:02,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-09-15 04:58:19,366 INFO [train.py:1198] (0/2) Epoch 14, batch 350, loss[loss=0.2211, ctc_loss=0.1515, cr_loss=0.3482, over 20986.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1741, cr_loss=0.391, over 3409060.24 frames. ], batch size: 52, lr: 6.10e-03, grad_scale: 16.0 2024-09-15 04:58:21,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=236385.0, ans=0.125 2024-09-15 04:58:42,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2024-09-15 04:58:48,447 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 04:59:01,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=236441.66666666666, ans=0.0 2024-09-15 04:59:15,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=236470.0, ans=0.2 2024-09-15 04:59:18,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=236498.33333333334, ans=0.125 2024-09-15 04:59:34,564 INFO [train.py:1198] (0/2) Epoch 14, batch 400, loss[loss=0.2852, ctc_loss=0.1968, cr_loss=0.4419, over 20691.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1747, cr_loss=0.3912, over 3553011.04 frames. 
], batch size: 71, lr: 6.10e-03, grad_scale: 32.0
2024-09-15 04:59:37,468 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.727e+02 2.028e+02 2.148e+02 2.357e+02 3.906e+02, threshold=4.296e+02, percent-clipped=0.0
2024-09-15 04:59:54,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236555.0, ans=0.1
2024-09-15 05:00:09,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236583.33333333334, ans=0.1
2024-09-15 05:00:22,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236611.66666666666, ans=0.1
2024-09-15 05:00:25,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=236611.66666666666, ans=0.0
2024-09-15 05:00:47,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=236640.0, ans=15.0
2024-09-15 05:00:52,940 INFO [train.py:1198] (0/2) Epoch 14, batch 450, loss[loss=0.2246, ctc_loss=0.1519, cr_loss=0.3636, over 21070.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1751, cr_loss=0.3916, over 3671130.18 frames. ], batch size: 53, lr: 6.10e-03, grad_scale: 32.0
2024-09-15 05:02:08,282 INFO [train.py:1198] (0/2) Epoch 14, batch 500, loss[loss=0.2716, ctc_loss=0.19, cr_loss=0.4083, over 20956.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1749, cr_loss=0.3918, over 3765001.78 frames. ], batch size: 64, lr: 6.10e-03, grad_scale: 32.0
2024-09-15 05:02:11,189 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.083e+02 2.195e+02 2.401e+02 5.537e+02, threshold=4.389e+02, percent-clipped=1.0
2024-09-15 05:02:11,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=236810.0, ans=0.125
2024-09-15 05:02:38,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=236866.66666666666, ans=0.0
2024-09-15 05:02:43,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0
2024-09-15 05:03:13,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=236923.33333333334, ans=0.0
2024-09-15 05:03:26,740 INFO [train.py:1198] (0/2) Epoch 14, batch 550, loss[loss=0.2649, ctc_loss=0.1834, cr_loss=0.4076, over 19504.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1752, cr_loss=0.3922, over 3834686.96 frames. ], batch size: 90, lr: 6.10e-03, grad_scale: 32.0
2024-09-15 05:04:01,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=237008.33333333334, ans=0.125
2024-09-15 05:04:01,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=237008.33333333334, ans=0.025
2024-09-15 05:04:36,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=237065.0, ans=0.125
2024-09-15 05:04:41,119 INFO [train.py:1198] (0/2) Epoch 14, batch 600, loss[loss=0.3178, ctc_loss=0.2326, cr_loss=0.4261, over 14215.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1756, cr_loss=0.3922, over 3880909.06 frames. ], batch size: 149, lr: 6.09e-03, grad_scale: 32.0
2024-09-15 05:04:44,105 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.005e+02 2.152e+02 2.363e+02 2.998e+02, threshold=4.304e+02, percent-clipped=0.0
2024-09-15 05:04:47,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237093.33333333334, ans=0.1
2024-09-15 05:04:47,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=237093.33333333334, ans=0.5
2024-09-15 05:05:13,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237150.0, ans=0.1
2024-09-15 05:05:21,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=237150.0, ans=0.0
2024-09-15 05:05:41,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=237206.66666666666, ans=0.125
2024-09-15 05:05:56,091 INFO [train.py:1198] (0/2) Epoch 14, batch 650, loss[loss=0.2292, ctc_loss=0.1548, cr_loss=0.3721, over 20973.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1758, cr_loss=0.3928, over 3938784.25 frames. ], batch size: 55, lr: 6.09e-03, grad_scale: 32.0
2024-09-15 05:07:15,606 INFO [train.py:1198] (0/2) Epoch 14, batch 700, loss[loss=0.2517, ctc_loss=0.174, cr_loss=0.3882, over 21058.00 frames. ], tot_loss[loss=0.2538, ctc_loss=0.1753, cr_loss=0.3921, over 3969888.56 frames. ], batch size: 53, lr: 6.09e-03, grad_scale: 32.0
2024-09-15 05:07:18,640 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.061e+02 2.179e+02 2.353e+02 3.627e+02, threshold=4.359e+02, percent-clipped=0.0
2024-09-15 05:07:24,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=237376.66666666666, ans=0.025
2024-09-15 05:07:26,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=237376.66666666666, ans=0.0
2024-09-15 05:07:39,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=237405.0, ans=0.0
2024-09-15 05:08:02,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=237461.66666666666, ans=0.125
2024-09-15 05:08:12,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237461.66666666666, ans=0.1
2024-09-15 05:08:23,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=237490.0, ans=0.2
2024-09-15 05:08:30,847 INFO [train.py:1198] (0/2) Epoch 14, batch 750, loss[loss=0.2835, ctc_loss=0.1977, cr_loss=0.4292, over 20966.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1747, cr_loss=0.3912, over 3994732.27 frames. ], batch size: 64, lr: 6.09e-03, grad_scale: 32.0
2024-09-15 05:08:31,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=237518.33333333334, ans=0.125
2024-09-15 05:09:14,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237575.0, ans=0.1
2024-09-15 05:09:18,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=237603.33333333334, ans=0.125
2024-09-15 05:09:19,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=237603.33333333334, ans=0.025
2024-09-15 05:09:48,988 INFO [train.py:1198] (0/2) Epoch 14, batch 800, loss[loss=0.2943, ctc_loss=0.2088, cr_loss=0.4276, over 18203.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1761, cr_loss=0.3923, over 3997133.53 frames. ], batch size: 108, lr: 6.09e-03, grad_scale: 32.0
2024-09-15 05:09:53,330 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.029e+02 2.193e+02 2.419e+02 2.978e+02, threshold=4.386e+02, percent-clipped=0.0
2024-09-15 05:09:59,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=237660.0, ans=0.2
2024-09-15 05:10:01,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=237660.0, ans=0.125
2024-09-15 05:10:06,587 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0
2024-09-15 05:10:17,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=237716.66666666666, ans=0.0
2024-09-15 05:11:03,923 INFO [train.py:1198] (0/2) Epoch 14, batch 850, loss[loss=0.2024, ctc_loss=0.1355, cr_loss=0.3346, over 20308.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1768, cr_loss=0.393, over 4007695.01 frames. ], batch size: 45, lr: 6.08e-03, grad_scale: 32.0
2024-09-15 05:11:34,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=237858.33333333334, ans=0.125
2024-09-15 05:11:39,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.61 vs. limit=15.0
2024-09-15 05:12:03,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=237886.66666666666, ans=10.0
2024-09-15 05:12:23,309 INFO [train.py:1198] (0/2) Epoch 14, batch 900, loss[loss=0.2543, ctc_loss=0.1741, cr_loss=0.4014, over 20883.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1759, cr_loss=0.3916, over 4029188.36 frames. ], batch size: 57, lr: 6.08e-03, grad_scale: 32.0
2024-09-15 05:12:25,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237943.33333333334, ans=0.1
2024-09-15 05:12:26,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=237943.33333333334, ans=0.125
2024-09-15 05:12:27,787 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.079e+02 2.233e+02 2.451e+02 3.481e+02, threshold=4.467e+02, percent-clipped=0.0
2024-09-15 05:12:49,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=237971.66666666666, ans=0.125
2024-09-15 05:12:51,896 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-84000.pt
2024-09-15 05:12:56,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0
2024-09-15 05:13:39,331 INFO [train.py:1198] (0/2) Epoch 14, batch 950, loss[loss=0.2571, ctc_loss=0.1791, cr_loss=0.3897, over 21008.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1751, cr_loss=0.3906, over 4051380.80 frames. ], batch size: 63, lr: 6.08e-03, grad_scale: 32.0
2024-09-15 05:13:44,852 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0
2024-09-15 05:13:49,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=238085.0, ans=0.125
2024-09-15 05:13:58,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=238113.33333333334, ans=0.125
2024-09-15 05:14:10,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238141.66666666666, ans=0.1
2024-09-15 05:14:30,221 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0
2024-09-15 05:14:48,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=238198.33333333334, ans=22.5
2024-09-15 05:14:58,401 INFO [train.py:1198] (0/2) Epoch 14, batch 1000, loss[loss=0.2285, ctc_loss=0.1558, cr_loss=0.3635, over 20329.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1755, cr_loss=0.3914, over 4059261.89 frames. ], batch size: 45, lr: 6.08e-03, grad_scale: 32.0
2024-09-15 05:15:01,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238226.66666666666, ans=0.1
2024-09-15 05:15:01,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=238226.66666666666, ans=0.1
2024-09-15 05:15:03,007 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.087e+02 2.235e+02 2.382e+02 7.832e+02, threshold=4.470e+02, percent-clipped=1.0
2024-09-15 05:15:07,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=238226.66666666666, ans=0.125
2024-09-15 05:15:12,497 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.24 vs. limit=6.0
2024-09-15 05:15:40,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=238283.33333333334, ans=0.1
2024-09-15 05:15:53,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=238311.66666666666, ans=0.125
2024-09-15 05:15:53,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=238311.66666666666, ans=0.125
2024-09-15 05:16:13,917 INFO [train.py:1198] (0/2) Epoch 14, batch 1050, loss[loss=0.2215, ctc_loss=0.1539, cr_loss=0.3379, over 20922.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.1767, cr_loss=0.3927, over 4060122.22 frames. ], batch size: 50, lr: 6.08e-03, grad_scale: 32.0
2024-09-15 05:16:32,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238396.66666666666, ans=0.1
2024-09-15 05:16:35,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=238396.66666666666, ans=0.125
2024-09-15 05:17:27,923 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.915e-03
2024-09-15 05:17:32,111 INFO [train.py:1198] (0/2) Epoch 14, batch 1100, loss[loss=0.2307, ctc_loss=0.1571, cr_loss=0.3683, over 21009.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1756, cr_loss=0.3917, over 4080143.58 frames. ], batch size: 61, lr: 6.08e-03, grad_scale: 32.0
2024-09-15 05:17:35,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=238510.0, ans=0.2
2024-09-15 05:17:36,637 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.070e+02 2.185e+02 2.490e+02 3.373e+02, threshold=4.371e+02, percent-clipped=0.0
2024-09-15 05:17:46,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=238538.33333333334, ans=0.0
2024-09-15 05:18:43,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=238623.33333333334, ans=15.0
2024-09-15 05:18:46,764 INFO [train.py:1198] (0/2) Epoch 14, batch 1150, loss[loss=0.2671, ctc_loss=0.1858, cr_loss=0.4065, over 21003.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1738, cr_loss=0.3899, over 4093908.56 frames. ], batch size: 61, lr: 6.07e-03, grad_scale: 32.0
2024-09-15 05:19:18,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=238708.33333333334, ans=0.0
2024-09-15 05:19:22,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=238708.33333333334, ans=0.125
2024-09-15 05:19:48,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=238765.0, ans=0.2
2024-09-15 05:19:59,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=238765.0, ans=0.125
2024-09-15 05:20:01,941 INFO [train.py:1198] (0/2) Epoch 14, batch 1200, loss[loss=0.2488, ctc_loss=0.1694, cr_loss=0.3969, over 20867.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1737, cr_loss=0.39, over 4097092.33 frames. ], batch size: 54, lr: 6.07e-03, grad_scale: 32.0
2024-09-15 05:20:09,733 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.610e+02 2.055e+02 2.194e+02 2.371e+02 5.778e+02, threshold=4.387e+02, percent-clipped=1.0
2024-09-15 05:20:12,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=238793.33333333334, ans=0.025
2024-09-15 05:20:30,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=238821.66666666666, ans=0.0
2024-09-15 05:21:19,934 INFO [train.py:1198] (0/2) Epoch 14, batch 1250, loss[loss=0.2512, ctc_loss=0.1736, cr_loss=0.3879, over 20889.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1748, cr_loss=0.3914, over 4087653.64 frames. ], batch size: 54, lr: 6.07e-03, grad_scale: 32.0
2024-09-15 05:21:38,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=238963.33333333334, ans=0.125
2024-09-15 05:22:23,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=239048.33333333334, ans=0.125
2024-09-15 05:22:34,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=239076.66666666666, ans=0.0
2024-09-15 05:22:35,379 INFO [train.py:1198] (0/2) Epoch 14, batch 1300, loss[loss=0.2239, ctc_loss=0.1503, cr_loss=0.3678, over 21068.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1746, cr_loss=0.3906, over 4080755.10 frames. ], batch size: 53, lr: 6.07e-03, grad_scale: 32.0
2024-09-15 05:22:40,000 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.698e+02 2.077e+02 2.243e+02 2.476e+02 4.179e+02, threshold=4.487e+02, percent-clipped=0.0
2024-09-15 05:23:04,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=239105.0, ans=0.125
2024-09-15 05:23:04,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=239105.0, ans=0.09899494936611666
2024-09-15 05:23:28,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=239161.66666666666, ans=0.125
2024-09-15 05:23:33,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239161.66666666666, ans=0.1
2024-09-15 05:23:34,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=239161.66666666666, ans=0.125
2024-09-15 05:23:53,706 INFO [train.py:1198] (0/2) Epoch 14, batch 1350, loss[loss=0.283, ctc_loss=0.2005, cr_loss=0.4122, over 20653.00 frames. ], tot_loss[loss=0.2519, ctc_loss=0.174, cr_loss=0.3895, over 4081539.95 frames. ], batch size: 68, lr: 6.07e-03, grad_scale: 32.0
2024-09-15 05:23:58,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=239218.33333333334, ans=0.125
2024-09-15 05:23:58,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=239218.33333333334, ans=0.0
2024-09-15 05:24:13,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239246.66666666666, ans=0.1
2024-09-15 05:24:45,004 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 05:24:50,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=239303.33333333334, ans=0.2
2024-09-15 05:24:50,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=239303.33333333334, ans=0.125
2024-09-15 05:25:00,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=239331.66666666666, ans=0.0
2024-09-15 05:25:02,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=239331.66666666666, ans=0.125
2024-09-15 05:25:08,710 INFO [train.py:1198] (0/2) Epoch 14, batch 1400, loss[loss=0.2745, ctc_loss=0.1882, cr_loss=0.4312, over 19983.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1731, cr_loss=0.3888, over 4093383.41 frames. ], batch size: 80, lr: 6.06e-03, grad_scale: 32.0
2024-09-15 05:25:13,242 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.109e+02 2.226e+02 2.450e+02 4.273e+02, threshold=4.452e+02, percent-clipped=0.0
2024-09-15 05:25:13,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=239360.0, ans=0.0
2024-09-15 05:25:28,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=239388.33333333334, ans=0.125
2024-09-15 05:25:52,846 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0
2024-09-15 05:26:01,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=239445.0, ans=0.0
2024-09-15 05:26:10,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=239473.33333333334, ans=0.125
2024-09-15 05:26:26,841 INFO [train.py:1198] (0/2) Epoch 14, batch 1450, loss[loss=0.2384, ctc_loss=0.1686, cr_loss=0.349, over 20888.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1734, cr_loss=0.3886, over 4092978.51 frames. ], batch size: 54, lr: 6.06e-03, grad_scale: 32.0
2024-09-15 05:27:10,948 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.32 vs. limit=15.0
2024-09-15 05:27:26,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=12.0
2024-09-15 05:27:28,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=239615.0, ans=0.125
2024-09-15 05:27:42,258 INFO [train.py:1198] (0/2) Epoch 14, batch 1500, loss[loss=0.2408, ctc_loss=0.1646, cr_loss=0.3806, over 20978.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1731, cr_loss=0.3879, over 4100968.93 frames. ], batch size: 58, lr: 6.06e-03, grad_scale: 32.0
2024-09-15 05:27:46,879 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.067e+02 2.258e+02 2.457e+02 5.718e+02, threshold=4.515e+02, percent-clipped=1.0
2024-09-15 05:27:55,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=239643.33333333334, ans=0.125
2024-09-15 05:28:53,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239756.66666666666, ans=0.1
2024-09-15 05:29:00,727 INFO [train.py:1198] (0/2) Epoch 14, batch 1550, loss[loss=0.1863, ctc_loss=0.1262, cr_loss=0.3007, over 19890.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1733, cr_loss=0.3887, over 4111613.25 frames. ], batch size: 44, lr: 6.06e-03, grad_scale: 32.0
2024-09-15 05:29:08,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=239785.0, ans=0.0
2024-09-15 05:29:13,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239785.0, ans=0.1
2024-09-15 05:29:36,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239841.66666666666, ans=0.1
2024-09-15 05:29:37,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=12.0
2024-09-15 05:29:44,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=239870.0, ans=0.125
2024-09-15 05:29:59,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=239898.33333333334, ans=0.0
2024-09-15 05:30:10,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=239898.33333333334, ans=0.125
2024-09-15 05:30:16,021 INFO [train.py:1198] (0/2) Epoch 14, batch 1600, loss[loss=0.2722, ctc_loss=0.1884, cr_loss=0.4191, over 20966.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1725, cr_loss=0.3881, over 4119149.69 frames. ], batch size: 67, lr: 6.06e-03, grad_scale: 32.0
2024-09-15 05:30:20,505 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.103e+02 2.239e+02 2.541e+02 3.639e+02, threshold=4.478e+02, percent-clipped=0.0
2024-09-15 05:30:41,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=239955.0, ans=0.125
2024-09-15 05:31:04,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=240011.66666666666, ans=0.1
2024-09-15 05:31:17,209 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=15.0
2024-09-15 05:31:27,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=240040.0, ans=0.125
2024-09-15 05:31:34,265 INFO [train.py:1198] (0/2) Epoch 14, batch 1650, loss[loss=0.2415, ctc_loss=0.1643, cr_loss=0.3859, over 20782.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1733, cr_loss=0.388, over 4090810.91 frames. ], batch size: 53, lr: 6.06e-03, grad_scale: 32.0
2024-09-15 05:31:48,395 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 05:31:52,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=240096.66666666666, ans=0.1
2024-09-15 05:32:49,542 INFO [train.py:1198] (0/2) Epoch 14, batch 1700, loss[loss=0.2758, ctc_loss=0.1931, cr_loss=0.4135, over 20987.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1732, cr_loss=0.3877, over 4090937.96 frames. ], batch size: 64, lr: 6.05e-03, grad_scale: 32.0
2024-09-15 05:32:53,745 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.766e+02 2.028e+02 2.161e+02 2.311e+02 5.289e+02, threshold=4.322e+02, percent-clipped=1.0
2024-09-15 05:33:01,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=240210.0, ans=0.2
2024-09-15 05:33:05,141 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0
2024-09-15 05:33:22,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=240266.66666666666, ans=0.125
2024-09-15 05:33:53,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=240323.33333333334, ans=0.0
2024-09-15 05:34:04,945 INFO [train.py:1198] (0/2) Epoch 14, batch 1750, loss[loss=0.2112, ctc_loss=0.143, cr_loss=0.3411, over 19944.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1732, cr_loss=0.3878, over 4090616.71 frames. ], batch size: 44, lr: 6.05e-03, grad_scale: 32.0
2024-09-15 05:34:31,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=240380.0, ans=0.125
2024-09-15 05:35:13,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=240465.0, ans=0.125
2024-09-15 05:35:23,978 INFO [train.py:1198] (0/2) Epoch 14, batch 1800, loss[loss=0.2107, ctc_loss=0.1426, cr_loss=0.3403, over 20984.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1726, cr_loss=0.3871, over 4103010.98 frames. ], batch size: 50, lr: 6.05e-03, grad_scale: 32.0
2024-09-15 05:35:28,507 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.749e+02 2.022e+02 2.173e+02 2.326e+02 2.841e+02, threshold=4.346e+02, percent-clipped=0.0
2024-09-15 05:35:53,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=240550.0, ans=0.125
2024-09-15 05:36:17,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=240578.33333333334, ans=0.2
2024-09-15 05:36:28,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=240606.66666666666, ans=0.125
2024-09-15 05:36:40,177 INFO [train.py:1198] (0/2) Epoch 14, batch 1850, loss[loss=0.2116, ctc_loss=0.1455, cr_loss=0.3303, over 20978.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1723, cr_loss=0.3867, over 4096976.91 frames. ], batch size: 49, lr: 6.05e-03, grad_scale: 32.0
2024-09-15 05:37:06,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=240663.33333333334, ans=0.125
2024-09-15 05:37:32,870 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=15.0
2024-09-15 05:37:58,079 INFO [train.py:1198] (0/2) Epoch 14, batch 1900, loss[loss=0.2571, ctc_loss=0.1768, cr_loss=0.4016, over 20933.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.173, cr_loss=0.3879, over 4090120.71 frames. ], batch size: 60, lr: 6.05e-03, grad_scale: 32.0
2024-09-15 05:38:02,355 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.062e+02 2.220e+02 2.398e+02 2.948e+02, threshold=4.440e+02, percent-clipped=0.0
2024-09-15 05:38:07,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=240776.66666666666, ans=0.125
2024-09-15 05:38:08,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=240776.66666666666, ans=0.125
2024-09-15 05:38:10,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=240776.66666666666, ans=0.0
2024-09-15 05:38:30,227 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 05:38:36,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=240833.33333333334, ans=0.0
2024-09-15 05:39:13,480 INFO [train.py:1198] (0/2) Epoch 14, batch 1950, loss[loss=0.2663, ctc_loss=0.1807, cr_loss=0.428, over 21078.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1738, cr_loss=0.3891, over 4086270.59 frames. ], batch size: 59, lr: 6.05e-03, grad_scale: 32.0
2024-09-15 05:39:39,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=240946.66666666666, ans=0.125
2024-09-15 05:39:41,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=240946.66666666666, ans=0.1
2024-09-15 05:40:16,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=241031.66666666666, ans=0.125
2024-09-15 05:40:30,943 INFO [train.py:1198] (0/2) Epoch 14, batch 2000, loss[loss=0.2653, ctc_loss=0.1852, cr_loss=0.4006, over 21029.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1751, cr_loss=0.3913, over 4088709.26 frames. ], batch size: 62, lr: 6.04e-03, grad_scale: 32.0
2024-09-15 05:40:34,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.37 vs. limit=15.0
2024-09-15 05:40:35,459 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.043e+02 2.236e+02 2.518e+02 3.651e+02, threshold=4.471e+02, percent-clipped=0.0
2024-09-15 05:40:40,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=241060.0, ans=0.025
2024-09-15 05:40:53,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=241088.33333333334, ans=0.125
2024-09-15 05:41:15,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=241145.0, ans=0.125
2024-09-15 05:41:19,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=241145.0, ans=0.2
2024-09-15 05:41:36,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=241173.33333333334, ans=0.125
2024-09-15 05:41:46,183 INFO [train.py:1198] (0/2) Epoch 14, batch 2050, loss[loss=0.2193, ctc_loss=0.1489, cr_loss=0.352, over 21070.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1747, cr_loss=0.3911, over 4092206.61 frames. ], batch size: 53, lr: 6.04e-03, grad_scale: 32.0
2024-09-15 05:42:15,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=241258.33333333334, ans=0.025
2024-09-15 05:42:23,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=241258.33333333334, ans=0.0
2024-09-15 05:42:34,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=241286.66666666666, ans=0.0
2024-09-15 05:42:46,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=241315.0, ans=0.0
2024-09-15 05:43:04,059 INFO [train.py:1198] (0/2) Epoch 14, batch 2100, loss[loss=0.2175, ctc_loss=0.1475, cr_loss=0.3496, over 20880.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1744, cr_loss=0.3905, over 4092483.00 frames. ], batch size: 54, lr: 6.04e-03, grad_scale: 32.0
2024-09-15 05:43:08,651 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 1.980e+02 2.120e+02 2.281e+02 2.836e+02, threshold=4.240e+02, percent-clipped=0.0
2024-09-15 05:43:39,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=241400.0, ans=10.0
2024-09-15 05:44:19,155 INFO [train.py:1198] (0/2) Epoch 14, batch 2150, loss[loss=0.2412, ctc_loss=0.1624, cr_loss=0.3944, over 20164.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1731, cr_loss=0.3887, over 4094619.27 frames. ], batch size: 74, lr: 6.04e-03, grad_scale: 32.0
2024-09-15 05:45:33,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=241626.66666666666, ans=0.0
2024-09-15 05:45:35,091 INFO [train.py:1198] (0/2) Epoch 14, batch 2200, loss[loss=0.2049, ctc_loss=0.136, cr_loss=0.3446, over 20364.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.172, cr_loss=0.3867, over 4095336.02 frames. ], batch size: 45, lr: 6.04e-03, grad_scale: 32.0
2024-09-15 05:45:39,532 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.074e+02 2.247e+02 2.504e+02 4.445e+02, threshold=4.495e+02, percent-clipped=1.0
2024-09-15 05:45:49,569 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0
2024-09-15 05:45:56,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=241655.0, ans=0.0
2024-09-15 05:46:18,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=241683.33333333334, ans=0.125
2024-09-15 05:46:31,261 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0
2024-09-15 05:46:52,931 INFO [train.py:1198] (0/2) Epoch 14, batch 2250, loss[loss=0.2178, ctc_loss=0.1469, cr_loss=0.3544, over 20957.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1718, cr_loss=0.3863, over 4100174.59 frames. ], batch size: 50, lr: 6.03e-03, grad_scale: 32.0
2024-09-15 05:46:56,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=241768.33333333334, ans=0.125
2024-09-15 05:47:25,882 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.65 vs. limit=15.0
2024-09-15 05:47:28,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=241825.0, ans=0.0
2024-09-15 05:47:29,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=241825.0, ans=0.125
2024-09-15 05:47:33,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=241825.0, ans=0.125
2024-09-15 05:47:51,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=241881.66666666666, ans=0.125
2024-09-15 05:48:02,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=241881.66666666666, ans=0.125
2024-09-15 05:48:07,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.38 vs. limit=10.0
2024-09-15 05:48:08,194 INFO [train.py:1198] (0/2) Epoch 14, batch 2300, loss[loss=0.2422, ctc_loss=0.1679, cr_loss=0.372, over 20295.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1743, cr_loss=0.3897, over 4063740.08 frames.
], batch size: 74, lr: 6.03e-03, grad_scale: 32.0 2024-09-15 05:48:12,807 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.675e+02 2.050e+02 2.168e+02 2.378e+02 4.598e+02, threshold=4.337e+02, percent-clipped=1.0 2024-09-15 05:48:47,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=241966.66666666666, ans=0.1 2024-09-15 05:48:56,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=241995.0, ans=0.125 2024-09-15 05:49:15,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=242023.33333333334, ans=0.125 2024-09-15 05:49:26,727 INFO [train.py:1198] (0/2) Epoch 14, batch 2350, loss[loss=0.2717, ctc_loss=0.1897, cr_loss=0.4101, over 20963.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1741, cr_loss=0.3888, over 4060649.62 frames. ], batch size: 64, lr: 6.03e-03, grad_scale: 32.0 2024-09-15 05:50:08,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0 2024-09-15 05:50:09,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242108.33333333334, ans=0.1 2024-09-15 05:50:12,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242136.66666666666, ans=0.1 2024-09-15 05:50:13,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=242136.66666666666, ans=0.125 2024-09-15 05:50:42,259 INFO [train.py:1198] (0/2) Epoch 14, batch 2400, loss[loss=0.2494, ctc_loss=0.1699, cr_loss=0.3973, over 21019.00 frames. 
], tot_loss[loss=0.2516, ctc_loss=0.1738, cr_loss=0.3889, over 4072432.84 frames. ], batch size: 61, lr: 6.03e-03, grad_scale: 32.0 2024-09-15 05:50:48,147 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.003e+02 2.156e+02 2.309e+02 2.945e+02, threshold=4.312e+02, percent-clipped=0.0 2024-09-15 05:51:22,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=242250.0, ans=0.025 2024-09-15 05:52:00,234 INFO [train.py:1198] (0/2) Epoch 14, batch 2450, loss[loss=0.2384, ctc_loss=0.1673, cr_loss=0.3554, over 20813.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1738, cr_loss=0.3894, over 4086299.04 frames. ], batch size: 59, lr: 6.03e-03, grad_scale: 32.0 2024-09-15 05:52:17,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=12.0 2024-09-15 05:52:18,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=242363.33333333334, ans=0.0 2024-09-15 05:52:25,777 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.19 vs. limit=10.0 2024-09-15 05:53:08,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=242448.33333333334, ans=0.0 2024-09-15 05:53:11,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=242448.33333333334, ans=0.125 2024-09-15 05:53:14,151 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-09-15 05:53:15,288 INFO [train.py:1198] (0/2) Epoch 14, batch 2500, loss[loss=0.2221, ctc_loss=0.1479, cr_loss=0.3706, over 20973.00 frames. 
], tot_loss[loss=0.2521, ctc_loss=0.174, cr_loss=0.3902, over 4093045.18 frames. ], batch size: 48, lr: 6.03e-03, grad_scale: 32.0 2024-09-15 05:53:21,384 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.683e+02 2.007e+02 2.160e+02 2.318e+02 3.852e+02, threshold=4.320e+02, percent-clipped=0.0 2024-09-15 05:53:26,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=242476.66666666666, ans=0.025 2024-09-15 05:53:45,890 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=22.5 2024-09-15 05:54:09,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=242561.66666666666, ans=0.125 2024-09-15 05:54:13,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=242561.66666666666, ans=0.125 2024-09-15 05:54:28,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=242590.0, ans=0.0 2024-09-15 05:54:32,629 INFO [train.py:1198] (0/2) Epoch 14, batch 2550, loss[loss=0.219, ctc_loss=0.1502, cr_loss=0.3443, over 20979.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1741, cr_loss=0.3904, over 4097237.32 frames. ], batch size: 51, lr: 6.02e-03, grad_scale: 32.0 2024-09-15 05:54:46,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.30 vs. limit=15.0 2024-09-15 05:54:52,739 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.13 vs. 
limit=12.0 2024-09-15 05:55:01,411 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 05:55:16,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=15.0 2024-09-15 05:55:45,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=242760.0, ans=0.025 2024-09-15 05:55:47,020 INFO [train.py:1198] (0/2) Epoch 14, batch 2600, loss[loss=0.2218, ctc_loss=0.1508, cr_loss=0.355, over 20981.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1742, cr_loss=0.391, over 4105486.84 frames. ], batch size: 51, lr: 6.02e-03, grad_scale: 32.0 2024-09-15 05:55:47,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0 2024-09-15 05:55:53,011 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.722e+02 2.049e+02 2.221e+02 2.410e+02 3.900e+02, threshold=4.443e+02, percent-clipped=0.0 2024-09-15 05:56:22,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=242816.66666666666, ans=0.0 2024-09-15 05:56:26,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=242816.66666666666, ans=0.05 2024-09-15 05:56:32,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=242845.0, ans=0.125 2024-09-15 05:56:32,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=242845.0, ans=0.0 2024-09-15 05:56:49,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=242873.33333333334, ans=0.125 2024-09-15 
05:56:55,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=242873.33333333334, ans=0.125 2024-09-15 05:56:57,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0 2024-09-15 05:57:02,603 INFO [train.py:1198] (0/2) Epoch 14, batch 2650, loss[loss=0.2607, ctc_loss=0.1788, cr_loss=0.4096, over 20979.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1737, cr_loss=0.3899, over 4098885.85 frames. ], batch size: 64, lr: 6.02e-03, grad_scale: 32.0 2024-09-15 05:57:13,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=242901.66666666666, ans=0.125 2024-09-15 05:58:01,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242986.66666666666, ans=0.1 2024-09-15 05:58:20,784 INFO [train.py:1198] (0/2) Epoch 14, batch 2700, loss[loss=0.2344, ctc_loss=0.1615, cr_loss=0.3647, over 20904.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1735, cr_loss=0.39, over 4103438.17 frames. ], batch size: 54, lr: 6.02e-03, grad_scale: 32.0 2024-09-15 05:58:26,742 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.713e+02 2.056e+02 2.186e+02 2.405e+02 3.188e+02, threshold=4.372e+02, percent-clipped=0.0 2024-09-15 05:58:40,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=243071.66666666666, ans=0.125 2024-09-15 05:58:53,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.80 vs. 
limit=15.0 2024-09-15 05:59:34,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=243185.0, ans=0.125 2024-09-15 05:59:36,326 INFO [train.py:1198] (0/2) Epoch 14, batch 2750, loss[loss=0.2487, ctc_loss=0.1677, cr_loss=0.4052, over 21027.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1724, cr_loss=0.3887, over 4115451.48 frames. ], batch size: 52, lr: 6.02e-03, grad_scale: 32.0 2024-09-15 05:59:47,449 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 06:00:09,172 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.82 vs. limit=8.0 2024-09-15 06:00:11,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=243241.66666666666, ans=0.2 2024-09-15 06:00:54,161 INFO [train.py:1198] (0/2) Epoch 14, batch 2800, loss[loss=0.2617, ctc_loss=0.1816, cr_loss=0.4006, over 20840.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1728, cr_loss=0.3889, over 4103921.86 frames. ], batch size: 59, lr: 6.02e-03, grad_scale: 32.0 2024-09-15 06:01:00,158 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.705e+02 1.994e+02 2.164e+02 2.389e+02 3.554e+02, threshold=4.328e+02, percent-clipped=0.0 2024-09-15 06:01:15,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=243355.0, ans=0.125 2024-09-15 06:01:15,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.30 vs. 
limit=15.0 2024-09-15 06:01:27,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=243383.33333333334, ans=0.025 2024-09-15 06:01:41,506 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=12.0 2024-09-15 06:01:47,537 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.02 vs. limit=10.0 2024-09-15 06:02:03,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=243440.0, ans=0.125 2024-09-15 06:02:09,102 INFO [train.py:1198] (0/2) Epoch 14, batch 2850, loss[loss=0.2382, ctc_loss=0.1632, cr_loss=0.3748, over 21009.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1728, cr_loss=0.3889, over 4102176.52 frames. ], batch size: 61, lr: 6.01e-03, grad_scale: 32.0 2024-09-15 06:02:16,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=243468.33333333334, ans=0.1 2024-09-15 06:02:56,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=243553.33333333334, ans=0.07 2024-09-15 06:03:03,909 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-09-15 06:03:26,362 INFO [train.py:1198] (0/2) Epoch 14, batch 2900, loss[loss=0.2335, ctc_loss=0.1582, cr_loss=0.3763, over 20971.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1736, cr_loss=0.3893, over 4087167.07 frames. 
], batch size: 51, lr: 6.01e-03, grad_scale: 32.0 2024-09-15 06:03:32,347 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.084e+02 2.234e+02 2.381e+02 7.796e+02, threshold=4.469e+02, percent-clipped=1.0 2024-09-15 06:03:44,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=243638.33333333334, ans=0.125 2024-09-15 06:03:56,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=243666.66666666666, ans=0.0 2024-09-15 06:04:12,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=243695.0, ans=0.125 2024-09-15 06:04:22,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=243695.0, ans=0.0 2024-09-15 06:04:29,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=243723.33333333334, ans=0.0 2024-09-15 06:04:38,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=243723.33333333334, ans=0.025 2024-09-15 06:04:41,259 INFO [train.py:1198] (0/2) Epoch 14, batch 2950, loss[loss=0.2428, ctc_loss=0.1653, cr_loss=0.3873, over 20883.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.174, cr_loss=0.3901, over 4093171.56 frames. 
], batch size: 57, lr: 6.01e-03, grad_scale: 32.0 2024-09-15 06:04:53,889 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 06:05:31,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=243836.66666666666, ans=0.125 2024-09-15 06:05:48,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=243865.0, ans=0.1 2024-09-15 06:05:59,126 INFO [train.py:1198] (0/2) Epoch 14, batch 3000, loss[loss=0.2242, ctc_loss=0.1551, cr_loss=0.3455, over 21044.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1738, cr_loss=0.3895, over 4094191.64 frames. ], batch size: 53, lr: 6.01e-03, grad_scale: 32.0 2024-09-15 06:05:59,126 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 06:06:30,067 INFO [train.py:1230] (0/2) Epoch 14, validation: loss=0.04865, ctc_loss=0.04865, cr_loss=9.82e-15, over 944034.00 frames. 2024-09-15 06:06:30,068 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 06:06:36,115 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.009e+02 2.131e+02 2.391e+02 3.022e+02, threshold=4.261e+02, percent-clipped=0.0 2024-09-15 06:06:42,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=243893.33333333334, ans=0.125 2024-09-15 06:07:09,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=243950.0, ans=0.125 2024-09-15 06:07:26,191 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.78 vs. 
limit=15.0 2024-09-15 06:07:45,166 INFO [train.py:1198] (0/2) Epoch 14, batch 3050, loss[loss=0.2604, ctc_loss=0.1791, cr_loss=0.4068, over 21064.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1738, cr_loss=0.3898, over 4096628.42 frames. ], batch size: 59, lr: 6.01e-03, grad_scale: 32.0 2024-09-15 06:07:56,351 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0 2024-09-15 06:08:32,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=244120.0, ans=0.0 2024-09-15 06:08:36,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=244120.0, ans=0.2 2024-09-15 06:08:42,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=244120.0, ans=0.025 2024-09-15 06:09:02,763 INFO [train.py:1198] (0/2) Epoch 14, batch 3100, loss[loss=0.2665, ctc_loss=0.1885, cr_loss=0.3898, over 20981.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1748, cr_loss=0.3909, over 4084184.66 frames. 
], batch size: 67, lr: 6.01e-03, grad_scale: 32.0 2024-09-15 06:09:08,625 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.054e+02 2.242e+02 2.513e+02 3.967e+02, threshold=4.483e+02, percent-clipped=0.0 2024-09-15 06:09:10,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=244176.66666666666, ans=0.05 2024-09-15 06:09:40,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=244233.33333333334, ans=0.125 2024-09-15 06:09:43,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=244233.33333333334, ans=0.04949747468305833 2024-09-15 06:10:10,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=244290.0, ans=0.125 2024-09-15 06:10:17,626 INFO [train.py:1198] (0/2) Epoch 14, batch 3150, loss[loss=0.2529, ctc_loss=0.1737, cr_loss=0.3959, over 20800.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1743, cr_loss=0.3905, over 4084284.85 frames. ], batch size: 53, lr: 6.00e-03, grad_scale: 32.0 2024-09-15 06:10:27,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=244318.33333333334, ans=0.125 2024-09-15 06:10:48,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=244375.0, ans=10.0 2024-09-15 06:11:10,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=244403.33333333334, ans=0.0 2024-09-15 06:11:35,533 INFO [train.py:1198] (0/2) Epoch 14, batch 3200, loss[loss=0.2518, ctc_loss=0.1735, cr_loss=0.3915, over 20781.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1746, cr_loss=0.3909, over 4092379.33 frames. 
], batch size: 56, lr: 6.00e-03, grad_scale: 32.0 2024-09-15 06:11:41,418 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.682e+02 2.096e+02 2.276e+02 2.509e+02 3.651e+02, threshold=4.553e+02, percent-clipped=0.0 2024-09-15 06:12:09,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=12.0 2024-09-15 06:12:32,032 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.75 vs. limit=15.0 2024-09-15 06:12:49,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=244601.66666666666, ans=0.125 2024-09-15 06:12:50,351 INFO [train.py:1198] (0/2) Epoch 14, batch 3250, loss[loss=0.2448, ctc_loss=0.1702, cr_loss=0.3726, over 21067.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.175, cr_loss=0.3911, over 4091357.71 frames. 
], batch size: 53, lr: 6.00e-03, grad_scale: 32.0 2024-09-15 06:13:16,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=244630.0, ans=0.2 2024-09-15 06:13:29,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=244658.33333333334, ans=0.025 2024-09-15 06:13:39,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=244686.66666666666, ans=0.125 2024-09-15 06:13:50,350 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 06:14:02,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=244715.0, ans=0.0 2024-09-15 06:14:05,048 INFO [train.py:1198] (0/2) Epoch 14, batch 3300, loss[loss=0.2226, ctc_loss=0.1479, cr_loss=0.3732, over 20360.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1742, cr_loss=0.3894, over 4080411.21 frames. ], batch size: 45, lr: 6.00e-03, grad_scale: 32.0 2024-09-15 06:14:11,007 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.081e+02 2.220e+02 2.406e+02 3.394e+02, threshold=4.440e+02, percent-clipped=0.0 2024-09-15 06:14:24,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=244771.66666666666, ans=0.1 2024-09-15 06:14:58,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=244828.33333333334, ans=0.125 2024-09-15 06:15:07,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. 
limit=6.0 2024-09-15 06:15:19,707 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.17 vs. limit=10.0 2024-09-15 06:15:23,400 INFO [train.py:1198] (0/2) Epoch 14, batch 3350, loss[loss=0.2651, ctc_loss=0.1862, cr_loss=0.3949, over 19370.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1734, cr_loss=0.3881, over 4072431.33 frames. ], batch size: 90, lr: 6.00e-03, grad_scale: 32.0 2024-09-15 06:15:32,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=244885.0, ans=0.2 2024-09-15 06:15:44,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=244913.33333333334, ans=0.1 2024-09-15 06:16:07,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=244970.0, ans=0.125 2024-09-15 06:16:16,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.94 vs. limit=15.0 2024-09-15 06:16:24,007 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-09-15 06:16:32,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=244998.33333333334, ans=0.1 2024-09-15 06:16:41,355 INFO [train.py:1198] (0/2) Epoch 14, batch 3400, loss[loss=0.3132, ctc_loss=0.2284, cr_loss=0.4244, over 14637.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.173, cr_loss=0.3877, over 4079642.42 frames. 
], batch size: 150, lr: 5.99e-03, grad_scale: 32.0 2024-09-15 06:16:47,523 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.093e+02 2.255e+02 2.468e+02 3.691e+02, threshold=4.510e+02, percent-clipped=0.0 2024-09-15 06:17:17,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=245083.33333333334, ans=0.125 2024-09-15 06:17:37,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=245111.66666666666, ans=0.1 2024-09-15 06:17:56,729 INFO [train.py:1198] (0/2) Epoch 14, batch 3450, loss[loss=0.2209, ctc_loss=0.1498, cr_loss=0.3555, over 21056.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1734, cr_loss=0.389, over 4094324.57 frames. ], batch size: 53, lr: 5.99e-03, grad_scale: 32.0 2024-09-15 06:18:13,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=245196.66666666666, ans=0.0 2024-09-15 06:18:57,605 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0 2024-09-15 06:19:11,772 INFO [train.py:1198] (0/2) Epoch 14, batch 3500, loss[loss=0.2632, ctc_loss=0.1842, cr_loss=0.3951, over 20834.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1745, cr_loss=0.391, over 4096667.50 frames. 
], batch size: 65, lr: 5.99e-03, grad_scale: 32.0 2024-09-15 06:19:13,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=245310.0, ans=0.2 2024-09-15 06:19:17,932 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.025e+02 2.149e+02 2.398e+02 4.057e+02, threshold=4.299e+02, percent-clipped=0.0 2024-09-15 06:19:37,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=245338.33333333334, ans=0.0 2024-09-15 06:20:07,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=245395.0, ans=0.1 2024-09-15 06:20:18,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.27 vs. limit=15.0 2024-09-15 06:20:27,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=245423.33333333334, ans=0.125 2024-09-15 06:20:29,959 INFO [train.py:1198] (0/2) Epoch 14, batch 3550, loss[loss=0.2526, ctc_loss=0.1732, cr_loss=0.3968, over 20971.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1745, cr_loss=0.3916, over 4098897.78 frames. ], batch size: 49, lr: 5.99e-03, grad_scale: 32.0 2024-09-15 06:20:51,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=245480.0, ans=0.0 2024-09-15 06:21:45,370 INFO [train.py:1198] (0/2) Epoch 14, batch 3600, loss[loss=0.2588, ctc_loss=0.181, cr_loss=0.3892, over 20980.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1748, cr_loss=0.3924, over 4097353.04 frames. 
], batch size: 58, lr: 5.99e-03, grad_scale: 32.0 2024-09-15 06:21:50,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=245593.33333333334, ans=0.2 2024-09-15 06:21:51,359 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.011e+02 2.147e+02 2.307e+02 3.614e+02, threshold=4.295e+02, percent-clipped=0.0 2024-09-15 06:22:11,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=245621.66666666666, ans=0.07 2024-09-15 06:22:25,005 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=15.0 2024-09-15 06:22:26,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=245650.0, ans=0.125 2024-09-15 06:22:31,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=245678.33333333334, ans=0.2 2024-09-15 06:22:36,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=245678.33333333334, ans=0.125 2024-09-15 06:22:44,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=245678.33333333334, ans=0.125 2024-09-15 06:22:47,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245706.66666666666, ans=0.1 2024-09-15 06:22:59,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=245706.66666666666, ans=0.125 2024-09-15 06:23:01,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=245735.0, ans=0.0 2024-09-15 06:23:03,203 INFO 
[train.py:1198] (0/2) Epoch 14, batch 3650, loss[loss=0.2141, ctc_loss=0.1411, cr_loss=0.365, over 19962.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1747, cr_loss=0.3923, over 4105873.59 frames. ], batch size: 44, lr: 5.99e-03, grad_scale: 32.0 2024-09-15 06:23:03,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=245735.0, ans=0.125 2024-09-15 06:23:51,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=245820.0, ans=0.125 2024-09-15 06:23:57,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245820.0, ans=0.1 2024-09-15 06:24:04,182 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=12.0 2024-09-15 06:24:12,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=245848.33333333334, ans=0.125 2024-09-15 06:24:18,516 INFO [train.py:1198] (0/2) Epoch 14, batch 3700, loss[loss=0.2628, ctc_loss=0.1803, cr_loss=0.4126, over 20771.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1741, cr_loss=0.391, over 4103710.89 frames. ], batch size: 56, lr: 5.98e-03, grad_scale: 16.0 2024-09-15 06:24:26,027 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.125e+02 2.305e+02 2.570e+02 9.865e+02, threshold=4.609e+02, percent-clipped=1.0 2024-09-15 06:25:06,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245961.66666666666, ans=0.1 2024-09-15 06:25:10,209 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. 
limit=6.0 2024-09-15 06:25:30,108 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.03 vs. limit=6.0 2024-09-15 06:25:33,990 INFO [train.py:1198] (0/2) Epoch 14, batch 3750, loss[loss=0.2159, ctc_loss=0.1479, cr_loss=0.3401, over 20995.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1732, cr_loss=0.3901, over 4106593.47 frames. ], batch size: 51, lr: 5.98e-03, grad_scale: 16.0 2024-09-15 06:25:42,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.39 vs. limit=22.5 2024-09-15 06:25:58,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=246046.66666666666, ans=0.025 2024-09-15 06:26:07,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=246075.0, ans=0.1 2024-09-15 06:26:52,942 INFO [train.py:1198] (0/2) Epoch 14, batch 3800, loss[loss=0.2302, ctc_loss=0.1551, cr_loss=0.3759, over 20891.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1739, cr_loss=0.3906, over 4097348.55 frames. ], batch size: 54, lr: 5.98e-03, grad_scale: 16.0 2024-09-15 06:27:00,515 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.653e+02 2.046e+02 2.162e+02 2.407e+02 1.230e+03, threshold=4.325e+02, percent-clipped=1.0 2024-09-15 06:27:26,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=246216.66666666666, ans=0.125 2024-09-15 06:27:30,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=246216.66666666666, ans=0.125 2024-09-15 06:28:04,265 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.84 vs. 
limit=22.5 2024-09-15 06:28:05,629 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0 2024-09-15 06:28:10,467 INFO [train.py:1198] (0/2) Epoch 14, batch 3850, loss[loss=0.2746, ctc_loss=0.1903, cr_loss=0.4213, over 20961.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1751, cr_loss=0.3917, over 4087413.62 frames. ], batch size: 64, lr: 5.98e-03, grad_scale: 16.0 2024-09-15 06:28:59,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0 2024-09-15 06:29:09,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=246415.0, ans=0.2 2024-09-15 06:29:11,836 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.34 vs. limit=12.0 2024-09-15 06:29:26,110 INFO [train.py:1198] (0/2) Epoch 14, batch 3900, loss[loss=0.2662, ctc_loss=0.1806, cr_loss=0.4276, over 21067.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1753, cr_loss=0.391, over 4070216.53 frames. ], batch size: 59, lr: 5.98e-03, grad_scale: 16.0 2024-09-15 06:29:33,463 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.090e+02 2.335e+02 2.590e+02 3.484e+02, threshold=4.670e+02, percent-clipped=0.0 2024-09-15 06:29:38,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.11 vs. 
limit=15.0 2024-09-15 06:29:50,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=246471.66666666666, ans=0.025 2024-09-15 06:30:04,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=246500.0, ans=0.125 2024-09-15 06:30:14,732 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 06:30:33,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=246556.66666666666, ans=0.0 2024-09-15 06:30:41,072 INFO [train.py:1198] (0/2) Epoch 14, batch 3950, loss[loss=0.2174, ctc_loss=0.1467, cr_loss=0.3535, over 20950.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1745, cr_loss=0.3889, over 4054086.67 frames. ], batch size: 48, lr: 5.98e-03, grad_scale: 16.0 2024-09-15 06:30:41,568 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 06:30:50,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=246585.0, ans=0.0 2024-09-15 06:30:50,601 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.12 vs. 
limit=6.0 2024-09-15 06:31:00,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=246613.33333333334, ans=0.0 2024-09-15 06:31:19,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=246641.66666666666, ans=0.125 2024-09-15 06:31:22,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=246641.66666666666, ans=0.0 2024-09-15 06:31:56,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=246698.33333333334, ans=0.125 2024-09-15 06:31:56,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=246698.33333333334, ans=0.0 2024-09-15 06:31:58,573 INFO [train.py:1198] (0/2) Epoch 14, batch 4000, loss[loss=0.2898, ctc_loss=0.2083, cr_loss=0.4079, over 14804.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1739, cr_loss=0.3884, over 4065129.71 frames. ], batch size: 149, lr: 5.97e-03, grad_scale: 32.0 2024-09-15 06:32:01,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=246726.66666666666, ans=0.125 2024-09-15 06:32:06,008 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.001e+02 2.127e+02 2.354e+02 3.662e+02, threshold=4.253e+02, percent-clipped=0.0 2024-09-15 06:32:19,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=246755.0, ans=0.0 2024-09-15 06:32:27,314 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 06:33:13,720 INFO [train.py:1198] (0/2) Epoch 14, batch 4050, loss[loss=0.2401, ctc_loss=0.1617, cr_loss=0.3919, over 20882.00 frames. 
], tot_loss[loss=0.2512, ctc_loss=0.1735, cr_loss=0.3887, over 4078596.59 frames. ], batch size: 54, lr: 5.97e-03, grad_scale: 32.0 2024-09-15 06:33:42,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=246896.66666666666, ans=0.0 2024-09-15 06:34:10,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=246953.33333333334, ans=0.125 2024-09-15 06:34:24,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=246981.66666666666, ans=0.125 2024-09-15 06:34:31,839 INFO [train.py:1198] (0/2) Epoch 14, batch 4100, loss[loss=0.2067, ctc_loss=0.1383, cr_loss=0.3419, over 20979.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.174, cr_loss=0.39, over 4085965.98 frames. ], batch size: 55, lr: 5.97e-03, grad_scale: 16.0 2024-09-15 06:34:37,221 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.63 vs. limit=22.5 2024-09-15 06:34:40,767 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.018e+02 2.160e+02 2.373e+02 3.553e+02, threshold=4.319e+02, percent-clipped=0.0 2024-09-15 06:34:45,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=247038.33333333334, ans=0.125 2024-09-15 06:35:10,419 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.42 vs. 
limit=22.5 2024-09-15 06:35:31,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=247123.33333333334, ans=0.125 2024-09-15 06:35:34,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=247123.33333333334, ans=0.0 2024-09-15 06:35:46,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=247151.66666666666, ans=0.025 2024-09-15 06:35:47,584 INFO [train.py:1198] (0/2) Epoch 14, batch 4150, loss[loss=0.2601, ctc_loss=0.1811, cr_loss=0.3948, over 20711.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1731, cr_loss=0.3877, over 4079456.81 frames. ], batch size: 68, lr: 5.97e-03, grad_scale: 16.0 2024-09-15 06:36:00,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=12.0 2024-09-15 06:36:19,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=247208.33333333334, ans=0.025 2024-09-15 06:36:22,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=247208.33333333334, ans=0.125 2024-09-15 06:36:29,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2024-09-15 06:36:37,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=247236.66666666666, ans=0.2 2024-09-15 06:37:02,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=247265.0, ans=0.025 2024-09-15 06:37:06,382 INFO [train.py:1198] (0/2) Epoch 14, batch 4200, loss[loss=0.2654, ctc_loss=0.184, cr_loss=0.4069, over 20658.00 frames. 
], tot_loss[loss=0.2497, ctc_loss=0.1722, cr_loss=0.3875, over 4090647.01 frames. ], batch size: 68, lr: 5.97e-03, grad_scale: 16.0 2024-09-15 06:37:15,284 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.044e+02 2.236e+02 2.474e+02 3.327e+02, threshold=4.472e+02, percent-clipped=0.0 2024-09-15 06:37:27,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=247321.66666666666, ans=0.0 2024-09-15 06:38:15,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=247406.66666666666, ans=0.025 2024-09-15 06:38:18,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=247406.66666666666, ans=0.0 2024-09-15 06:38:21,417 INFO [train.py:1198] (0/2) Epoch 14, batch 4250, loss[loss=0.2207, ctc_loss=0.1509, cr_loss=0.3489, over 21054.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1723, cr_loss=0.3882, over 4088370.39 frames. ], batch size: 56, lr: 5.97e-03, grad_scale: 16.0 2024-09-15 06:38:23,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=247435.0, ans=0.05 2024-09-15 06:38:42,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=247463.33333333334, ans=0.125 2024-09-15 06:38:47,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=247463.33333333334, ans=0.125 2024-09-15 06:39:05,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=247491.66666666666, ans=0.125 2024-09-15 06:39:05,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. 
limit=15.0 2024-09-15 06:39:15,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=247520.0, ans=0.125 2024-09-15 06:39:39,251 INFO [train.py:1198] (0/2) Epoch 14, batch 4300, loss[loss=0.2742, ctc_loss=0.1903, cr_loss=0.4197, over 21019.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1724, cr_loss=0.3886, over 4094338.96 frames. ], batch size: 63, lr: 5.96e-03, grad_scale: 16.0 2024-09-15 06:39:39,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=247576.66666666666, ans=0.125 2024-09-15 06:39:48,230 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.070e+02 2.227e+02 2.452e+02 3.530e+02, threshold=4.454e+02, percent-clipped=0.0 2024-09-15 06:40:05,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=22.5 2024-09-15 06:40:27,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=247661.66666666666, ans=0.125 2024-09-15 06:40:49,342 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.21 vs. limit=22.5 2024-09-15 06:40:54,168 INFO [train.py:1198] (0/2) Epoch 14, batch 4350, loss[loss=0.2405, ctc_loss=0.1621, cr_loss=0.3923, over 20792.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1724, cr_loss=0.3887, over 4101295.25 frames. 
], batch size: 53, lr: 5.96e-03, grad_scale: 16.0 2024-09-15 06:40:57,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=247718.33333333334, ans=0.0 2024-09-15 06:41:06,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=247718.33333333334, ans=0.0 2024-09-15 06:41:49,383 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=22.5 2024-09-15 06:41:57,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=247831.66666666666, ans=0.0 2024-09-15 06:41:57,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=247831.66666666666, ans=0.95 2024-09-15 06:42:09,608 INFO [train.py:1198] (0/2) Epoch 14, batch 4400, loss[loss=0.2244, ctc_loss=0.1521, cr_loss=0.3612, over 20961.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1722, cr_loss=0.3877, over 4105260.16 frames. 
], batch size: 55, lr: 5.96e-03, grad_scale: 32.0 2024-09-15 06:42:12,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=247860.0, ans=0.125 2024-09-15 06:42:18,683 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.032e+02 2.155e+02 2.365e+02 1.102e+03, threshold=4.310e+02, percent-clipped=1.0 2024-09-15 06:42:29,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=247888.33333333334, ans=10.0 2024-09-15 06:42:34,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=247888.33333333334, ans=0.125 2024-09-15 06:43:01,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=247945.0, ans=0.1 2024-09-15 06:43:16,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=247973.33333333334, ans=0.125 2024-09-15 06:43:26,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=248001.66666666666, ans=0.1 2024-09-15 06:43:27,594 INFO [train.py:1198] (0/2) Epoch 14, batch 4450, loss[loss=0.2712, ctc_loss=0.19, cr_loss=0.406, over 21015.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1736, cr_loss=0.3895, over 4097531.19 frames. ], batch size: 63, lr: 5.96e-03, grad_scale: 32.0 2024-09-15 06:43:31,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=248001.66666666666, ans=0.07 2024-09-15 06:43:46,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.16 vs. 
limit=10.0 2024-09-15 06:44:34,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=248115.0, ans=0.125 2024-09-15 06:44:45,694 INFO [train.py:1198] (0/2) Epoch 14, batch 4500, loss[loss=0.2525, ctc_loss=0.173, cr_loss=0.3973, over 20705.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1736, cr_loss=0.3894, over 4097253.02 frames. ], batch size: 71, lr: 5.96e-03, grad_scale: 32.0 2024-09-15 06:44:49,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.96 vs. limit=22.5 2024-09-15 06:44:54,780 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.109e+02 2.256e+02 2.511e+02 3.955e+02, threshold=4.512e+02, percent-clipped=0.0 2024-09-15 06:45:04,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=248171.66666666666, ans=0.125 2024-09-15 06:45:10,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=248171.66666666666, ans=0.1 2024-09-15 06:45:11,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=248171.66666666666, ans=0.025 2024-09-15 06:46:01,294 INFO [train.py:1198] (0/2) Epoch 14, batch 4550, loss[loss=0.2865, ctc_loss=0.1972, cr_loss=0.4467, over 20874.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1727, cr_loss=0.3877, over 4090695.38 frames. 
], batch size: 65, lr: 5.96e-03, grad_scale: 32.0 2024-09-15 06:46:01,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=248285.0, ans=0.125 2024-09-15 06:46:27,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=248313.33333333334, ans=0.125 2024-09-15 06:46:31,116 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-09-15 06:47:16,746 INFO [train.py:1198] (0/2) Epoch 14, batch 4600, loss[loss=0.2675, ctc_loss=0.1841, cr_loss=0.4172, over 20026.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1728, cr_loss=0.3875, over 4074771.55 frames. ], batch size: 80, lr: 5.95e-03, grad_scale: 32.0 2024-09-15 06:47:25,874 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.058e+02 2.228e+02 2.403e+02 3.695e+02, threshold=4.455e+02, percent-clipped=0.0 2024-09-15 06:47:43,146 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=22.5 2024-09-15 06:47:53,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=248483.33333333334, ans=0.1 2024-09-15 06:48:29,179 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2024-09-15 06:48:34,270 INFO [train.py:1198] (0/2) Epoch 14, batch 4650, loss[loss=0.2533, ctc_loss=0.1736, cr_loss=0.3987, over 21050.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1732, cr_loss=0.3887, over 4077947.87 frames. 
], batch size: 62, lr: 5.95e-03, grad_scale: 32.0 2024-09-15 06:48:57,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=248596.66666666666, ans=0.0 2024-09-15 06:49:06,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=248625.0, ans=0.1 2024-09-15 06:49:29,435 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 06:49:50,411 INFO [train.py:1198] (0/2) Epoch 14, batch 4700, loss[loss=0.234, ctc_loss=0.162, cr_loss=0.36, over 20803.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1727, cr_loss=0.3883, over 4089888.29 frames. ], batch size: 56, lr: 5.95e-03, grad_scale: 32.0 2024-09-15 06:49:59,260 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.024e+02 2.186e+02 2.369e+02 4.541e+02, threshold=4.371e+02, percent-clipped=1.0 2024-09-15 06:50:05,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=248738.33333333334, ans=0.125 2024-09-15 06:50:07,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=248738.33333333334, ans=0.0 2024-09-15 06:50:28,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=248766.66666666666, ans=0.2 2024-09-15 06:50:34,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=248766.66666666666, ans=0.125 2024-09-15 06:50:54,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=248823.33333333334, ans=10.0 2024-09-15 06:50:57,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, 
batch_count=248823.33333333334, ans=0.2 2024-09-15 06:51:00,257 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 06:51:08,852 INFO [train.py:1198] (0/2) Epoch 14, batch 4750, loss[loss=0.2237, ctc_loss=0.1495, cr_loss=0.3714, over 20880.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1725, cr_loss=0.3881, over 4092546.65 frames. ], batch size: 54, lr: 5.95e-03, grad_scale: 32.0 2024-09-15 06:51:18,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=248851.66666666666, ans=0.0 2024-09-15 06:52:21,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=248965.0, ans=0.0 2024-09-15 06:52:23,997 INFO [train.py:1198] (0/2) Epoch 14, batch 4800, loss[loss=0.2711, ctc_loss=0.1886, cr_loss=0.4128, over 20962.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1726, cr_loss=0.3884, over 4091198.62 frames. 
], batch size: 58, lr: 5.95e-03, grad_scale: 32.0 2024-09-15 06:52:33,197 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.079e+02 2.188e+02 2.473e+02 4.707e+02, threshold=4.376e+02, percent-clipped=1.0 2024-09-15 06:52:56,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=249050.0, ans=0.125 2024-09-15 06:53:05,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=249050.0, ans=0.125 2024-09-15 06:53:10,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=249078.33333333334, ans=0.2 2024-09-15 06:53:19,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=249078.33333333334, ans=0.1 2024-09-15 06:53:19,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=249078.33333333334, ans=0.05 2024-09-15 06:53:40,004 INFO [train.py:1198] (0/2) Epoch 14, batch 4850, loss[loss=0.2825, ctc_loss=0.2003, cr_loss=0.4107, over 18157.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1731, cr_loss=0.3893, over 4088448.46 frames. ], batch size: 108, lr: 5.95e-03, grad_scale: 32.0 2024-09-15 06:54:05,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=249163.33333333334, ans=0.025 2024-09-15 06:54:46,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.79 vs. limit=12.0 2024-09-15 06:54:57,545 INFO [train.py:1198] (0/2) Epoch 14, batch 4900, loss[loss=0.264, ctc_loss=0.1839, cr_loss=0.4008, over 20296.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1725, cr_loss=0.3892, over 4090616.80 frames. 
], batch size: 74, lr: 5.94e-03, grad_scale: 32.0 2024-09-15 06:55:03,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=249276.66666666666, ans=0.1 2024-09-15 06:55:06,666 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 1.997e+02 2.129e+02 2.287e+02 3.273e+02, threshold=4.258e+02, percent-clipped=0.0 2024-09-15 06:55:26,055 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-88000.pt 2024-09-15 06:55:29,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=249333.33333333334, ans=0.0 2024-09-15 06:56:06,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=249390.0, ans=0.0 2024-09-15 06:56:07,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=249390.0, ans=0.2 2024-09-15 06:56:12,685 INFO [train.py:1198] (0/2) Epoch 14, batch 4950, loss[loss=0.2554, ctc_loss=0.1758, cr_loss=0.3976, over 21048.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.172, cr_loss=0.3884, over 4097696.25 frames. ], batch size: 62, lr: 5.94e-03, grad_scale: 32.0 2024-09-15 06:56:33,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=249446.66666666666, ans=0.125 2024-09-15 06:56:58,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=249503.33333333334, ans=0.125 2024-09-15 06:57:29,706 INFO [train.py:1198] (0/2) Epoch 14, batch 5000, loss[loss=0.2361, ctc_loss=0.1603, cr_loss=0.3793, over 20816.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1719, cr_loss=0.3886, over 4100918.12 frames. 
], batch size: 53, lr: 5.94e-03, grad_scale: 32.0 2024-09-15 06:57:40,063 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.060e+02 2.176e+02 2.400e+02 3.518e+02, threshold=4.352e+02, percent-clipped=0.0 2024-09-15 06:57:43,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=249588.33333333334, ans=0.125 2024-09-15 06:57:49,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=249588.33333333334, ans=0.1 2024-09-15 06:57:57,221 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.33 vs. limit=10.0 2024-09-15 06:58:43,844 INFO [train.py:1198] (0/2) Epoch 14, batch 5050, loss[loss=0.2703, ctc_loss=0.1817, cr_loss=0.4428, over 20965.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1732, cr_loss=0.3895, over 4081598.61 frames. ], batch size: 55, lr: 5.94e-03, grad_scale: 16.0 2024-09-15 06:58:52,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=249701.66666666666, ans=0.125 2024-09-15 06:59:06,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=249730.0, ans=0.0 2024-09-15 06:59:19,988 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=12.0 2024-09-15 06:59:57,836 INFO [train.py:1198] (0/2) Epoch 14, batch 5100, loss[loss=0.261, ctc_loss=0.182, cr_loss=0.3948, over 20836.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1745, cr_loss=0.3917, over 4077246.03 frames. 
], batch size: 65, lr: 5.94e-03, grad_scale: 16.0 2024-09-15 07:00:07,982 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.085e+02 2.221e+02 2.455e+02 4.461e+02, threshold=4.442e+02, percent-clipped=1.0 2024-09-15 07:00:08,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=249843.33333333334, ans=0.0 2024-09-15 07:00:53,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=249928.33333333334, ans=0.125 2024-09-15 07:00:55,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=249928.33333333334, ans=0.125 2024-09-15 07:01:05,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=249956.66666666666, ans=0.125 2024-09-15 07:01:05,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=15.0 2024-09-15 07:01:12,424 INFO [train.py:1198] (0/2) Epoch 14, batch 5150, loss[loss=0.2213, ctc_loss=0.1512, cr_loss=0.3506, over 20928.00 frames. ], tot_loss[loss=0.2519, ctc_loss=0.1737, cr_loss=0.3914, over 4094955.65 frames. ], batch size: 50, lr: 5.94e-03, grad_scale: 16.0 2024-09-15 07:01:15,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=249985.0, ans=0.0 2024-09-15 07:02:05,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. 
limit=15.0 2024-09-15 07:02:10,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=250098.33333333334, ans=0.125 2024-09-15 07:02:19,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=250098.33333333334, ans=0.0 2024-09-15 07:02:25,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=250126.66666666666, ans=0.1 2024-09-15 07:02:26,323 INFO [train.py:1198] (0/2) Epoch 14, batch 5200, loss[loss=0.2213, ctc_loss=0.1509, cr_loss=0.3519, over 20971.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1733, cr_loss=0.3903, over 4095686.45 frames. ], batch size: 55, lr: 5.93e-03, grad_scale: 32.0 2024-09-15 07:02:26,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=250126.66666666666, ans=0.125 2024-09-15 07:02:30,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=250126.66666666666, ans=0.125 2024-09-15 07:02:36,366 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.076e+02 2.222e+02 2.520e+02 3.444e+02, threshold=4.444e+02, percent-clipped=0.0 2024-09-15 07:02:42,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=250155.0, ans=0.0 2024-09-15 07:03:27,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=250240.0, ans=0.025 2024-09-15 07:03:39,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=250268.33333333334, ans=0.0 2024-09-15 07:03:40,804 INFO [train.py:1198] (0/2) Epoch 14, batch 5250, loss[loss=0.2486, ctc_loss=0.1717, cr_loss=0.3844, over 20981.00 frames. 
], tot_loss[loss=0.2495, ctc_loss=0.1719, cr_loss=0.3879, over 4106171.79 frames. ], batch size: 55, lr: 5.93e-03, grad_scale: 32.0 2024-09-15 07:03:48,570 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0 2024-09-15 07:03:49,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=250268.33333333334, ans=0.125 2024-09-15 07:04:41,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=250381.66666666666, ans=0.0 2024-09-15 07:04:42,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=250381.66666666666, ans=0.2 2024-09-15 07:04:45,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=250381.66666666666, ans=0.125 2024-09-15 07:04:57,654 INFO [train.py:1198] (0/2) Epoch 14, batch 5300, loss[loss=0.2743, ctc_loss=0.1887, cr_loss=0.428, over 20919.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1729, cr_loss=0.3889, over 4099857.75 frames. 
], batch size: 67, lr: 5.93e-03, grad_scale: 16.0 2024-09-15 07:05:00,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=250410.0, ans=0.0 2024-09-15 07:05:09,303 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.047e+02 2.170e+02 2.342e+02 4.334e+02, threshold=4.340e+02, percent-clipped=0.0 2024-09-15 07:05:09,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=250410.0, ans=0.125 2024-09-15 07:05:38,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=250466.66666666666, ans=0.0 2024-09-15 07:06:06,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=250523.33333333334, ans=0.125 2024-09-15 07:06:11,618 INFO [train.py:1198] (0/2) Epoch 14, batch 5350, loss[loss=0.2604, ctc_loss=0.1798, cr_loss=0.4031, over 20317.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1736, cr_loss=0.3903, over 4091754.03 frames. ], batch size: 74, lr: 5.93e-03, grad_scale: 16.0 2024-09-15 07:07:03,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=250636.66666666666, ans=0.125 2024-09-15 07:07:12,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=250665.0, ans=0.025 2024-09-15 07:07:14,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2024-09-15 07:07:28,163 INFO [train.py:1198] (0/2) Epoch 14, batch 5400, loss[loss=0.2479, ctc_loss=0.1666, cr_loss=0.4063, over 21065.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1733, cr_loss=0.3897, over 4090381.80 frames. 
], batch size: 59, lr: 5.93e-03, grad_scale: 16.0 2024-09-15 07:07:40,097 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.761e+02 2.016e+02 2.163e+02 2.322e+02 4.968e+02, threshold=4.325e+02, percent-clipped=1.0 2024-09-15 07:07:58,570 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0 2024-09-15 07:08:17,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=250778.33333333334, ans=0.125 2024-09-15 07:08:28,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=250806.66666666666, ans=0.125 2024-09-15 07:08:30,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=250806.66666666666, ans=0.125 2024-09-15 07:08:41,481 INFO [train.py:1198] (0/2) Epoch 14, batch 5450, loss[loss=0.2202, ctc_loss=0.1473, cr_loss=0.3641, over 20936.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1733, cr_loss=0.3895, over 4093110.25 frames. 
], batch size: 49, lr: 5.93e-03, grad_scale: 16.0 2024-09-15 07:08:59,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=250863.33333333334, ans=0.2 2024-09-15 07:09:02,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=250863.33333333334, ans=0.0 2024-09-15 07:09:18,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=250891.66666666666, ans=0.0 2024-09-15 07:09:24,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=250920.0, ans=0.125 2024-09-15 07:09:30,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=250920.0, ans=0.0 2024-09-15 07:09:43,605 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.44 vs. limit=15.0 2024-09-15 07:09:55,817 INFO [train.py:1198] (0/2) Epoch 14, batch 5500, loss[loss=0.2479, ctc_loss=0.1714, cr_loss=0.3825, over 20828.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1722, cr_loss=0.3882, over 4101651.11 frames. 
], batch size: 59, lr: 5.92e-03, grad_scale: 16.0 2024-09-15 07:10:03,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=250976.66666666666, ans=0.125 2024-09-15 07:10:06,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=250976.66666666666, ans=0.125 2024-09-15 07:10:07,552 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.084e+02 2.215e+02 2.384e+02 4.753e+02, threshold=4.429e+02, percent-clipped=1.0 2024-09-15 07:10:16,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=251005.0, ans=0.1 2024-09-15 07:10:36,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=251033.33333333334, ans=0.125 2024-09-15 07:11:05,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=251090.0, ans=0.09899494936611666 2024-09-15 07:11:09,776 INFO [train.py:1198] (0/2) Epoch 14, batch 5550, loss[loss=0.2312, ctc_loss=0.1571, cr_loss=0.3704, over 20971.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1725, cr_loss=0.3889, over 4112052.47 frames. ], batch size: 58, lr: 5.92e-03, grad_scale: 16.0 2024-09-15 07:11:21,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=251118.33333333334, ans=0.125 2024-09-15 07:11:26,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=251146.66666666666, ans=0.125 2024-09-15 07:11:40,098 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.83 vs. 
limit=22.5 2024-09-15 07:12:07,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=251231.66666666666, ans=0.125 2024-09-15 07:12:08,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=251231.66666666666, ans=0.0 2024-09-15 07:12:11,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=251231.66666666666, ans=0.0 2024-09-15 07:12:23,637 INFO [train.py:1198] (0/2) Epoch 14, batch 5600, loss[loss=0.1786, ctc_loss=0.1192, cr_loss=0.2973, over 20960.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1728, cr_loss=0.3886, over 4095219.71 frames. ], batch size: 49, lr: 5.92e-03, grad_scale: 32.0 2024-09-15 07:12:27,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=251260.0, ans=0.0 2024-09-15 07:12:28,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=251260.0, ans=0.04949747468305833 2024-09-15 07:12:32,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=251260.0, ans=0.125 2024-09-15 07:12:35,412 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.071e+02 2.210e+02 2.433e+02 4.468e+02, threshold=4.420e+02, percent-clipped=1.0 2024-09-15 07:12:38,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=251288.33333333334, ans=0.125 2024-09-15 07:13:02,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=251316.66666666666, ans=0.2 2024-09-15 07:13:12,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=251345.0, ans=0.125 2024-09-15 07:13:18,274 
INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.29 vs. limit=12.0 2024-09-15 07:13:34,712 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.89 vs. limit=15.0 2024-09-15 07:13:34,771 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-09-15 07:13:40,038 INFO [train.py:1198] (0/2) Epoch 14, batch 5650, loss[loss=0.2506, ctc_loss=0.176, cr_loss=0.3732, over 21080.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1727, cr_loss=0.3891, over 4095287.79 frames. ], batch size: 59, lr: 5.92e-03, grad_scale: 32.0 2024-09-15 07:13:46,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=251401.66666666666, ans=0.2 2024-09-15 07:13:55,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=251430.0, ans=0.1 2024-09-15 07:14:33,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=251486.66666666666, ans=0.1 2024-09-15 07:14:38,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=251515.0, ans=0.0 2024-09-15 07:14:52,115 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0 2024-09-15 07:14:54,264 INFO [train.py:1198] (0/2) Epoch 14, batch 5700, loss[loss=0.2522, ctc_loss=0.1753, cr_loss=0.3845, over 20268.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1723, cr_loss=0.3884, over 4091399.07 frames. 
], batch size: 74, lr: 5.92e-03, grad_scale: 32.0 2024-09-15 07:14:59,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=251543.33333333334, ans=15.0 2024-09-15 07:15:05,986 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.065e+02 2.164e+02 2.402e+02 3.994e+02, threshold=4.329e+02, percent-clipped=0.0 2024-09-15 07:15:18,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=251571.66666666666, ans=0.0 2024-09-15 07:15:39,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=251628.33333333334, ans=0.2 2024-09-15 07:16:09,348 INFO [train.py:1198] (0/2) Epoch 14, batch 5750, loss[loss=0.3033, ctc_loss=0.2132, cr_loss=0.4509, over 18034.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1726, cr_loss=0.3896, over 4076233.13 frames. ], batch size: 108, lr: 5.92e-03, grad_scale: 32.0 2024-09-15 07:16:25,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=251713.33333333334, ans=0.125 2024-09-15 07:16:37,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=251741.66666666666, ans=0.0 2024-09-15 07:16:49,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=251741.66666666666, ans=0.025 2024-09-15 07:16:51,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=251770.0, ans=0.125 2024-09-15 07:16:54,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=251770.0, ans=0.2 2024-09-15 07:17:03,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, 
batch_count=251770.0, ans=0.125 2024-09-15 07:17:22,954 INFO [train.py:1198] (0/2) Epoch 14, batch 5800, loss[loss=0.214, ctc_loss=0.1457, cr_loss=0.342, over 21000.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1732, cr_loss=0.3905, over 4072043.69 frames. ], batch size: 52, lr: 5.91e-03, grad_scale: 32.0 2024-09-15 07:17:29,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=251826.66666666666, ans=0.04949747468305833 2024-09-15 07:17:34,723 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.056e+02 2.207e+02 2.375e+02 3.164e+02, threshold=4.414e+02, percent-clipped=0.0 2024-09-15 07:17:38,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=251855.0, ans=0.125 2024-09-15 07:17:45,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=251855.0, ans=0.0 2024-09-15 07:17:46,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=251855.0, ans=0.0 2024-09-15 07:17:52,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=251883.33333333334, ans=0.0 2024-09-15 07:17:55,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=251883.33333333334, ans=0.0 2024-09-15 07:18:13,119 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 07:18:23,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=251940.0, ans=0.0 2024-09-15 07:18:36,474 INFO [train.py:1198] (0/2) Epoch 14, batch 5850, loss[loss=0.2545, ctc_loss=0.1761, cr_loss=0.3918, over 21065.00 frames. 
], tot_loss[loss=0.2511, ctc_loss=0.1731, cr_loss=0.39, over 4067736.51 frames. ], batch size: 59, lr: 5.91e-03, grad_scale: 16.0 2024-09-15 07:18:38,644 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.38 vs. limit=22.5 2024-09-15 07:18:47,486 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0 2024-09-15 07:19:16,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=22.5 2024-09-15 07:19:25,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=12.0 2024-09-15 07:19:31,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=252053.33333333334, ans=0.1 2024-09-15 07:19:38,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=252081.66666666666, ans=0.0 2024-09-15 07:19:50,219 INFO [train.py:1198] (0/2) Epoch 14, batch 5900, loss[loss=0.2119, ctc_loss=0.1427, cr_loss=0.3463, over 19895.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1725, cr_loss=0.3883, over 4071283.13 frames. 
], batch size: 44, lr: 5.91e-03, grad_scale: 16.0 2024-09-15 07:20:03,433 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.085e+02 2.209e+02 2.390e+02 5.116e+02, threshold=4.418e+02, percent-clipped=1.0 2024-09-15 07:20:15,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=252138.33333333334, ans=0.2 2024-09-15 07:20:26,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=252166.66666666666, ans=0.125 2024-09-15 07:20:32,533 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2024-09-15 07:21:03,928 INFO [train.py:1198] (0/2) Epoch 14, batch 5950, loss[loss=0.2234, ctc_loss=0.1507, cr_loss=0.3633, over 20791.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1741, cr_loss=0.3901, over 4064670.36 frames. ], batch size: 53, lr: 5.91e-03, grad_scale: 16.0 2024-09-15 07:21:04,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=252251.66666666666, ans=0.05 2024-09-15 07:21:07,609 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-09-15 07:21:25,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=252280.0, ans=0.2 2024-09-15 07:22:19,602 INFO [train.py:1198] (0/2) Epoch 14, batch 6000, loss[loss=0.2385, ctc_loss=0.1636, cr_loss=0.3749, over 20893.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1744, cr_loss=0.3901, over 4050432.95 frames. 
], batch size: 54, lr: 5.91e-03, grad_scale: 32.0 2024-09-15 07:22:19,603 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 07:22:46,096 INFO [train.py:1230] (0/2) Epoch 14, validation: loss=0.04719, ctc_loss=0.04719, cr_loss=9.826e-15, over 944034.00 frames. 2024-09-15 07:22:46,097 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 07:22:57,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=252393.33333333334, ans=0.05 2024-09-15 07:23:01,736 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.076e+02 2.229e+02 2.452e+02 3.696e+02, threshold=4.458e+02, percent-clipped=0.0 2024-09-15 07:23:05,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=252421.66666666666, ans=0.0 2024-09-15 07:23:06,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=252421.66666666666, ans=0.0 2024-09-15 07:23:38,166 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 07:23:44,226 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=22.5 2024-09-15 07:23:45,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=252478.33333333334, ans=0.0 2024-09-15 07:23:46,367 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.66 vs. 
limit=5.0 2024-09-15 07:23:54,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=252506.66666666666, ans=0.125 2024-09-15 07:24:02,646 INFO [train.py:1198] (0/2) Epoch 14, batch 6050, loss[loss=0.2774, ctc_loss=0.1904, cr_loss=0.435, over 21013.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1739, cr_loss=0.3904, over 4066753.72 frames. ], batch size: 63, lr: 5.91e-03, grad_scale: 32.0 2024-09-15 07:24:07,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=252535.0, ans=0.125 2024-09-15 07:24:44,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=252591.66666666666, ans=0.2 2024-09-15 07:25:16,314 INFO [train.py:1198] (0/2) Epoch 14, batch 6100, loss[loss=0.2142, ctc_loss=0.1485, cr_loss=0.3285, over 20951.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1748, cr_loss=0.392, over 4066741.04 frames. ], batch size: 51, lr: 5.90e-03, grad_scale: 32.0 2024-09-15 07:25:17,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=252676.66666666666, ans=0.125 2024-09-15 07:25:29,244 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.689e+02 2.050e+02 2.187e+02 2.369e+02 4.237e+02, threshold=4.374e+02, percent-clipped=0.0 2024-09-15 07:25:35,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=252705.0, ans=0.125 2024-09-15 07:25:38,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=252705.0, ans=0.125 2024-09-15 07:25:38,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. 
limit=15.0 2024-09-15 07:25:59,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.94 vs. limit=15.0 2024-09-15 07:26:24,291 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0 2024-09-15 07:26:29,725 INFO [train.py:1198] (0/2) Epoch 14, batch 6150, loss[loss=0.2139, ctc_loss=0.1454, cr_loss=0.3424, over 21067.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.175, cr_loss=0.392, over 4063320.64 frames. ], batch size: 56, lr: 5.90e-03, grad_scale: 32.0 2024-09-15 07:26:36,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=252818.33333333334, ans=0.1 2024-09-15 07:27:02,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=252875.0, ans=0.125 2024-09-15 07:27:05,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=252875.0, ans=0.2 2024-09-15 07:27:09,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=252875.0, ans=0.2 2024-09-15 07:27:14,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=252903.33333333334, ans=0.125 2024-09-15 07:27:24,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=252903.33333333334, ans=0.0 2024-09-15 07:27:40,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=252931.66666666666, ans=0.95 2024-09-15 07:27:42,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, 
batch_count=252960.0, ans=0.125 2024-09-15 07:27:43,188 INFO [train.py:1198] (0/2) Epoch 14, batch 6200, loss[loss=0.2722, ctc_loss=0.1898, cr_loss=0.4118, over 19589.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.175, cr_loss=0.3913, over 4057964.74 frames. ], batch size: 90, lr: 5.90e-03, grad_scale: 32.0 2024-09-15 07:27:44,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=252960.0, ans=0.025 2024-09-15 07:27:56,270 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.728e+02 2.054e+02 2.224e+02 2.506e+02 4.746e+02, threshold=4.447e+02, percent-clipped=2.0 2024-09-15 07:28:57,451 INFO [train.py:1198] (0/2) Epoch 14, batch 6250, loss[loss=0.2452, ctc_loss=0.1657, cr_loss=0.3973, over 20768.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1744, cr_loss=0.3905, over 4049014.33 frames. ], batch size: 53, lr: 5.90e-03, grad_scale: 32.0 2024-09-15 07:29:18,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=253130.0, ans=0.5 2024-09-15 07:30:11,503 INFO [train.py:1198] (0/2) Epoch 14, batch 6300, loss[loss=0.3126, ctc_loss=0.2307, cr_loss=0.4098, over 13992.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1747, cr_loss=0.3906, over 4036067.68 frames. 
], batch size: 150, lr: 5.90e-03, grad_scale: 32.0 2024-09-15 07:30:16,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=253243.33333333334, ans=0.2 2024-09-15 07:30:25,107 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.051e+02 2.204e+02 2.467e+02 4.348e+02, threshold=4.408e+02, percent-clipped=0.0 2024-09-15 07:30:40,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=253300.0, ans=0.125 2024-09-15 07:31:04,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=253328.33333333334, ans=0.125 2024-09-15 07:31:25,672 INFO [train.py:1198] (0/2) Epoch 14, batch 6350, loss[loss=0.2944, ctc_loss=0.2107, cr_loss=0.4187, over 14331.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1778, cr_loss=0.3927, over 3962210.61 frames. ], batch size: 149, lr: 5.90e-03, grad_scale: 32.0 2024-09-15 07:32:14,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=15.0 2024-09-15 07:32:19,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=253470.0, ans=0.025 2024-09-15 07:32:23,531 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-14.pt 2024-09-15 07:33:10,802 INFO [train.py:1198] (0/2) Epoch 15, batch 0, loss[loss=0.2782, ctc_loss=0.1917, cr_loss=0.4327, over 20066.00 frames. ], tot_loss[loss=0.2782, ctc_loss=0.1917, cr_loss=0.4327, over 20066.00 frames. 
], batch size: 80, lr: 5.69e-03, grad_scale: 32.0
2024-09-15 07:33:10,803 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-15 07:33:29,093 INFO [train.py:1230] (0/2) Epoch 15, validation: loss=0.04831, ctc_loss=0.04831, cr_loss=9.472e-15, over 944034.00 frames.
2024-09-15 07:33:29,093 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-15 07:33:30,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=253501.16666666666, ans=0.2
2024-09-15 07:33:34,237 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0
2024-09-15 07:33:56,379 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.310e+02 2.553e+02 2.783e+02 4.142e+02, threshold=5.106e+02, percent-clipped=0.0
2024-09-15 07:34:07,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=253557.83333333334, ans=0.125
2024-09-15 07:34:12,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=15.0
2024-09-15 07:34:23,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=253586.16666666666, ans=0.1
2024-09-15 07:34:32,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=253614.5, ans=0.07
2024-09-15 07:34:42,076 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 07:34:46,211 INFO [train.py:1198] (0/2) Epoch 15, batch 50, loss[loss=0.2337, ctc_loss=0.1566, cr_loss=0.3854, over 21081.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1751, cr_loss=0.393, over 918355.11 frames. ], batch size: 53, lr: 5.69e-03, grad_scale: 32.0
2024-09-15 07:34:47,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=253642.83333333334, ans=0.1
2024-09-15 07:34:52,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=15.0
2024-09-15 07:34:59,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=253671.16666666666, ans=0.2
2024-09-15 07:35:17,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=253699.5, ans=0.125
2024-09-15 07:36:00,849 INFO [train.py:1198] (0/2) Epoch 15, batch 100, loss[loss=0.2411, ctc_loss=0.1654, cr_loss=0.3785, over 20877.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.178, cr_loss=0.3986, over 1619614.42 frames. ], batch size: 57, lr: 5.69e-03, grad_scale: 32.0
2024-09-15 07:36:13,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=253784.5, ans=0.125
2024-09-15 07:36:27,769 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.058e+02 2.257e+02 2.510e+02 3.786e+02, threshold=4.515e+02, percent-clipped=0.0
2024-09-15 07:36:37,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=253841.16666666666, ans=0.0
2024-09-15 07:37:18,675 INFO [train.py:1198] (0/2) Epoch 15, batch 150, loss[loss=0.2196, ctc_loss=0.1506, cr_loss=0.3449, over 20967.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1747, cr_loss=0.3942, over 2165637.87 frames. ], batch size: 50, lr: 5.69e-03, grad_scale: 32.0
2024-09-15 07:38:16,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=254011.16666666666, ans=0.2
2024-09-15 07:38:34,114 INFO [train.py:1198] (0/2) Epoch 15, batch 200, loss[loss=0.2359, ctc_loss=0.161, cr_loss=0.3748, over 21083.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1721, cr_loss=0.3891, over 2580512.86 frames. ], batch size: 59, lr: 5.69e-03, grad_scale: 32.0
2024-09-15 07:39:00,994 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.702e+02 1.990e+02 2.141e+02 2.354e+02 4.678e+02, threshold=4.282e+02, percent-clipped=1.0
2024-09-15 07:39:17,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=254152.83333333334, ans=0.2
2024-09-15 07:39:34,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=254181.16666666666, ans=0.0
2024-09-15 07:39:36,067 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0
2024-09-15 07:39:48,709 INFO [train.py:1198] (0/2) Epoch 15, batch 250, loss[loss=0.2322, ctc_loss=0.1573, cr_loss=0.3744, over 20772.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.172, cr_loss=0.3892, over 2918175.20 frames. ], batch size: 56, lr: 5.69e-03, grad_scale: 32.0
2024-09-15 07:40:02,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=254237.83333333334, ans=15.0
2024-09-15 07:40:43,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=254294.5, ans=0.1
2024-09-15 07:41:07,339 INFO [train.py:1198] (0/2) Epoch 15, batch 300, loss[loss=0.2649, ctc_loss=0.1823, cr_loss=0.4126, over 20986.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1718, cr_loss=0.3895, over 3185351.20 frames. ], batch size: 58, lr: 5.68e-03, grad_scale: 32.0
2024-09-15 07:41:07,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=254351.16666666666, ans=0.0
2024-09-15 07:41:22,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=254379.5, ans=0.125
2024-09-15 07:41:34,592 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.029e+02 2.143e+02 2.306e+02 3.853e+02, threshold=4.287e+02, percent-clipped=0.0
2024-09-15 07:41:34,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=254379.5, ans=0.0
2024-09-15 07:41:47,356 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.83 vs. limit=22.5
2024-09-15 07:42:09,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=254464.5, ans=0.0
2024-09-15 07:42:22,477 INFO [train.py:1198] (0/2) Epoch 15, batch 350, loss[loss=0.232, ctc_loss=0.1575, cr_loss=0.3726, over 20867.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1717, cr_loss=0.389, over 3394505.05 frames. ], batch size: 57, lr: 5.68e-03, grad_scale: 32.0
2024-09-15 07:43:24,317 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0
2024-09-15 07:43:35,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=254606.16666666666, ans=0.125
2024-09-15 07:43:41,531 INFO [train.py:1198] (0/2) Epoch 15, batch 400, loss[loss=0.2347, ctc_loss=0.1639, cr_loss=0.3539, over 21015.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1719, cr_loss=0.3888, over 3552270.23 frames. ], batch size: 63, lr: 5.68e-03, grad_scale: 32.0
2024-09-15 07:44:08,631 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.048e+02 2.187e+02 2.410e+02 6.467e+02, threshold=4.375e+02, percent-clipped=1.0
2024-09-15 07:44:10,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=254691.16666666666, ans=0.1
2024-09-15 07:44:12,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=254691.16666666666, ans=0.125
2024-09-15 07:44:16,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=254691.16666666666, ans=0.2
2024-09-15 07:44:52,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=254747.83333333334, ans=0.2
2024-09-15 07:44:56,512 INFO [train.py:1198] (0/2) Epoch 15, batch 450, loss[loss=0.243, ctc_loss=0.164, cr_loss=0.3949, over 20976.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1725, cr_loss=0.3899, over 3667002.61 frames. ], batch size: 55, lr: 5.68e-03, grad_scale: 32.0
2024-09-15 07:45:36,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=254832.83333333334, ans=0.05
2024-09-15 07:45:47,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=254861.16666666666, ans=0.125
2024-09-15 07:46:13,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=22.5
2024-09-15 07:46:14,425 INFO [train.py:1198] (0/2) Epoch 15, batch 500, loss[loss=0.2554, ctc_loss=0.1786, cr_loss=0.3838, over 20199.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1724, cr_loss=0.3897, over 3768272.10 frames. ], batch size: 80, lr: 5.68e-03, grad_scale: 32.0
2024-09-15 07:46:40,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=254946.16666666666, ans=0.125
2024-09-15 07:46:41,405 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.092e+02 2.215e+02 2.567e+02 3.534e+02, threshold=4.431e+02, percent-clipped=0.0
2024-09-15 07:47:30,109 INFO [train.py:1198] (0/2) Epoch 15, batch 550, loss[loss=0.2131, ctc_loss=0.1427, cr_loss=0.3516, over 20974.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1731, cr_loss=0.3907, over 3850407.56 frames. ], batch size: 49, lr: 5.68e-03, grad_scale: 32.0
2024-09-15 07:47:46,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=255087.83333333334, ans=0.125
2024-09-15 07:47:51,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=255087.83333333334, ans=0.125
2024-09-15 07:47:55,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=255087.83333333334, ans=0.0
2024-09-15 07:48:04,124 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0
2024-09-15 07:48:14,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=255144.5, ans=0.2
2024-09-15 07:48:44,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=255172.83333333334, ans=0.0
2024-09-15 07:48:48,604 INFO [train.py:1198] (0/2) Epoch 15, batch 600, loss[loss=0.2555, ctc_loss=0.176, cr_loss=0.3974, over 20777.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1723, cr_loss=0.39, over 3912110.82 frames. ], batch size: 56, lr: 5.67e-03, grad_scale: 32.0
2024-09-15 07:48:50,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=255201.16666666666, ans=0.0
2024-09-15 07:49:16,050 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 1.988e+02 2.133e+02 2.327e+02 2.983e+02, threshold=4.266e+02, percent-clipped=0.0
2024-09-15 07:49:38,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=255286.16666666666, ans=10.0
2024-09-15 07:49:48,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0
2024-09-15 07:49:55,578 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=15.0
2024-09-15 07:50:00,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=255314.5, ans=10.0
2024-09-15 07:50:03,866 INFO [train.py:1198] (0/2) Epoch 15, batch 650, loss[loss=0.308, ctc_loss=0.221, cr_loss=0.4349, over 18127.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.173, cr_loss=0.3916, over 3945242.10 frames. ], batch size: 108, lr: 5.67e-03, grad_scale: 32.0
2024-09-15 07:50:33,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255399.5, ans=0.1
2024-09-15 07:50:43,040 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0
2024-09-15 07:50:47,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=255427.83333333334, ans=0.0
2024-09-15 07:51:06,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=255456.16666666666, ans=0.2
2024-09-15 07:51:17,974 INFO [train.py:1198] (0/2) Epoch 15, batch 700, loss[loss=0.2629, ctc_loss=0.1864, cr_loss=0.3825, over 20678.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1731, cr_loss=0.3912, over 3969423.27 frames. ], batch size: 68, lr: 5.67e-03, grad_scale: 32.0
2024-09-15 07:51:28,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=255484.5, ans=10.0
2024-09-15 07:51:30,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=255484.5, ans=0.125
2024-09-15 07:51:31,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=255484.5, ans=0.125
2024-09-15 07:51:45,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=255512.83333333334, ans=0.04949747468305833
2024-09-15 07:51:48,186 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.710e+02 2.037e+02 2.195e+02 2.389e+02 6.615e+02, threshold=4.391e+02, percent-clipped=1.0
2024-09-15 07:51:59,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=255541.16666666666, ans=0.0
2024-09-15 07:52:36,128 INFO [train.py:1198] (0/2) Epoch 15, batch 750, loss[loss=0.251, ctc_loss=0.1735, cr_loss=0.3877, over 21078.00 frames. ], tot_loss[loss=0.2519, ctc_loss=0.1737, cr_loss=0.391, over 3978714.06 frames. ], batch size: 59, lr: 5.67e-03, grad_scale: 32.0
2024-09-15 07:52:37,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255626.16666666666, ans=0.1
2024-09-15 07:52:47,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=255626.16666666666, ans=0.125
2024-09-15 07:53:22,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=255711.16666666666, ans=0.1
2024-09-15 07:53:42,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=255739.5, ans=0.125
2024-09-15 07:53:48,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=255739.5, ans=0.125
2024-09-15 07:53:51,483 INFO [train.py:1198] (0/2) Epoch 15, batch 800, loss[loss=0.2553, ctc_loss=0.174, cr_loss=0.4068, over 20836.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1728, cr_loss=0.3899, over 4007155.40 frames. ], batch size: 59, lr: 5.67e-03, grad_scale: 32.0
2024-09-15 07:54:06,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=255796.16666666666, ans=0.0
2024-09-15 07:54:21,846 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.653e+02 2.007e+02 2.109e+02 2.286e+02 3.573e+02, threshold=4.219e+02, percent-clipped=0.0
2024-09-15 07:54:27,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.23 vs. limit=10.0
2024-09-15 07:54:47,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=255852.83333333334, ans=0.0
2024-09-15 07:55:09,704 INFO [train.py:1198] (0/2) Epoch 15, batch 850, loss[loss=0.2826, ctc_loss=0.199, cr_loss=0.4181, over 20672.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1722, cr_loss=0.3899, over 4040071.10 frames. ], batch size: 71, lr: 5.67e-03, grad_scale: 32.0
2024-09-15 07:55:14,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=22.5
2024-09-15 07:55:16,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=255909.5, ans=0.125
2024-09-15 07:55:32,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=255937.83333333334, ans=0.5
2024-09-15 07:55:34,675 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0
2024-09-15 07:55:52,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255966.16666666666, ans=0.1
2024-09-15 07:56:10,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=12.0
2024-09-15 07:56:17,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=256022.83333333334, ans=0.2
2024-09-15 07:56:24,574 INFO [train.py:1198] (0/2) Epoch 15, batch 900, loss[loss=0.2451, ctc_loss=0.1635, cr_loss=0.4081, over 20971.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1717, cr_loss=0.3896, over 4059506.72 frames. ], batch size: 49, lr: 5.67e-03, grad_scale: 32.0
2024-09-15 07:56:39,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=256079.5, ans=0.0
2024-09-15 07:56:51,857 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.068e+02 2.217e+02 2.381e+02 3.265e+02, threshold=4.435e+02, percent-clipped=0.0
2024-09-15 07:57:43,554 INFO [train.py:1198] (0/2) Epoch 15, batch 950, loss[loss=0.2692, ctc_loss=0.1892, cr_loss=0.3996, over 21033.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1717, cr_loss=0.3897, over 4074340.89 frames. ], batch size: 62, lr: 5.66e-03, grad_scale: 32.0
2024-09-15 07:58:11,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.86 vs. limit=22.5
2024-09-15 07:58:26,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.85 vs. limit=22.5
2024-09-15 07:58:41,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=256277.83333333334, ans=0.125
2024-09-15 07:58:59,020 INFO [train.py:1198] (0/2) Epoch 15, batch 1000, loss[loss=0.3182, ctc_loss=0.2223, cr_loss=0.4796, over 18596.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1712, cr_loss=0.3889, over 4087530.35 frames. ], batch size: 108, lr: 5.66e-03, grad_scale: 32.0
2024-09-15 07:59:22,308 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.31 vs. limit=10.0
2024-09-15 07:59:26,007 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.055e+02 2.232e+02 2.541e+02 4.564e+02, threshold=4.464e+02, percent-clipped=1.0
2024-09-15 07:59:26,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=256362.83333333334, ans=0.0
2024-09-15 07:59:35,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=256391.16666666666, ans=0.0
2024-09-15 08:00:07,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=256447.83333333334, ans=0.125
2024-09-15 08:00:16,220 INFO [train.py:1198] (0/2) Epoch 15, batch 1050, loss[loss=0.2116, ctc_loss=0.1446, cr_loss=0.335, over 20972.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1714, cr_loss=0.3888, over 4079463.48 frames. ], batch size: 51, lr: 5.66e-03, grad_scale: 32.0
2024-09-15 08:00:34,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=256504.5, ans=10.0
2024-09-15 08:00:40,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=256504.5, ans=0.125
2024-09-15 08:00:42,342 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0
2024-09-15 08:01:31,370 INFO [train.py:1198] (0/2) Epoch 15, batch 1100, loss[loss=0.22, ctc_loss=0.1499, cr_loss=0.3502, over 20761.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1714, cr_loss=0.3887, over 4073984.80 frames. ], batch size: 56, lr: 5.66e-03, grad_scale: 32.0
2024-09-15 08:01:58,721 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.042e+02 2.228e+02 2.394e+02 4.223e+02, threshold=4.456e+02, percent-clipped=0.0
2024-09-15 08:02:00,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=256674.5, ans=0.0
2024-09-15 08:02:09,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256674.5, ans=0.1
2024-09-15 08:02:10,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0
2024-09-15 08:02:17,329 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 08:02:21,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=256702.83333333334, ans=0.125
2024-09-15 08:02:35,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=256731.16666666666, ans=0.0
2024-09-15 08:02:39,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=256731.16666666666, ans=0.07
2024-09-15 08:02:46,889 INFO [train.py:1198] (0/2) Epoch 15, batch 1150, loss[loss=0.2072, ctc_loss=0.1417, cr_loss=0.3275, over 20364.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.172, cr_loss=0.3892, over 4070623.80 frames. ], batch size: 45, lr: 5.66e-03, grad_scale: 32.0
2024-09-15 08:03:08,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=256787.83333333334, ans=0.2
2024-09-15 08:03:18,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=256816.16666666666, ans=0.09899494936611666
2024-09-15 08:03:20,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=256816.16666666666, ans=0.125
2024-09-15 08:04:01,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=256872.83333333334, ans=0.0
2024-09-15 08:04:02,430 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=15.0
2024-09-15 08:04:04,731 INFO [train.py:1198] (0/2) Epoch 15, batch 1200, loss[loss=0.2337, ctc_loss=0.1586, cr_loss=0.3753, over 20980.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1722, cr_loss=0.3901, over 4072333.29 frames. ], batch size: 51, lr: 5.66e-03, grad_scale: 32.0
2024-09-15 08:04:10,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256901.16666666666, ans=0.1
2024-09-15 08:04:16,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=256901.16666666666, ans=0.0
2024-09-15 08:04:27,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=256929.5, ans=0.125
2024-09-15 08:04:31,911 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.078e+02 2.265e+02 2.538e+02 5.320e+02, threshold=4.531e+02, percent-clipped=1.0
2024-09-15 08:05:18,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=257042.83333333334, ans=0.125
2024-09-15 08:05:19,813 INFO [train.py:1198] (0/2) Epoch 15, batch 1250, loss[loss=0.2641, ctc_loss=0.1786, cr_loss=0.4272, over 20922.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1717, cr_loss=0.3888, over 4087752.84 frames. ], batch size: 60, lr: 5.65e-03, grad_scale: 32.0
2024-09-15 08:05:45,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=257071.16666666666, ans=0.0
2024-09-15 08:05:51,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=257099.5, ans=0.05
2024-09-15 08:06:15,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=257127.83333333334, ans=0.2
2024-09-15 08:06:37,932 INFO [train.py:1198] (0/2) Epoch 15, batch 1300, loss[loss=0.2608, ctc_loss=0.1793, cr_loss=0.4076, over 20957.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1722, cr_loss=0.3901, over 4094238.58 frames. ], batch size: 67, lr: 5.65e-03, grad_scale: 32.0
2024-09-15 08:06:50,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=257184.5, ans=0.2
2024-09-15 08:06:51,780 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 08:07:01,469 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0
2024-09-15 08:07:04,976 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.016e+02 2.233e+02 2.486e+02 3.513e+02, threshold=4.465e+02, percent-clipped=0.0
2024-09-15 08:07:23,995 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.18 vs. limit=22.5
2024-09-15 08:07:49,568 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.91 vs. limit=15.0
2024-09-15 08:07:53,154 INFO [train.py:1198] (0/2) Epoch 15, batch 1350, loss[loss=0.2297, ctc_loss=0.1551, cr_loss=0.373, over 20926.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1727, cr_loss=0.3907, over 4103019.21 frames. ], batch size: 60, lr: 5.65e-03, grad_scale: 32.0
2024-09-15 08:08:05,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=257326.16666666666, ans=0.1
2024-09-15 08:08:13,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=257354.5, ans=0.125
2024-09-15 08:09:00,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=15.0
2024-09-15 08:09:11,583 INFO [train.py:1198] (0/2) Epoch 15, batch 1400, loss[loss=0.2511, ctc_loss=0.1748, cr_loss=0.3819, over 21089.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1726, cr_loss=0.3907, over 4112604.90 frames. ], batch size: 59, lr: 5.65e-03, grad_scale: 32.0
2024-09-15 08:09:35,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=257496.16666666666, ans=0.125
2024-09-15 08:09:38,525 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.727e+02 2.041e+02 2.205e+02 2.399e+02 3.235e+02, threshold=4.410e+02, percent-clipped=0.0
2024-09-15 08:09:46,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=257524.5, ans=0.0
2024-09-15 08:09:58,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=257552.83333333334, ans=0.125
2024-09-15 08:10:16,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=257581.16666666666, ans=0.125
2024-09-15 08:10:17,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=257581.16666666666, ans=10.0
2024-09-15 08:10:26,211 INFO [train.py:1198] (0/2) Epoch 15, batch 1450, loss[loss=0.2511, ctc_loss=0.1753, cr_loss=0.3789, over 20869.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1731, cr_loss=0.3904, over 4095777.19 frames. ], batch size: 65, lr: 5.65e-03, grad_scale: 64.0
2024-09-15 08:10:26,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=257609.5, ans=0.07
2024-09-15 08:10:39,066 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0
2024-09-15 08:11:01,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=257666.16666666666, ans=0.125
2024-09-15 08:11:07,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=257666.16666666666, ans=0.0
2024-09-15 08:11:32,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=257722.83333333334, ans=0.0
2024-09-15 08:11:44,373 INFO [train.py:1198] (0/2) Epoch 15, batch 1500, loss[loss=0.2263, ctc_loss=0.1539, cr_loss=0.362, over 21044.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1721, cr_loss=0.3894, over 4090034.22 frames. ], batch size: 56, lr: 5.65e-03, grad_scale: 32.0
2024-09-15 08:11:44,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=257751.16666666666, ans=0.025
2024-09-15 08:11:51,395 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.96 vs. limit=15.0
2024-09-15 08:12:07,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=257779.5, ans=0.0
2024-09-15 08:12:12,540 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.619e+02 2.042e+02 2.175e+02 2.452e+02 3.722e+02, threshold=4.351e+02, percent-clipped=0.0
2024-09-15 08:12:21,291 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=22.5
2024-09-15 08:12:29,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=257836.16666666666, ans=0.125
2024-09-15 08:12:59,190 INFO [train.py:1198] (0/2) Epoch 15, batch 1550, loss[loss=0.2598, ctc_loss=0.178, cr_loss=0.4094, over 20880.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.172, cr_loss=0.3888, over 4094764.22 frames. ], batch size: 57, lr: 5.65e-03, grad_scale: 32.0
2024-09-15 08:13:17,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=257921.16666666666, ans=0.0
2024-09-15 08:13:30,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=257949.5, ans=0.2
2024-09-15 08:13:32,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=257949.5, ans=0.025
2024-09-15 08:14:16,802 INFO [train.py:1198] (0/2) Epoch 15, batch 1600, loss[loss=0.2907, ctc_loss=0.203, cr_loss=0.4387, over 20292.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1711, cr_loss=0.3874, over 4107777.73 frames. ], batch size: 80, lr: 5.64e-03, grad_scale: 32.0
2024-09-15 08:14:45,452 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.047e+02 2.165e+02 2.331e+02 3.063e+02, threshold=4.331e+02, percent-clipped=0.0
2024-09-15 08:15:16,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=258147.83333333334, ans=0.1
2024-09-15 08:15:16,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=258147.83333333334, ans=0.125
2024-09-15 08:15:32,747 INFO [train.py:1198] (0/2) Epoch 15, batch 1650, loss[loss=0.1955, ctc_loss=0.1315, cr_loss=0.32, over 20951.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1707, cr_loss=0.3869, over 4106513.99 frames. ], batch size: 49, lr: 5.64e-03, grad_scale: 32.0
2024-09-15 08:15:34,826 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0
2024-09-15 08:15:40,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=258176.16666666666, ans=0.125
2024-09-15 08:15:41,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=258176.16666666666, ans=0.025
2024-09-15 08:15:43,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=258176.16666666666, ans=0.5
2024-09-15 08:15:44,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=258176.16666666666, ans=0.125
2024-09-15 08:16:02,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0
2024-09-15 08:16:37,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=258289.5, ans=0.2
2024-09-15 08:16:47,117 INFO [train.py:1198] (0/2) Epoch 15, batch 1700, loss[loss=0.259, ctc_loss=0.1793, cr_loss=0.3984, over 21012.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.171, cr_loss=0.3877, over 4092958.28 frames. ], batch size: 61, lr: 5.64e-03, grad_scale: 32.0
2024-09-15 08:16:50,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=258317.83333333334, ans=0.0
2024-09-15 08:17:08,927 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.96 vs. limit=15.0
2024-09-15 08:17:11,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=258346.16666666666, ans=0.125
2024-09-15 08:17:18,470 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.071e+02 2.208e+02 2.414e+02 4.241e+02, threshold=4.417e+02, percent-clipped=0.0
2024-09-15 08:17:23,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=258374.5, ans=0.025
2024-09-15 08:18:04,974 INFO [train.py:1198] (0/2) Epoch 15, batch 1750, loss[loss=0.2861, ctc_loss=0.199, cr_loss=0.4357, over 20566.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1716, cr_loss=0.3884, over 4089305.88 frames. ], batch size: 75, lr: 5.64e-03, grad_scale: 32.0
2024-09-15 08:18:25,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=258487.83333333334, ans=0.0
2024-09-15 08:18:26,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=258487.83333333334, ans=0.0
2024-09-15 08:19:10,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=258572.83333333334, ans=0.125
2024-09-15 08:19:13,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=258572.83333333334, ans=0.125
2024-09-15 08:19:17,308 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.76 vs. limit=10.0
2024-09-15 08:19:20,806 INFO [train.py:1198] (0/2) Epoch 15, batch 1800, loss[loss=0.2642, ctc_loss=0.1831, cr_loss=0.4056, over 20940.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1719, cr_loss=0.3882, over 4076527.94 frames. ], batch size: 60, lr: 5.64e-03, grad_scale: 32.0
2024-09-15 08:19:49,451 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.042e+02 2.182e+02 2.362e+02 3.275e+02, threshold=4.364e+02, percent-clipped=0.0
2024-09-15 08:19:57,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=258657.83333333334, ans=0.025
2024-09-15 08:20:03,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=258657.83333333334, ans=0.125
2024-09-15 08:20:05,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0
2024-09-15 08:20:09,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0
2024-09-15 08:20:38,884 INFO [train.py:1198] (0/2) Epoch 15, batch 1850, loss[loss=0.2306, ctc_loss=0.1584, cr_loss=0.3611, over 20790.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1722, cr_loss=0.3883, over 4073813.45 frames.
], batch size: 56, lr: 5.64e-03, grad_scale: 32.0
2024-09-15 08:20:58,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=258771.16666666666, ans=0.1
2024-09-15 08:21:04,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=258771.16666666666, ans=0.0
2024-09-15 08:21:18,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=258799.5, ans=0.0
2024-09-15 08:21:36,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=258827.83333333334, ans=0.125
2024-09-15 08:21:54,077 INFO [train.py:1198] (0/2) Epoch 15, batch 1900, loss[loss=0.2179, ctc_loss=0.1474, cr_loss=0.3526, over 20791.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1715, cr_loss=0.3872, over 4078363.33 frames. ], batch size: 53, lr: 5.63e-03, grad_scale: 32.0
2024-09-15 08:22:22,541 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.725e+02 2.066e+02 2.248e+02 2.501e+02 3.115e+02, threshold=4.495e+02, percent-clipped=0.0
2024-09-15 08:22:27,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=258941.16666666666, ans=0.07
2024-09-15 08:22:48,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=258969.5, ans=0.125
2024-09-15 08:22:52,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=258969.5, ans=0.125
2024-09-15 08:23:00,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=258997.83333333334, ans=0.025
2024-09-15 08:23:11,939 INFO [train.py:1198] (0/2) Epoch 15, batch 1950, loss[loss=0.2321, ctc_loss=0.1598, cr_loss=0.3614, over 21004.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1709, cr_loss=0.3863, over 4086375.71 frames. ], batch size: 55, lr: 5.63e-03, grad_scale: 32.0
2024-09-15 08:23:25,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=259054.5, ans=0.125
2024-09-15 08:23:43,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=259082.83333333334, ans=0.5
2024-09-15 08:23:43,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=259082.83333333334, ans=0.2
2024-09-15 08:24:26,631 INFO [train.py:1198] (0/2) Epoch 15, batch 2000, loss[loss=0.2096, ctc_loss=0.1399, cr_loss=0.3485, over 20958.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1709, cr_loss=0.3866, over 4078164.69 frames. ], batch size: 50, lr: 5.63e-03, grad_scale: 32.0
2024-09-15 08:24:32,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=259167.83333333334, ans=0.1
2024-09-15 08:24:54,960 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.739e+02 2.039e+02 2.219e+02 2.423e+02 5.335e+02, threshold=4.438e+02, percent-clipped=1.0
2024-09-15 08:25:44,104 INFO [train.py:1198] (0/2) Epoch 15, batch 2050, loss[loss=0.2372, ctc_loss=0.1617, cr_loss=0.3774, over 20830.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1707, cr_loss=0.3863, over 4082235.75 frames. ], batch size: 59, lr: 5.63e-03, grad_scale: 32.0
2024-09-15 08:25:57,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=259337.83333333334, ans=0.125
2024-09-15 08:26:01,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=259337.83333333334, ans=0.0
2024-09-15 08:26:57,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=259451.16666666666, ans=0.125
2024-09-15 08:26:59,098 INFO [train.py:1198] (0/2) Epoch 15, batch 2100, loss[loss=0.2666, ctc_loss=0.1843, cr_loss=0.4117, over 20848.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1715, cr_loss=0.3881, over 4076730.00 frames. ], batch size: 65, lr: 5.63e-03, grad_scale: 32.0
2024-09-15 08:27:27,454 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.077e+02 2.200e+02 2.420e+02 5.649e+02, threshold=4.400e+02, percent-clipped=2.0
2024-09-15 08:28:14,520 INFO [train.py:1198] (0/2) Epoch 15, batch 2150, loss[loss=0.264, ctc_loss=0.1864, cr_loss=0.3877, over 20675.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1715, cr_loss=0.3875, over 4076846.57 frames. ], batch size: 71, lr: 5.63e-03, grad_scale: 32.0
2024-09-15 08:28:50,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=259649.5, ans=0.0
2024-09-15 08:29:04,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=259677.83333333334, ans=0.1
2024-09-15 08:29:32,235 INFO [train.py:1198] (0/2) Epoch 15, batch 2200, loss[loss=0.2302, ctc_loss=0.1556, cr_loss=0.3727, over 21007.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1721, cr_loss=0.3887, over 4076472.69 frames. ], batch size: 52, lr: 5.63e-03, grad_scale: 32.0
2024-09-15 08:29:52,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=259762.83333333334, ans=0.0
2024-09-15 08:29:52,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=259762.83333333334, ans=0.125
2024-09-15 08:30:00,977 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.042e+02 2.122e+02 2.305e+02 3.906e+02, threshold=4.245e+02, percent-clipped=0.0
2024-09-15 08:30:26,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=259819.5, ans=0.0
2024-09-15 08:30:37,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=259847.83333333334, ans=0.125
2024-09-15 08:30:47,717 INFO [train.py:1198] (0/2) Epoch 15, batch 2250, loss[loss=0.2493, ctc_loss=0.1765, cr_loss=0.3641, over 20686.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1714, cr_loss=0.3882, over 4086038.50 frames. ], batch size: 71, lr: 5.62e-03, grad_scale: 32.0
2024-09-15 08:30:53,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=259876.16666666666, ans=0.125
2024-09-15 08:31:15,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=259904.5, ans=0.2
2024-09-15 08:31:15,439 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.72 vs.
limit=15.0
2024-09-15 08:31:19,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=259904.5, ans=0.0
2024-09-15 08:31:25,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=259932.83333333334, ans=0.1
2024-09-15 08:31:33,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=259932.83333333334, ans=0.0
2024-09-15 08:31:52,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=259989.5, ans=0.1
2024-09-15 08:32:07,766 INFO [train.py:1198] (0/2) Epoch 15, batch 2300, loss[loss=0.2764, ctc_loss=0.1885, cr_loss=0.4394, over 20998.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1715, cr_loss=0.3884, over 4092619.26 frames. ], batch size: 63, lr: 5.62e-03, grad_scale: 32.0
2024-09-15 08:32:11,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=260017.83333333334, ans=0.0
2024-09-15 08:32:26,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=260046.16666666666, ans=0.125
2024-09-15 08:32:35,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=260046.16666666666, ans=0.05
2024-09-15 08:32:36,366 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.725e+02 2.068e+02 2.299e+02 2.502e+02 4.196e+02, threshold=4.599e+02, percent-clipped=0.0
2024-09-15 08:32:54,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=260102.83333333334, ans=0.1
2024-09-15 08:33:02,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=260102.83333333334, ans=0.0
2024-09-15 08:33:22,742 INFO [train.py:1198] (0/2) Epoch 15, batch 2350, loss[loss=0.2463, ctc_loss=0.1683, cr_loss=0.3901, over 20337.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1712, cr_loss=0.3888, over 4096730.32 frames. ], batch size: 74, lr: 5.62e-03, grad_scale: 32.0
2024-09-15 08:33:27,601 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 08:33:37,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=260187.83333333334, ans=0.125
2024-09-15 08:33:38,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=260187.83333333334, ans=0.05
2024-09-15 08:34:01,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0
2024-09-15 08:34:20,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=260244.5, ans=0.125
2024-09-15 08:34:32,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=260272.83333333334, ans=0.09899494936611666
2024-09-15 08:34:41,280 INFO [train.py:1198] (0/2) Epoch 15, batch 2400, loss[loss=0.2693, ctc_loss=0.1851, cr_loss=0.4209, over 20822.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1715, cr_loss=0.3892, over 4100990.29 frames. ], batch size: 65, lr: 5.62e-03, grad_scale: 32.0
2024-09-15 08:35:09,565 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.020e+02 2.145e+02 2.307e+02 3.011e+02, threshold=4.290e+02, percent-clipped=0.0
2024-09-15 08:35:12,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=260357.83333333334, ans=0.05
2024-09-15 08:35:16,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=260357.83333333334, ans=0.125
2024-09-15 08:35:56,280 INFO [train.py:1198] (0/2) Epoch 15, batch 2450, loss[loss=0.2402, ctc_loss=0.1631, cr_loss=0.3854, over 21057.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1705, cr_loss=0.3877, over 4105839.58 frames. ], batch size: 53, lr: 5.62e-03, grad_scale: 32.0
2024-09-15 08:36:20,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=260471.16666666666, ans=0.1
2024-09-15 08:36:29,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=260499.5, ans=0.0
2024-09-15 08:36:43,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=260527.83333333334, ans=0.125
2024-09-15 08:36:55,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=260527.83333333334, ans=0.125
2024-09-15 08:37:00,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0
2024-09-15 08:37:04,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=260556.16666666666, ans=0.125
2024-09-15 08:37:10,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=260556.16666666666, ans=0.0
2024-09-15 08:37:14,339 INFO [train.py:1198] (0/2) Epoch 15, batch 2500, loss[loss=0.2404, ctc_loss=0.167, cr_loss=0.3666, over 21052.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1707, cr_loss=0.3881, over 4118016.95 frames. ], batch size: 56, lr: 5.62e-03, grad_scale: 32.0
2024-09-15 08:37:16,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.71 vs. limit=6.0
2024-09-15 08:37:28,270 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 08:37:38,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=260612.83333333334, ans=0.125
2024-09-15 08:37:42,932 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.723e+02 2.050e+02 2.265e+02 2.531e+02 4.236e+02, threshold=4.531e+02, percent-clipped=0.0
2024-09-15 08:37:56,493 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-92000.pt
2024-09-15 08:37:57,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=260641.16666666666, ans=0.1
2024-09-15 08:38:04,220 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.19 vs.
limit=15.0
2024-09-15 08:38:05,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=260669.5, ans=0.1
2024-09-15 08:38:11,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=260669.5, ans=0.125
2024-09-15 08:38:14,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=260697.83333333334, ans=0.125
2024-09-15 08:38:25,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=260697.83333333334, ans=0.2
2024-09-15 08:38:30,669 INFO [train.py:1198] (0/2) Epoch 15, batch 2550, loss[loss=0.2567, ctc_loss=0.1758, cr_loss=0.4046, over 20971.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1702, cr_loss=0.3872, over 4122807.59 frames. ], batch size: 58, lr: 5.61e-03, grad_scale: 32.0
2024-09-15 08:38:32,870 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=12.0
2024-09-15 08:39:33,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=260839.5, ans=0.0
2024-09-15 08:39:45,554 INFO [train.py:1198] (0/2) Epoch 15, batch 2600, loss[loss=0.2633, ctc_loss=0.1813, cr_loss=0.41, over 20675.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1701, cr_loss=0.3868, over 4108397.50 frames. ], batch size: 66, lr: 5.61e-03, grad_scale: 32.0
2024-09-15 08:39:48,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=260867.83333333334, ans=0.2
2024-09-15 08:39:54,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=260867.83333333334, ans=0.125
2024-09-15 08:39:54,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=260867.83333333334, ans=0.125
2024-09-15 08:40:03,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.75 vs. limit=10.0
2024-09-15 08:40:04,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=260896.16666666666, ans=0.0
2024-09-15 08:40:16,374 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.658e+02 2.096e+02 2.224e+02 2.407e+02 3.923e+02, threshold=4.448e+02, percent-clipped=0.0
2024-09-15 08:40:18,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=260924.5, ans=0.07
2024-09-15 08:40:21,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=260924.5, ans=0.025
2024-09-15 08:40:36,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.61 vs. limit=15.0
2024-09-15 08:40:43,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=260952.83333333334, ans=0.125
2024-09-15 08:40:54,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.30 vs. limit=22.5
2024-09-15 08:41:02,497 INFO [train.py:1198] (0/2) Epoch 15, batch 2650, loss[loss=0.2861, ctc_loss=0.2012, cr_loss=0.4248, over 20631.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1708, cr_loss=0.387, over 4091854.08 frames. ], batch size: 68, lr: 5.61e-03, grad_scale: 32.0
2024-09-15 08:41:14,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=261009.5, ans=0.125
2024-09-15 08:41:33,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=261066.16666666666, ans=0.0
2024-09-15 08:41:52,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=261094.5, ans=0.025
2024-09-15 08:42:17,941 INFO [train.py:1198] (0/2) Epoch 15, batch 2700, loss[loss=0.2728, ctc_loss=0.1869, cr_loss=0.4294, over 19314.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1711, cr_loss=0.3875, over 4094292.01 frames. ], batch size: 90, lr: 5.61e-03, grad_scale: 32.0
2024-09-15 08:42:29,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=261151.16666666666, ans=0.125
2024-09-15 08:42:30,437 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 08:42:49,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.120e+02 2.293e+02 2.561e+02 3.584e+02, threshold=4.586e+02, percent-clipped=0.0
2024-09-15 08:43:17,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261236.16666666666, ans=0.1
2024-09-15 08:43:19,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=261236.16666666666, ans=0.125
2024-09-15 08:43:22,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=261264.5, ans=0.125
2024-09-15 08:43:25,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=261264.5, ans=0.125
2024-09-15 08:43:36,948 INFO [train.py:1198] (0/2) Epoch 15, batch 2750, loss[loss=0.2587, ctc_loss=0.1797, cr_loss=0.3951, over 20062.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1703, cr_loss=0.3875, over 4102082.23 frames.
], batch size: 80, lr: 5.61e-03, grad_scale: 32.0
2024-09-15 08:43:56,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261321.16666666666, ans=0.1
2024-09-15 08:44:04,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=261321.16666666666, ans=0.125
2024-09-15 08:44:14,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0
2024-09-15 08:44:23,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=261377.83333333334, ans=0.2
2024-09-15 08:44:32,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=261377.83333333334, ans=0.125
2024-09-15 08:44:49,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=261406.16666666666, ans=0.2
2024-09-15 08:44:52,131 INFO [train.py:1198] (0/2) Epoch 15, batch 2800, loss[loss=0.2414, ctc_loss=0.1642, cr_loss=0.386, over 21012.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1698, cr_loss=0.3866, over 4107304.15 frames. ], batch size: 61, lr: 5.61e-03, grad_scale: 32.0
2024-09-15 08:44:52,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=261434.5, ans=0.125
2024-09-15 08:44:55,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. limit=6.0
2024-09-15 08:45:13,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=261462.83333333334, ans=0.125
2024-09-15 08:45:20,987 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.044e+02 2.217e+02 2.446e+02 3.231e+02, threshold=4.433e+02, percent-clipped=0.0
2024-09-15 08:45:51,829 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=22.5
2024-09-15 08:46:10,647 INFO [train.py:1198] (0/2) Epoch 15, batch 2850, loss[loss=0.2146, ctc_loss=0.1476, cr_loss=0.3352, over 20325.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1712, cr_loss=0.3878, over 4087176.19 frames. ], batch size: 45, lr: 5.61e-03, grad_scale: 32.0
2024-09-15 08:46:18,522 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 08:46:46,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=261632.83333333334, ans=0.09899494936611666
2024-09-15 08:47:00,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261661.16666666666, ans=0.1
2024-09-15 08:47:25,505 INFO [train.py:1198] (0/2) Epoch 15, batch 2900, loss[loss=0.2889, ctc_loss=0.2143, cr_loss=0.3734, over 14610.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1709, cr_loss=0.387, over 4097688.11 frames. ], batch size: 149, lr: 5.60e-03, grad_scale: 32.0
2024-09-15 08:47:51,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261746.16666666666, ans=0.1
2024-09-15 08:47:53,915 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.036e+02 2.221e+02 2.380e+02 4.191e+02, threshold=4.443e+02, percent-clipped=0.0
2024-09-15 08:48:19,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261802.83333333334, ans=0.1
2024-09-15 08:48:42,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=261859.5, ans=6.0
2024-09-15 08:48:42,823 INFO [train.py:1198] (0/2) Epoch 15, batch 2950, loss[loss=0.2244, ctc_loss=0.151, cr_loss=0.3672, over 21074.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1697, cr_loss=0.3862, over 4103896.64 frames. ], batch size: 56, lr: 5.60e-03, grad_scale: 32.0
2024-09-15 08:48:48,588 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.64 vs. limit=5.0
2024-09-15 08:48:49,643 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=22.5
2024-09-15 08:49:42,072 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 08:49:58,446 INFO [train.py:1198] (0/2) Epoch 15, batch 3000, loss[loss=0.2497, ctc_loss=0.1701, cr_loss=0.398, over 21076.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1699, cr_loss=0.3868, over 4100844.53 frames. ], batch size: 59, lr: 5.60e-03, grad_scale: 32.0
2024-09-15 08:49:58,447 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-15 08:50:17,947 INFO [train.py:1230] (0/2) Epoch 15, validation: loss=0.04693, ctc_loss=0.04693, cr_loss=9.851e-15, over 944034.00 frames.
2024-09-15 08:50:17,947 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-15 08:50:35,523 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0
2024-09-15 08:50:46,784 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.717e+02 2.044e+02 2.249e+02 2.455e+02 4.170e+02, threshold=4.498e+02, percent-clipped=0.0
2024-09-15 08:51:15,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=262086.16666666666, ans=0.125
2024-09-15 08:51:24,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=262114.5, ans=0.1
2024-09-15 08:51:35,672 INFO [train.py:1198] (0/2) Epoch 15, batch 3050, loss[loss=0.2321, ctc_loss=0.1569, cr_loss=0.3758, over 21065.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1701, cr_loss=0.3872, over 4108360.08 frames. ], batch size: 53, lr: 5.60e-03, grad_scale: 32.0
2024-09-15 08:51:39,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.20 vs. limit=12.0
2024-09-15 08:52:50,879 INFO [train.py:1198] (0/2) Epoch 15, batch 3100, loss[loss=0.2772, ctc_loss=0.1909, cr_loss=0.4318, over 20656.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1705, cr_loss=0.3876, over 4110101.56 frames. ], batch size: 66, lr: 5.60e-03, grad_scale: 32.0
2024-09-15 08:53:11,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.66 vs.
limit=22.5
2024-09-15 08:53:19,505 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.058e+02 2.194e+02 2.429e+02 3.199e+02, threshold=4.387e+02, percent-clipped=0.0
2024-09-15 08:53:27,434 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 08:53:38,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=262369.5, ans=0.0
2024-09-15 08:53:46,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=12.0
2024-09-15 08:53:51,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=262369.5, ans=0.1
2024-09-15 08:54:08,279 INFO [train.py:1198] (0/2) Epoch 15, batch 3150, loss[loss=0.2417, ctc_loss=0.1667, cr_loss=0.3749, over 21042.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1716, cr_loss=0.3885, over 4096017.91 frames. ], batch size: 62, lr: 5.60e-03, grad_scale: 32.0
2024-09-15 08:54:43,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=262482.8333333333, ans=0.125
2024-09-15 08:55:23,009 INFO [train.py:1198] (0/2) Epoch 15, batch 3200, loss[loss=0.2766, ctc_loss=0.1928, cr_loss=0.4188, over 19370.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1717, cr_loss=0.3891, over 4097898.98 frames. ], batch size: 90, lr: 5.59e-03, grad_scale: 32.0
2024-09-15 08:55:27,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=262567.8333333333, ans=0.0
2024-09-15 08:55:51,669 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 1.993e+02 2.115e+02 2.319e+02 3.282e+02, threshold=4.229e+02, percent-clipped=0.0
2024-09-15 08:55:53,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=262624.5, ans=0.09899494936611666
2024-09-15 08:55:56,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=262624.5, ans=0.125
2024-09-15 08:56:37,944 INFO [train.py:1198] (0/2) Epoch 15, batch 3250, loss[loss=0.2692, ctc_loss=0.1853, cr_loss=0.4193, over 20627.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1723, cr_loss=0.39, over 4094184.82 frames. ], batch size: 66, lr: 5.59e-03, grad_scale: 32.0
2024-09-15 08:56:59,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=15.0
2024-09-15 08:57:06,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=262737.8333333333, ans=0.0
2024-09-15 08:57:30,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=262794.5, ans=0.125
2024-09-15 08:57:55,809 INFO [train.py:1198] (0/2) Epoch 15, batch 3300, loss[loss=0.2677, ctc_loss=0.1827, cr_loss=0.4252, over 20730.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1728, cr_loss=0.3902, over 4087464.93 frames. ], batch size: 71, lr: 5.59e-03, grad_scale: 32.0
2024-09-15 08:58:06,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=262851.1666666667, ans=0.125
2024-09-15 08:58:12,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=262879.5, ans=0.0
2024-09-15 08:58:12,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=262879.5, ans=0.0
2024-09-15 08:58:24,328 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.013e+02 2.168e+02 2.340e+02 4.023e+02, threshold=4.337e+02, percent-clipped=0.0
2024-09-15 08:58:57,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=262964.5, ans=0.125
2024-09-15 08:59:10,886 INFO [train.py:1198] (0/2) Epoch 15, batch 3350, loss[loss=0.271, ctc_loss=0.1834, cr_loss=0.4378, over 20858.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.173, cr_loss=0.3909, over 4087784.76 frames. ], batch size: 65, lr: 5.59e-03, grad_scale: 32.0
2024-09-15 08:59:15,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=262992.8333333333, ans=0.125
2024-09-15 08:59:48,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=263049.5, ans=0.125
2024-09-15 09:00:28,413 INFO [train.py:1198] (0/2) Epoch 15, batch 3400, loss[loss=0.2365, ctc_loss=0.1607, cr_loss=0.3792, over 20972.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1731, cr_loss=0.3916, over 4092208.03 frames. ], batch size: 51, lr: 5.59e-03, grad_scale: 32.0
2024-09-15 09:00:54,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=263162.8333333333, ans=0.125
2024-09-15 09:00:56,910 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.056e+02 2.235e+02 2.445e+02 8.480e+02, threshold=4.471e+02, percent-clipped=1.0
2024-09-15 09:01:00,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=263191.1666666667, ans=0.025
2024-09-15 09:01:21,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=263219.5, ans=0.125
2024-09-15 09:01:33,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263247.8333333333, ans=0.1
2024-09-15 09:01:43,512 INFO [train.py:1198] (0/2) Epoch 15, batch 3450, loss[loss=0.2077, ctc_loss=0.1344, cr_loss=0.3665, over 19835.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1722, cr_loss=0.3913, over 4099061.18 frames. ], batch size: 44, lr: 5.59e-03, grad_scale: 32.0
2024-09-15 09:03:01,558 INFO [train.py:1198] (0/2) Epoch 15, batch 3500, loss[loss=0.2622, ctc_loss=0.1813, cr_loss=0.4046, over 20815.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1712, cr_loss=0.3894, over 4110879.79 frames.
], batch size: 59, lr: 5.59e-03, grad_scale: 64.0 2024-09-15 09:03:31,506 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 2.017e+02 2.226e+02 2.415e+02 4.174e+02, threshold=4.452e+02, percent-clipped=0.0 2024-09-15 09:03:39,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=263474.5, ans=0.025 2024-09-15 09:03:57,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=263502.8333333333, ans=0.2 2024-09-15 09:04:09,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=263531.1666666667, ans=0.125 2024-09-15 09:04:16,497 INFO [train.py:1198] (0/2) Epoch 15, batch 3550, loss[loss=0.2838, ctc_loss=0.1955, cr_loss=0.4416, over 20980.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1709, cr_loss=0.389, over 4107791.79 frames. ], batch size: 64, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:04:16,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=263559.5, ans=0.125 2024-09-15 09:04:24,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=263559.5, ans=0.1 2024-09-15 09:04:27,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=263559.5, ans=0.125 2024-09-15 09:04:33,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=263587.8333333333, ans=0.035 2024-09-15 09:04:43,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.29 vs. 
limit=15.0 2024-09-15 09:05:23,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=263672.8333333333, ans=0.125 2024-09-15 09:05:34,763 INFO [train.py:1198] (0/2) Epoch 15, batch 3600, loss[loss=0.2828, ctc_loss=0.1984, cr_loss=0.4217, over 18290.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1711, cr_loss=0.3888, over 4106197.47 frames. ], batch size: 108, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:05:35,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2024-09-15 09:05:39,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=263701.1666666667, ans=10.0 2024-09-15 09:05:55,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263729.5, ans=0.1 2024-09-15 09:06:05,311 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.714e+02 2.071e+02 2.205e+02 2.391e+02 4.670e+02, threshold=4.411e+02, percent-clipped=1.0 2024-09-15 09:06:20,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=263786.1666666667, ans=0.035 2024-09-15 09:06:50,291 INFO [train.py:1198] (0/2) Epoch 15, batch 3650, loss[loss=0.2496, ctc_loss=0.1724, cr_loss=0.386, over 20777.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1706, cr_loss=0.3877, over 4093693.01 frames. 
], batch size: 56, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:07:13,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=263871.1666666667, ans=0.0 2024-09-15 09:07:39,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263927.8333333333, ans=0.1 2024-09-15 09:07:40,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=263927.8333333333, ans=0.025 2024-09-15 09:08:05,009 INFO [train.py:1198] (0/2) Epoch 15, batch 3700, loss[loss=0.291, ctc_loss=0.2067, cr_loss=0.4215, over 17979.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.172, cr_loss=0.3901, over 4084082.80 frames. ], batch size: 108, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:08:27,981 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0 2024-09-15 09:08:32,062 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 09:08:34,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=264012.8333333333, ans=0.025 2024-09-15 09:08:35,750 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.61 vs. 
limit=15.0 2024-09-15 09:08:37,738 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.694e+02 2.012e+02 2.130e+02 2.276e+02 5.003e+02, threshold=4.259e+02, percent-clipped=1.0 2024-09-15 09:08:54,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=264069.5, ans=0.125 2024-09-15 09:09:22,688 INFO [train.py:1198] (0/2) Epoch 15, batch 3750, loss[loss=0.2718, ctc_loss=0.1878, cr_loss=0.4197, over 20951.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.1711, cr_loss=0.3885, over 4077411.36 frames. ], batch size: 67, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:09:44,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=264154.5, ans=0.125 2024-09-15 09:10:33,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=264239.5, ans=0.125 2024-09-15 09:10:37,783 INFO [train.py:1198] (0/2) Epoch 15, batch 3800, loss[loss=0.2117, ctc_loss=0.1453, cr_loss=0.3321, over 19941.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1716, cr_loss=0.3895, over 4084856.69 frames. ], batch size: 44, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:11:07,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=264296.1666666667, ans=0.125 2024-09-15 09:11:10,492 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.055e+02 2.289e+02 2.466e+02 4.211e+02, threshold=4.578e+02, percent-clipped=0.0 2024-09-15 09:11:14,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.53 vs. 
limit=5.0 2024-09-15 09:11:42,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=264381.1666666667, ans=0.025 2024-09-15 09:11:45,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=264381.1666666667, ans=0.125 2024-09-15 09:11:55,265 INFO [train.py:1198] (0/2) Epoch 15, batch 3850, loss[loss=0.2649, ctc_loss=0.1818, cr_loss=0.4154, over 20284.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1715, cr_loss=0.3891, over 4094001.94 frames. ], batch size: 74, lr: 5.58e-03, grad_scale: 32.0 2024-09-15 09:12:01,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=264409.5, ans=0.125 2024-09-15 09:12:22,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=264437.8333333333, ans=0.0 2024-09-15 09:12:30,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=264466.1666666667, ans=0.125 2024-09-15 09:13:00,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=264522.8333333333, ans=0.125 2024-09-15 09:13:10,784 INFO [train.py:1198] (0/2) Epoch 15, batch 3900, loss[loss=0.3102, ctc_loss=0.2275, cr_loss=0.4138, over 14874.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1715, cr_loss=0.389, over 4079814.19 frames. ], batch size: 149, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:13:26,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. 
limit=15.0 2024-09-15 09:13:27,904 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 09:13:41,352 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.094e+02 2.280e+02 2.508e+02 3.421e+02, threshold=4.561e+02, percent-clipped=0.0 2024-09-15 09:14:02,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=264636.1666666667, ans=0.0 2024-09-15 09:14:17,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=264664.5, ans=0.0 2024-09-15 09:14:24,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=264664.5, ans=0.125 2024-09-15 09:14:29,225 INFO [train.py:1198] (0/2) Epoch 15, batch 3950, loss[loss=0.2654, ctc_loss=0.1854, cr_loss=0.3997, over 19467.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.172, cr_loss=0.3902, over 4085662.63 frames. ], batch size: 90, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:14:39,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=264692.8333333333, ans=0.0 2024-09-15 09:14:47,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=264721.1666666667, ans=0.125 2024-09-15 09:14:53,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=264721.1666666667, ans=0.0 2024-09-15 09:15:01,592 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. 
limit=15.0 2024-09-15 09:15:28,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=264806.1666666667, ans=0.025 2024-09-15 09:15:44,079 INFO [train.py:1198] (0/2) Epoch 15, batch 4000, loss[loss=0.2496, ctc_loss=0.1707, cr_loss=0.3945, over 21063.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1725, cr_loss=0.3903, over 4067248.08 frames. ], batch size: 62, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:16:05,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=264862.8333333333, ans=0.0 2024-09-15 09:16:08,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=264862.8333333333, ans=0.125 2024-09-15 09:16:13,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=264891.1666666667, ans=0.125 2024-09-15 09:16:14,194 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.071e+02 2.223e+02 2.375e+02 3.814e+02, threshold=4.445e+02, percent-clipped=0.0 2024-09-15 09:16:20,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=264891.1666666667, ans=0.125 2024-09-15 09:17:00,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264976.1666666667, ans=0.1 2024-09-15 09:17:02,173 INFO [train.py:1198] (0/2) Epoch 15, batch 4050, loss[loss=0.2447, ctc_loss=0.1673, cr_loss=0.387, over 20955.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1718, cr_loss=0.3886, over 4071918.34 frames. 
], batch size: 58, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:17:05,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=264976.1666666667, ans=0.07 2024-09-15 09:17:08,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=264976.1666666667, ans=0.125 2024-09-15 09:17:20,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=265004.5, ans=0.125 2024-09-15 09:17:56,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=265061.1666666667, ans=0.025 2024-09-15 09:18:17,681 INFO [train.py:1198] (0/2) Epoch 15, batch 4100, loss[loss=0.2665, ctc_loss=0.1842, cr_loss=0.4113, over 20874.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.1712, cr_loss=0.3883, over 4085223.33 frames. ], batch size: 65, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:18:47,836 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.093e+02 2.224e+02 2.504e+02 3.147e+02, threshold=4.449e+02, percent-clipped=0.0 2024-09-15 09:18:49,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=265174.5, ans=0.025 2024-09-15 09:19:12,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=265202.8333333333, ans=0.1 2024-09-15 09:19:16,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=265202.8333333333, ans=0.125 2024-09-15 09:19:22,318 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.02 vs. 
limit=6.0 2024-09-15 09:19:36,580 INFO [train.py:1198] (0/2) Epoch 15, batch 4150, loss[loss=0.2843, ctc_loss=0.1944, cr_loss=0.4494, over 20675.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1709, cr_loss=0.3881, over 4096084.10 frames. ], batch size: 71, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:19:37,164 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=22.5 2024-09-15 09:19:50,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=265287.8333333333, ans=0.025 2024-09-15 09:20:02,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=265287.8333333333, ans=0.0 2024-09-15 09:20:51,800 INFO [train.py:1198] (0/2) Epoch 15, batch 4200, loss[loss=0.2728, ctc_loss=0.1922, cr_loss=0.4026, over 20841.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1709, cr_loss=0.3885, over 4100105.59 frames. ], batch size: 65, lr: 5.57e-03, grad_scale: 32.0 2024-09-15 09:20:56,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=265401.1666666667, ans=0.125 2024-09-15 09:20:59,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=265401.1666666667, ans=0.1 2024-09-15 09:21:20,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.11 vs. 
limit=15.0 2024-09-15 09:21:21,750 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.038e+02 2.154e+02 2.360e+02 3.093e+02, threshold=4.308e+02, percent-clipped=0.0 2024-09-15 09:21:31,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=265457.8333333333, ans=0.0 2024-09-15 09:21:41,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=265486.1666666667, ans=0.125 2024-09-15 09:22:10,049 INFO [train.py:1198] (0/2) Epoch 15, batch 4250, loss[loss=0.2181, ctc_loss=0.1458, cr_loss=0.3617, over 20961.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1714, cr_loss=0.3898, over 4104177.03 frames. ], batch size: 50, lr: 5.56e-03, grad_scale: 32.0 2024-09-15 09:22:49,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=265599.5, ans=0.0 2024-09-15 09:22:58,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.13 vs. limit=15.0 2024-09-15 09:23:13,142 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-09-15 09:23:17,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=265656.1666666667, ans=0.2 2024-09-15 09:23:26,081 INFO [train.py:1198] (0/2) Epoch 15, batch 4300, loss[loss=0.2359, ctc_loss=0.1578, cr_loss=0.3905, over 20974.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1709, cr_loss=0.3892, over 4106757.19 frames. ], batch size: 49, lr: 5.56e-03, grad_scale: 16.0 2024-09-15 09:23:26,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.00 vs. 
limit=15.0 2024-09-15 09:23:39,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=265712.8333333333, ans=0.0 2024-09-15 09:23:57,740 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.040e+02 2.249e+02 2.478e+02 3.601e+02, threshold=4.499e+02, percent-clipped=0.0 2024-09-15 09:24:41,494 INFO [train.py:1198] (0/2) Epoch 15, batch 4350, loss[loss=0.2079, ctc_loss=0.1398, cr_loss=0.3406, over 20882.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1714, cr_loss=0.3902, over 4117733.07 frames. ], batch size: 54, lr: 5.56e-03, grad_scale: 16.0 2024-09-15 09:25:10,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=265882.8333333333, ans=0.125 2024-09-15 09:25:29,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=265911.1666666667, ans=0.1 2024-09-15 09:25:59,115 INFO [train.py:1198] (0/2) Epoch 15, batch 4400, loss[loss=0.2688, ctc_loss=0.1891, cr_loss=0.3982, over 20670.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1713, cr_loss=0.3889, over 4119540.41 frames. 
], batch size: 68, lr: 5.56e-03, grad_scale: 32.0 2024-09-15 09:26:30,647 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.051e+02 2.145e+02 2.300e+02 3.280e+02, threshold=4.289e+02, percent-clipped=0.0 2024-09-15 09:26:34,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=266024.5, ans=0.04949747468305833 2024-09-15 09:26:49,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=266052.8333333333, ans=0.0 2024-09-15 09:26:52,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=266052.8333333333, ans=0.1 2024-09-15 09:27:14,520 INFO [train.py:1198] (0/2) Epoch 15, batch 4450, loss[loss=0.2442, ctc_loss=0.1659, cr_loss=0.3916, over 20994.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1714, cr_loss=0.3897, over 4122906.49 frames. ], batch size: 55, lr: 5.56e-03, grad_scale: 32.0 2024-09-15 09:27:36,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=266137.8333333333, ans=0.0 2024-09-15 09:27:40,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=266137.8333333333, ans=0.05 2024-09-15 09:27:50,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.65 vs. 
limit=15.0 2024-09-15 09:27:57,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=266166.1666666667, ans=0.125 2024-09-15 09:28:06,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=266194.5, ans=0.0 2024-09-15 09:28:13,508 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 09:28:32,390 INFO [train.py:1198] (0/2) Epoch 15, batch 4500, loss[loss=0.2662, ctc_loss=0.1868, cr_loss=0.3973, over 20982.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1718, cr_loss=0.3901, over 4121171.15 frames. ], batch size: 63, lr: 5.56e-03, grad_scale: 32.0 2024-09-15 09:29:03,705 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.071e+02 2.234e+02 2.438e+02 3.164e+02, threshold=4.468e+02, percent-clipped=0.0 2024-09-15 09:29:39,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=266364.5, ans=0.0 2024-09-15 09:29:47,895 INFO [train.py:1198] (0/2) Epoch 15, batch 4550, loss[loss=0.3044, ctc_loss=0.2216, cr_loss=0.4143, over 14614.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1718, cr_loss=0.3903, over 4094471.73 frames. ], batch size: 149, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:29:54,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=266392.8333333333, ans=0.125 2024-09-15 09:30:43,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0 2024-09-15 09:31:05,930 INFO [train.py:1198] (0/2) Epoch 15, batch 4600, loss[loss=0.233, ctc_loss=0.1575, cr_loss=0.3776, over 20999.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1719, cr_loss=0.3904, over 4102883.96 frames. 
], batch size: 52, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:31:09,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=266534.5, ans=0.0 2024-09-15 09:31:10,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=266534.5, ans=0.1 2024-09-15 09:31:37,678 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.069e+02 2.250e+02 2.460e+02 6.260e+02, threshold=4.500e+02, percent-clipped=2.0 2024-09-15 09:31:45,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=266591.1666666667, ans=0.125 2024-09-15 09:32:14,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=15.0 2024-09-15 09:32:21,219 INFO [train.py:1198] (0/2) Epoch 15, batch 4650, loss[loss=0.2638, ctc_loss=0.1829, cr_loss=0.4046, over 20685.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1719, cr_loss=0.3904, over 4107295.79 frames. ], batch size: 71, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:32:25,317 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-15 09:32:30,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=266676.1666666667, ans=0.1 2024-09-15 09:33:39,286 INFO [train.py:1198] (0/2) Epoch 15, batch 4700, loss[loss=0.2795, ctc_loss=0.1924, cr_loss=0.4353, over 21017.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1715, cr_loss=0.3893, over 4100285.70 frames. 
], batch size: 61, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:33:41,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=266817.8333333333, ans=0.0 2024-09-15 09:33:42,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=266817.8333333333, ans=0.1 2024-09-15 09:33:58,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=266846.1666666667, ans=0.2 2024-09-15 09:34:10,612 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.705e+02 2.045e+02 2.175e+02 2.360e+02 3.270e+02, threshold=4.351e+02, percent-clipped=0.0 2024-09-15 09:34:16,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.48 vs. limit=8.0 2024-09-15 09:34:54,480 INFO [train.py:1198] (0/2) Epoch 15, batch 4750, loss[loss=0.2593, ctc_loss=0.1777, cr_loss=0.4081, over 20210.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.1717, cr_loss=0.3898, over 4110677.40 frames. ], batch size: 80, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:35:17,460 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 09:35:21,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=266987.8333333333, ans=0.0 2024-09-15 09:35:25,108 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=22.5 2024-09-15 09:35:29,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.39 vs. 
limit=15.0 2024-09-15 09:35:53,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267072.8333333333, ans=0.1 2024-09-15 09:36:04,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=267072.8333333333, ans=12.0 2024-09-15 09:36:09,993 INFO [train.py:1198] (0/2) Epoch 15, batch 4800, loss[loss=0.28, ctc_loss=0.1962, cr_loss=0.4192, over 17912.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.172, cr_loss=0.3898, over 4096662.49 frames. ], batch size: 108, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:36:11,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=267101.1666666667, ans=0.07 2024-09-15 09:36:18,277 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 09:36:45,031 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.038e+02 2.153e+02 2.343e+02 3.590e+02, threshold=4.306e+02, percent-clipped=0.0 2024-09-15 09:37:28,278 INFO [train.py:1198] (0/2) Epoch 15, batch 4850, loss[loss=0.25, ctc_loss=0.1687, cr_loss=0.4064, over 20837.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1713, cr_loss=0.3885, over 4087579.44 frames. 
], batch size: 59, lr: 5.55e-03, grad_scale: 32.0 2024-09-15 09:37:42,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=267271.1666666667, ans=0.0 2024-09-15 09:38:05,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=267299.5, ans=0.2 2024-09-15 09:38:06,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=267299.5, ans=0.025 2024-09-15 09:38:21,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=267327.8333333333, ans=0.2 2024-09-15 09:38:44,249 INFO [train.py:1198] (0/2) Epoch 15, batch 4900, loss[loss=0.2224, ctc_loss=0.1514, cr_loss=0.3551, over 20897.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1704, cr_loss=0.3872, over 4089005.50 frames. ], batch size: 54, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:38:58,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=267412.8333333333, ans=0.2 2024-09-15 09:39:15,564 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.047e+02 2.157e+02 2.313e+02 3.010e+02, threshold=4.313e+02, percent-clipped=0.0 2024-09-15 09:39:32,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=267469.5, ans=0.0 2024-09-15 09:39:33,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=267469.5, ans=0.125 2024-09-15 09:39:43,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267469.5, ans=0.1 2024-09-15 09:39:46,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, 
num_channels=256, metric=10.52 vs. limit=15.0 2024-09-15 09:40:01,218 INFO [train.py:1198] (0/2) Epoch 15, batch 4950, loss[loss=0.2591, ctc_loss=0.1791, cr_loss=0.4002, over 20960.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1699, cr_loss=0.3871, over 4097093.20 frames. ], batch size: 58, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:40:04,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=267526.1666666667, ans=0.125 2024-09-15 09:40:27,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=267554.5, ans=0.2 2024-09-15 09:40:47,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=267611.1666666667, ans=0.07 2024-09-15 09:41:04,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=267639.5, ans=0.04949747468305833 2024-09-15 09:41:07,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=267639.5, ans=0.025 2024-09-15 09:41:15,867 INFO [train.py:1198] (0/2) Epoch 15, batch 5000, loss[loss=0.2663, ctc_loss=0.1855, cr_loss=0.4042, over 19689.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1704, cr_loss=0.3877, over 4099340.29 frames. ], batch size: 90, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:41:31,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. 
limit=15.0 2024-09-15 09:41:38,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267696.1666666667, ans=0.1 2024-09-15 09:41:46,790 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.086e+02 2.200e+02 2.479e+02 6.714e+02, threshold=4.400e+02, percent-clipped=2.0 2024-09-15 09:42:14,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=267781.1666666667, ans=0.125 2024-09-15 09:42:24,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=267781.1666666667, ans=0.125 2024-09-15 09:42:30,022 INFO [train.py:1198] (0/2) Epoch 15, batch 5050, loss[loss=0.2452, ctc_loss=0.1679, cr_loss=0.3867, over 21086.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1705, cr_loss=0.3883, over 4105235.14 frames. ], batch size: 59, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:42:35,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2024-09-15 09:42:55,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=267837.8333333333, ans=0.1 2024-09-15 09:43:12,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=267894.5, ans=0.0 2024-09-15 09:43:43,720 INFO [train.py:1198] (0/2) Epoch 15, batch 5100, loss[loss=0.2066, ctc_loss=0.1393, cr_loss=0.3363, over 20941.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.17, cr_loss=0.3871, over 4110093.10 frames. 
], batch size: 50, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:43:45,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267951.1666666667, ans=0.1 2024-09-15 09:43:47,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=267951.1666666667, ans=0.125 2024-09-15 09:44:07,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=267979.5, ans=0.125 2024-09-15 09:44:15,089 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.040e+02 2.154e+02 2.335e+02 2.871e+02, threshold=4.309e+02, percent-clipped=0.0 2024-09-15 09:44:25,048 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.01 vs. limit=10.0 2024-09-15 09:44:38,180 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 09:44:42,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=15.0 2024-09-15 09:44:58,382 INFO [train.py:1198] (0/2) Epoch 15, batch 5150, loss[loss=0.2554, ctc_loss=0.174, cr_loss=0.4074, over 20792.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1693, cr_loss=0.3856, over 4104251.76 frames. ], batch size: 53, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:44:59,555 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.10 vs. 
limit=8.0 2024-09-15 09:45:36,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=268149.5, ans=0.025 2024-09-15 09:45:44,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=268177.8333333333, ans=0.125 2024-09-15 09:45:59,452 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-15 09:46:07,058 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=22.5 2024-09-15 09:46:15,333 INFO [train.py:1198] (0/2) Epoch 15, batch 5200, loss[loss=0.319, ctc_loss=0.2352, cr_loss=0.4193, over 14175.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1692, cr_loss=0.3848, over 4106819.98 frames. ], batch size: 149, lr: 5.54e-03, grad_scale: 32.0 2024-09-15 09:46:21,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=268234.5, ans=0.025 2024-09-15 09:46:23,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=268234.5, ans=0.1 2024-09-15 09:46:47,549 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.031e+02 2.169e+02 2.299e+02 3.874e+02, threshold=4.339e+02, percent-clipped=0.0 2024-09-15 09:47:08,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=268319.5, ans=0.125 2024-09-15 09:47:08,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=268319.5, ans=0.125 2024-09-15 09:47:17,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, 
batch_count=268347.8333333333, ans=0.1 2024-09-15 09:47:26,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=268347.8333333333, ans=0.0 2024-09-15 09:47:27,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=268376.1666666667, ans=0.0 2024-09-15 09:47:28,898 INFO [train.py:1198] (0/2) Epoch 15, batch 5250, loss[loss=0.2486, ctc_loss=0.1722, cr_loss=0.3821, over 21024.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1695, cr_loss=0.3854, over 4105681.77 frames. ], batch size: 63, lr: 5.53e-03, grad_scale: 32.0 2024-09-15 09:48:10,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=268432.8333333333, ans=0.2 2024-09-15 09:48:21,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=268461.1666666667, ans=0.1 2024-09-15 09:48:24,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=268461.1666666667, ans=0.125 2024-09-15 09:48:33,363 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2024-09-15 09:48:37,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=268489.5, ans=0.2 2024-09-15 09:48:43,303 INFO [train.py:1198] (0/2) Epoch 15, batch 5300, loss[loss=0.2745, ctc_loss=0.1895, cr_loss=0.4251, over 21046.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1697, cr_loss=0.3862, over 4099224.07 frames. 
], batch size: 63, lr: 5.53e-03, grad_scale: 32.0 2024-09-15 09:48:43,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=268517.8333333333, ans=0.0 2024-09-15 09:48:49,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=268517.8333333333, ans=0.125 2024-09-15 09:48:55,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=268517.8333333333, ans=0.0 2024-09-15 09:49:01,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0 2024-09-15 09:49:18,287 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.036e+02 2.143e+02 2.347e+02 3.939e+02, threshold=4.286e+02, percent-clipped=0.0 2024-09-15 09:49:21,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=268574.5, ans=0.1 2024-09-15 09:49:56,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=268631.1666666667, ans=0.125 2024-09-15 09:50:00,376 INFO [train.py:1198] (0/2) Epoch 15, batch 5350, loss[loss=0.2576, ctc_loss=0.1785, cr_loss=0.3956, over 21007.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1694, cr_loss=0.3864, over 4095468.47 frames. ], batch size: 61, lr: 5.53e-03, grad_scale: 32.0 2024-09-15 09:50:05,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268659.5, ans=0.1 2024-09-15 09:50:37,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.10 vs. 
limit=15.0 2024-09-15 09:51:03,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=268772.8333333333, ans=0.125 2024-09-15 09:51:14,113 INFO [train.py:1198] (0/2) Epoch 15, batch 5400, loss[loss=0.2588, ctc_loss=0.1793, cr_loss=0.3974, over 20687.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1699, cr_loss=0.3871, over 4090196.00 frames. ], batch size: 66, lr: 5.53e-03, grad_scale: 32.0 2024-09-15 09:51:46,686 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.822e+02 2.037e+02 2.142e+02 2.320e+02 9.416e+02, threshold=4.283e+02, percent-clipped=1.0 2024-09-15 09:52:28,267 INFO [train.py:1198] (0/2) Epoch 15, batch 5450, loss[loss=0.2673, ctc_loss=0.1836, cr_loss=0.4186, over 20795.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1694, cr_loss=0.3864, over 4096539.34 frames. ], batch size: 71, lr: 5.53e-03, grad_scale: 32.0 2024-09-15 09:52:47,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=268971.1666666667, ans=0.125 2024-09-15 09:53:15,315 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=22.5 2024-09-15 09:53:28,644 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0 2024-09-15 09:53:36,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=15.0 2024-09-15 09:53:42,727 INFO [train.py:1198] (0/2) Epoch 15, batch 5500, loss[loss=0.2299, ctc_loss=0.1567, cr_loss=0.3661, over 20980.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1696, cr_loss=0.3864, over 4090265.93 frames. 
], batch size: 55, lr: 5.53e-03, grad_scale: 32.0 2024-09-15 09:53:53,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=269084.5, ans=0.125 2024-09-15 09:54:00,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=269112.8333333333, ans=0.0 2024-09-15 09:54:15,071 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.056e+02 2.198e+02 2.340e+02 4.709e+02, threshold=4.396e+02, percent-clipped=1.0 2024-09-15 09:54:42,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=269197.8333333333, ans=0.125 2024-09-15 09:54:50,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=269197.8333333333, ans=0.025 2024-09-15 09:54:51,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=269197.8333333333, ans=0.125 2024-09-15 09:54:59,147 INFO [train.py:1198] (0/2) Epoch 15, batch 5550, loss[loss=0.2586, ctc_loss=0.1764, cr_loss=0.4112, over 20638.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1697, cr_loss=0.3863, over 4085587.94 frames. ], batch size: 66, lr: 5.53e-03, grad_scale: 32.0 2024-09-15 09:55:13,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0 2024-09-15 09:55:33,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=269282.8333333333, ans=0.125 2024-09-15 09:55:42,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.40 vs. 
limit=15.0 2024-09-15 09:55:58,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=269339.5, ans=0.0 2024-09-15 09:56:13,563 INFO [train.py:1198] (0/2) Epoch 15, batch 5600, loss[loss=0.2686, ctc_loss=0.1814, cr_loss=0.436, over 20862.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1694, cr_loss=0.3861, over 4091021.47 frames. ], batch size: 57, lr: 5.52e-03, grad_scale: 32.0 2024-09-15 09:56:18,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=269367.8333333333, ans=0.0 2024-09-15 09:56:46,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.116e+02 2.356e+02 2.676e+02 4.096e+02, threshold=4.712e+02, percent-clipped=0.0 2024-09-15 09:57:17,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=269481.1666666667, ans=0.125 2024-09-15 09:57:27,850 INFO [train.py:1198] (0/2) Epoch 15, batch 5650, loss[loss=0.2067, ctc_loss=0.1362, cr_loss=0.3527, over 20343.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1699, cr_loss=0.3865, over 4082350.77 frames. 
], batch size: 45, lr: 5.52e-03, grad_scale: 32.0 2024-09-15 09:57:47,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=269537.8333333333, ans=0.0 2024-09-15 09:58:12,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=269594.5, ans=15.0 2024-09-15 09:58:24,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=269594.5, ans=0.1 2024-09-15 09:58:29,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=269622.8333333333, ans=0.0 2024-09-15 09:58:36,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=269622.8333333333, ans=0.1 2024-09-15 09:58:38,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=269622.8333333333, ans=0.125 2024-09-15 09:58:39,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=269622.8333333333, ans=0.025 2024-09-15 09:58:43,631 INFO [train.py:1198] (0/2) Epoch 15, batch 5700, loss[loss=0.28, ctc_loss=0.1936, cr_loss=0.4323, over 18509.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1702, cr_loss=0.3875, over 4091006.94 frames. ], batch size: 108, lr: 5.52e-03, grad_scale: 32.0 2024-09-15 09:59:15,800 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.146e+02 2.356e+02 2.674e+02 3.246e+02, threshold=4.712e+02, percent-clipped=0.0 2024-09-15 09:59:34,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. 
limit=22.5 2024-09-15 09:59:46,266 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0 2024-09-15 09:59:57,459 INFO [train.py:1198] (0/2) Epoch 15, batch 5750, loss[loss=0.2432, ctc_loss=0.1657, cr_loss=0.3875, over 21057.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1704, cr_loss=0.3874, over 4085495.01 frames. ], batch size: 63, lr: 5.52e-03, grad_scale: 32.0 2024-09-15 10:00:00,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=269792.8333333333, ans=0.125 2024-09-15 10:00:03,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=269792.8333333333, ans=0.125 2024-09-15 10:00:04,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=269792.8333333333, ans=0.125 2024-09-15 10:00:24,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=269821.1666666667, ans=0.2 2024-09-15 10:00:43,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=269877.8333333333, ans=0.2 2024-09-15 10:00:53,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=269877.8333333333, ans=0.125 2024-09-15 10:01:10,910 INFO [train.py:1198] (0/2) Epoch 15, batch 5800, loss[loss=0.242, ctc_loss=0.1634, cr_loss=0.393, over 20891.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1696, cr_loss=0.3874, over 4101217.53 frames. 
], batch size: 54, lr: 5.52e-03, grad_scale: 32.0 2024-09-15 10:01:43,628 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.003e+02 2.144e+02 2.325e+02 2.916e+02, threshold=4.289e+02, percent-clipped=0.0 2024-09-15 10:01:45,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=269991.1666666667, ans=0.125 2024-09-15 10:01:51,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-09-15 10:01:54,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2024-09-15 10:02:14,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.46 vs. limit=12.0 2024-09-15 10:02:23,264 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.95 vs. limit=15.0 2024-09-15 10:02:25,572 INFO [train.py:1198] (0/2) Epoch 15, batch 5850, loss[loss=0.236, ctc_loss=0.159, cr_loss=0.385, over 20889.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1695, cr_loss=0.3877, over 4098594.55 frames. 
], batch size: 54, lr: 5.52e-03, grad_scale: 32.0 2024-09-15 10:02:31,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=270076.1666666667, ans=0.0 2024-09-15 10:02:37,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=270076.1666666667, ans=0.1 2024-09-15 10:02:40,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=270104.5, ans=0.0 2024-09-15 10:02:48,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=270104.5, ans=0.035 2024-09-15 10:03:12,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=270161.1666666667, ans=22.5 2024-09-15 10:03:42,162 INFO [train.py:1198] (0/2) Epoch 15, batch 5900, loss[loss=0.2424, ctc_loss=0.1648, cr_loss=0.388, over 21015.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1702, cr_loss=0.3884, over 4102159.43 frames. ], batch size: 63, lr: 5.52e-03, grad_scale: 32.0 2024-09-15 10:04:08,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=270246.1666666667, ans=0.125 2024-09-15 10:04:15,039 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.105e+02 2.306e+02 2.612e+02 5.347e+02, threshold=4.612e+02, percent-clipped=1.0 2024-09-15 10:04:20,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-09-15 10:04:26,099 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.03 vs. 
limit=22.5 2024-09-15 10:04:32,304 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.36 vs. limit=6.0 2024-09-15 10:04:56,750 INFO [train.py:1198] (0/2) Epoch 15, batch 5950, loss[loss=0.2705, ctc_loss=0.1879, cr_loss=0.4132, over 19342.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.17, cr_loss=0.3871, over 4085816.00 frames. ], batch size: 90, lr: 5.51e-03, grad_scale: 32.0 2024-09-15 10:05:01,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=270359.5, ans=0.125 2024-09-15 10:05:35,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=270416.1666666667, ans=0.07 2024-09-15 10:05:36,021 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.92 vs. limit=22.5 2024-09-15 10:05:40,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=270444.5, ans=0.125 2024-09-15 10:05:47,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=270444.5, ans=0.0 2024-09-15 10:06:00,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0 2024-09-15 10:06:11,384 INFO [train.py:1198] (0/2) Epoch 15, batch 6000, loss[loss=0.2392, ctc_loss=0.1629, cr_loss=0.3815, over 20975.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1709, cr_loss=0.3881, over 4078694.43 frames. 
], batch size: 58, lr: 5.51e-03, grad_scale: 32.0 2024-09-15 10:06:11,385 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 10:06:31,965 INFO [train.py:1230] (0/2) Epoch 15, validation: loss=0.04684, ctc_loss=0.04684, cr_loss=1.007e-14, over 944034.00 frames. 2024-09-15 10:06:31,966 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 10:06:48,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=270529.5, ans=0.2 2024-09-15 10:06:48,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=270529.5, ans=0.0 2024-09-15 10:07:04,359 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.048e+02 2.187e+02 2.421e+02 3.370e+02, threshold=4.374e+02, percent-clipped=0.0 2024-09-15 10:07:27,269 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2024-09-15 10:07:45,963 INFO [train.py:1198] (0/2) Epoch 15, batch 6050, loss[loss=0.2842, ctc_loss=0.1982, cr_loss=0.43, over 19695.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1708, cr_loss=0.3879, over 4083169.84 frames. ], batch size: 90, lr: 5.51e-03, grad_scale: 32.0 2024-09-15 10:07:52,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=270642.8333333333, ans=0.125 2024-09-15 10:07:59,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=270642.8333333333, ans=15.0 2024-09-15 10:08:02,547 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. 
limit=15.0 2024-09-15 10:08:31,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=270727.8333333333, ans=0.125 2024-09-15 10:08:40,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=270727.8333333333, ans=0.0 2024-09-15 10:09:00,481 INFO [train.py:1198] (0/2) Epoch 15, batch 6100, loss[loss=0.2463, ctc_loss=0.1687, cr_loss=0.3881, over 20755.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1712, cr_loss=0.3886, over 4079999.97 frames. ], batch size: 71, lr: 5.51e-03, grad_scale: 32.0 2024-09-15 10:09:32,814 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.165e+02 2.332e+02 2.616e+02 4.520e+02, threshold=4.663e+02, percent-clipped=1.0 2024-09-15 10:10:09,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=270897.8333333333, ans=0.025 2024-09-15 10:10:14,429 INFO [train.py:1198] (0/2) Epoch 15, batch 6150, loss[loss=0.2322, ctc_loss=0.1586, cr_loss=0.3679, over 20983.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1716, cr_loss=0.3886, over 4062557.14 frames. ], batch size: 52, lr: 5.51e-03, grad_scale: 16.0 2024-09-15 10:10:42,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=270982.8333333333, ans=0.05 2024-09-15 10:10:45,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=270982.8333333333, ans=0.0 2024-09-15 10:11:01,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.14 vs. limit=15.0 2024-09-15 10:11:28,266 INFO [train.py:1198] (0/2) Epoch 15, batch 6200, loss[loss=0.2487, ctc_loss=0.172, cr_loss=0.3838, over 21045.00 frames. 
], tot_loss[loss=0.2485, ctc_loss=0.171, cr_loss=0.3872, over 4046728.21 frames. ], batch size: 62, lr: 5.51e-03, grad_scale: 16.0 2024-09-15 10:12:01,619 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.098e+02 2.237e+02 2.445e+02 4.631e+02, threshold=4.473e+02, percent-clipped=0.0 2024-09-15 10:12:09,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=271124.5, ans=0.125 2024-09-15 10:12:38,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=271181.1666666667, ans=0.1 2024-09-15 10:12:41,276 INFO [train.py:1198] (0/2) Epoch 15, batch 6250, loss[loss=0.3027, ctc_loss=0.223, cr_loss=0.3987, over 13960.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1717, cr_loss=0.3873, over 4018504.29 frames. ], batch size: 150, lr: 5.51e-03, grad_scale: 16.0 2024-09-15 10:12:56,840 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-09-15 10:12:59,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=271237.8333333333, ans=0.025 2024-09-15 10:13:12,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=271266.1666666667, ans=0.1 2024-09-15 10:13:54,228 INFO [train.py:1198] (0/2) Epoch 15, batch 6300, loss[loss=0.2627, ctc_loss=0.1826, cr_loss=0.4007, over 21023.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1711, cr_loss=0.3865, over 4016658.19 frames. 
], batch size: 62, lr: 5.50e-03, grad_scale: 16.0 2024-09-15 10:14:05,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=271351.1666666667, ans=0.025 2024-09-15 10:14:20,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=271379.5, ans=0.125 2024-09-15 10:14:20,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=271379.5, ans=0.0 2024-09-15 10:14:27,380 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.189e+02 2.351e+02 2.723e+02 3.733e+02, threshold=4.703e+02, percent-clipped=0.0 2024-09-15 10:14:34,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=271407.8333333333, ans=0.035 2024-09-15 10:14:36,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=271436.1666666667, ans=0.02 2024-09-15 10:14:59,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=271464.5, ans=0.0 2024-09-15 10:15:02,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=271464.5, ans=0.025 2024-09-15 10:15:06,321 INFO [train.py:1198] (0/2) Epoch 15, batch 6350, loss[loss=0.2961, ctc_loss=0.2091, cr_loss=0.435, over 14042.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1762, cr_loss=0.3887, over 3849238.89 frames. ], batch size: 149, lr: 5.50e-03, grad_scale: 16.0 2024-09-15 10:15:07,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.25 vs. 
limit=15.0 2024-09-15 10:15:08,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=271492.8333333333, ans=0.125 2024-09-15 10:15:16,603 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 10:15:19,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=271521.1666666667, ans=0.125 2024-09-15 10:15:46,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=271549.5, ans=0.125 2024-09-15 10:16:04,902 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-15.pt 2024-09-15 10:16:53,443 INFO [train.py:1198] (0/2) Epoch 16, batch 0, loss[loss=0.2667, ctc_loss=0.1864, cr_loss=0.4019, over 20689.00 frames. ], tot_loss[loss=0.2667, ctc_loss=0.1864, cr_loss=0.4019, over 20689.00 frames. ], batch size: 68, lr: 5.33e-03, grad_scale: 32.0 2024-09-15 10:16:53,445 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 10:17:11,507 INFO [train.py:1230] (0/2) Epoch 16, validation: loss=0.04762, ctc_loss=0.04762, cr_loss=1.01e-14, over 944034.00 frames. 
2024-09-15 10:17:11,508 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 10:17:11,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=271609.0, ans=0.2 2024-09-15 10:17:19,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=271609.0, ans=0.0 2024-09-15 10:17:24,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=271609.0, ans=0.0 2024-09-15 10:17:28,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=271637.3333333333, ans=0.2 2024-09-15 10:17:43,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=271665.6666666667, ans=0.125 2024-09-15 10:18:00,333 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.342e+02 2.569e+02 2.810e+02 3.882e+02, threshold=5.139e+02, percent-clipped=0.0 2024-09-15 10:18:16,516 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.53 vs. limit=15.0 2024-09-15 10:18:17,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0 2024-09-15 10:18:24,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=271722.3333333333, ans=0.1 2024-09-15 10:18:27,543 INFO [train.py:1198] (0/2) Epoch 16, batch 50, loss[loss=0.2467, ctc_loss=0.1756, cr_loss=0.3558, over 20866.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.169, cr_loss=0.3849, over 917753.07 frames. 
], batch size: 57, lr: 5.32e-03, grad_scale: 32.0 2024-09-15 10:18:50,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=22.5 2024-09-15 10:19:14,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=271835.6666666667, ans=0.125 2024-09-15 10:19:45,715 INFO [train.py:1198] (0/2) Epoch 16, batch 100, loss[loss=0.2503, ctc_loss=0.1704, cr_loss=0.3992, over 21056.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1696, cr_loss=0.387, over 1619454.48 frames. ], batch size: 62, lr: 5.32e-03, grad_scale: 32.0 2024-09-15 10:20:21,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=271949.0, ans=0.125 2024-09-15 10:20:22,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=271949.0, ans=0.07 2024-09-15 10:20:33,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=271977.3333333333, ans=0.125 2024-09-15 10:20:34,308 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 1.999e+02 2.139e+02 2.310e+02 4.100e+02, threshold=4.277e+02, percent-clipped=0.0 2024-09-15 10:20:36,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=271977.3333333333, ans=0.125 2024-09-15 10:20:41,959 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-96000.pt 2024-09-15 10:21:02,490 INFO [train.py:1198] (0/2) Epoch 16, batch 150, loss[loss=0.2277, ctc_loss=0.1544, cr_loss=0.3662, over 21074.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1686, cr_loss=0.3861, over 2175960.89 frames. 
], batch size: 56, lr: 5.32e-03, grad_scale: 32.0 2024-09-15 10:21:28,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=272062.3333333333, ans=0.125 2024-09-15 10:21:46,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=272119.0, ans=0.1 2024-09-15 10:22:17,837 INFO [train.py:1198] (0/2) Epoch 16, batch 200, loss[loss=0.2389, ctc_loss=0.1651, cr_loss=0.3688, over 20959.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1694, cr_loss=0.3877, over 2600815.73 frames. ], batch size: 55, lr: 5.32e-03, grad_scale: 32.0 2024-09-15 10:22:21,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=272175.6666666667, ans=0.04949747468305833 2024-09-15 10:22:45,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=272204.0, ans=0.0 2024-09-15 10:22:45,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=272204.0, ans=0.125 2024-09-15 10:23:09,223 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.114e+02 2.225e+02 2.373e+02 3.012e+02, threshold=4.450e+02, percent-clipped=0.0 2024-09-15 10:23:23,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=272289.0, ans=0.0 2024-09-15 10:23:35,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=272317.3333333333, ans=0.125 2024-09-15 10:23:35,374 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 10:23:36,530 INFO [train.py:1198] (0/2) Epoch 16, batch 250, loss[loss=0.264, ctc_loss=0.1826, cr_loss=0.4071, over 20666.00 frames. 
], tot_loss[loss=0.2466, ctc_loss=0.1691, cr_loss=0.3876, over 2944871.33 frames. ], batch size: 66, lr: 5.32e-03, grad_scale: 32.0 2024-09-15 10:23:59,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=272345.6666666667, ans=0.125 2024-09-15 10:24:05,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=272374.0, ans=0.125 2024-09-15 10:24:14,802 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 10:24:26,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=272402.3333333333, ans=0.1 2024-09-15 10:24:35,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=272430.6666666667, ans=0.0 2024-09-15 10:24:52,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=272430.6666666667, ans=0.125 2024-09-15 10:24:54,972 INFO [train.py:1198] (0/2) Epoch 16, batch 300, loss[loss=0.2462, ctc_loss=0.1672, cr_loss=0.3947, over 21042.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1686, cr_loss=0.387, over 3203660.35 frames. ], batch size: 56, lr: 5.32e-03, grad_scale: 16.0 2024-09-15 10:24:59,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=272459.0, ans=0.125 2024-09-15 10:25:39,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.41 vs. 
limit=10.0 2024-09-15 10:25:44,703 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.842e+02 2.092e+02 2.244e+02 2.386e+02 3.188e+02, threshold=4.488e+02, percent-clipped=0.0 2024-09-15 10:25:48,833 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0 2024-09-15 10:26:10,436 INFO [train.py:1198] (0/2) Epoch 16, batch 350, loss[loss=0.2937, ctc_loss=0.2045, cr_loss=0.4463, over 20071.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1689, cr_loss=0.3873, over 3409942.34 frames. ], batch size: 80, lr: 5.32e-03, grad_scale: 16.0 2024-09-15 10:26:12,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=272600.6666666667, ans=0.0 2024-09-15 10:26:19,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=272600.6666666667, ans=0.0 2024-09-15 10:26:31,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=272629.0, ans=0.125 2024-09-15 10:26:43,853 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 10:27:18,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2024-09-15 10:27:25,107 INFO [train.py:1198] (0/2) Epoch 16, batch 400, loss[loss=0.2277, ctc_loss=0.156, cr_loss=0.3586, over 20975.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1683, cr_loss=0.3864, over 3563818.27 frames. 
], batch size: 55, lr: 5.31e-03, grad_scale: 32.0 2024-09-15 10:27:47,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=272770.6666666667, ans=0.125 2024-09-15 10:28:18,134 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.028e+02 2.168e+02 2.321e+02 3.493e+02, threshold=4.336e+02, percent-clipped=0.0 2024-09-15 10:28:27,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=272855.6666666667, ans=0.5 2024-09-15 10:28:34,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=272855.6666666667, ans=0.125 2024-09-15 10:28:40,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=272855.6666666667, ans=0.025 2024-09-15 10:28:43,422 INFO [train.py:1198] (0/2) Epoch 16, batch 450, loss[loss=0.2463, ctc_loss=0.1684, cr_loss=0.3896, over 20868.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.169, cr_loss=0.3875, over 3677087.09 frames. ], batch size: 54, lr: 5.31e-03, grad_scale: 32.0 2024-09-15 10:28:54,255 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 10:28:55,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=272884.0, ans=0.0 2024-09-15 10:29:03,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=272912.3333333333, ans=0.125 2024-09-15 10:29:24,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.56 vs. 
limit=22.5 2024-09-15 10:29:58,285 INFO [train.py:1198] (0/2) Epoch 16, batch 500, loss[loss=0.2775, ctc_loss=0.193, cr_loss=0.4224, over 20672.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1688, cr_loss=0.387, over 3776840.66 frames. ], batch size: 71, lr: 5.31e-03, grad_scale: 32.0 2024-09-15 10:30:23,095 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=12.0 2024-09-15 10:30:26,066 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.99 vs. limit=10.0 2024-09-15 10:30:50,668 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.028e+02 2.192e+02 2.340e+02 4.949e+02, threshold=4.384e+02, percent-clipped=1.0 2024-09-15 10:31:09,640 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=15.0 2024-09-15 10:31:13,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=273139.0, ans=0.1 2024-09-15 10:31:16,493 INFO [train.py:1198] (0/2) Epoch 16, batch 550, loss[loss=0.2471, ctc_loss=0.1685, cr_loss=0.3931, over 20880.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1686, cr_loss=0.3872, over 3858931.24 frames. ], batch size: 54, lr: 5.31e-03, grad_scale: 32.0 2024-09-15 10:31:36,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=273195.6666666667, ans=0.1 2024-09-15 10:31:38,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.87 vs. 
limit=15.0 2024-09-15 10:31:53,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=273224.0, ans=0.125 2024-09-15 10:31:55,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. limit=10.0 2024-09-15 10:32:15,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=273280.6666666667, ans=0.125 2024-09-15 10:32:28,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2024-09-15 10:32:32,324 INFO [train.py:1198] (0/2) Epoch 16, batch 600, loss[loss=0.237, ctc_loss=0.1643, cr_loss=0.3635, over 21040.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1686, cr_loss=0.3871, over 3904528.70 frames. ], batch size: 62, lr: 5.31e-03, grad_scale: 32.0 2024-09-15 10:32:41,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=273309.0, ans=0.0 2024-09-15 10:33:24,547 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.071e+02 2.221e+02 2.482e+02 4.369e+02, threshold=4.442e+02, percent-clipped=0.0 2024-09-15 10:33:34,243 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.10 vs. 
limit=15.0 2024-09-15 10:33:40,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=273422.3333333333, ans=0.035 2024-09-15 10:33:44,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=273422.3333333333, ans=0.04949747468305833 2024-09-15 10:33:48,830 INFO [train.py:1198] (0/2) Epoch 16, batch 650, loss[loss=0.2652, ctc_loss=0.1842, cr_loss=0.4052, over 20960.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1689, cr_loss=0.3874, over 3951950.51 frames. ], batch size: 64, lr: 5.31e-03, grad_scale: 16.0 2024-09-15 10:34:16,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=273479.0, ans=0.125 2024-09-15 10:34:24,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=273507.3333333333, ans=0.125 2024-09-15 10:34:30,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=273507.3333333333, ans=0.125 2024-09-15 10:34:43,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=273535.6666666667, ans=0.125 2024-09-15 10:34:48,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.14 vs. 
limit=22.5 2024-09-15 10:34:55,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=273564.0, ans=0.125 2024-09-15 10:34:58,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=273564.0, ans=0.0 2024-09-15 10:35:07,476 INFO [train.py:1198] (0/2) Epoch 16, batch 700, loss[loss=0.2657, ctc_loss=0.1833, cr_loss=0.4117, over 21006.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1703, cr_loss=0.3885, over 3953104.23 frames. ], batch size: 63, lr: 5.31e-03, grad_scale: 16.0 2024-09-15 10:35:21,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=273620.6666666667, ans=0.125 2024-09-15 10:35:21,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=273620.6666666667, ans=0.0 2024-09-15 10:35:50,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.92 vs. limit=22.5 2024-09-15 10:35:55,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=273677.3333333333, ans=0.0 2024-09-15 10:35:55,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=273677.3333333333, ans=0.1 2024-09-15 10:35:58,389 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.060e+02 2.179e+02 2.435e+02 5.198e+02, threshold=4.358e+02, percent-clipped=1.0 2024-09-15 10:36:25,819 INFO [train.py:1198] (0/2) Epoch 16, batch 750, loss[loss=0.2628, ctc_loss=0.1781, cr_loss=0.4235, over 20640.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1704, cr_loss=0.3883, over 3985059.72 frames. 
], batch size: 66, lr: 5.30e-03, grad_scale: 16.0 2024-09-15 10:37:15,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=273819.0, ans=0.125 2024-09-15 10:37:17,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=273819.0, ans=0.125 2024-09-15 10:37:23,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=273819.0, ans=0.125 2024-09-15 10:37:40,945 INFO [train.py:1198] (0/2) Epoch 16, batch 800, loss[loss=0.2112, ctc_loss=0.1421, cr_loss=0.3455, over 19871.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1703, cr_loss=0.3889, over 4006236.46 frames. ], batch size: 44, lr: 5.30e-03, grad_scale: 32.0 2024-09-15 10:37:46,206 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-09-15 10:38:32,337 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.024e+02 2.175e+02 2.334e+02 3.062e+02, threshold=4.350e+02, percent-clipped=0.0 2024-09-15 10:38:56,007 INFO [train.py:1198] (0/2) Epoch 16, batch 850, loss[loss=0.2144, ctc_loss=0.1444, cr_loss=0.3503, over 20986.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1697, cr_loss=0.3883, over 4026718.78 frames. 
], batch size: 51, lr: 5.30e-03, grad_scale: 32.0 2024-09-15 10:39:38,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=274074.0, ans=0.125 2024-09-15 10:39:49,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=274102.3333333333, ans=0.125 2024-09-15 10:40:10,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=274130.6666666667, ans=0.125 2024-09-15 10:40:14,864 INFO [train.py:1198] (0/2) Epoch 16, batch 900, loss[loss=0.2417, ctc_loss=0.165, cr_loss=0.3834, over 20878.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1703, cr_loss=0.3891, over 4040644.05 frames. ], batch size: 54, lr: 5.30e-03, grad_scale: 32.0 2024-09-15 10:40:39,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=274187.3333333333, ans=0.125 2024-09-15 10:40:42,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=274187.3333333333, ans=0.125 2024-09-15 10:41:06,498 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.058e+02 2.209e+02 2.546e+02 4.940e+02, threshold=4.417e+02, percent-clipped=2.0 2024-09-15 10:41:30,589 INFO [train.py:1198] (0/2) Epoch 16, batch 950, loss[loss=0.2549, ctc_loss=0.1728, cr_loss=0.4109, over 20750.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1699, cr_loss=0.3883, over 4060904.44 frames. ], batch size: 71, lr: 5.30e-03, grad_scale: 32.0 2024-09-15 10:42:48,873 INFO [train.py:1198] (0/2) Epoch 16, batch 1000, loss[loss=0.3091, ctc_loss=0.2208, cr_loss=0.4418, over 14564.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1706, cr_loss=0.3893, over 4059992.62 frames. 
], batch size: 149, lr: 5.30e-03, grad_scale: 32.0 2024-09-15 10:43:25,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=274499.0, ans=0.125 2024-09-15 10:43:39,717 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.082e+02 2.262e+02 2.543e+02 4.405e+02, threshold=4.523e+02, percent-clipped=0.0 2024-09-15 10:44:04,050 INFO [train.py:1198] (0/2) Epoch 16, batch 1050, loss[loss=0.2326, ctc_loss=0.1567, cr_loss=0.3797, over 21045.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1705, cr_loss=0.3898, over 4078436.70 frames. ], batch size: 62, lr: 5.30e-03, grad_scale: 32.0 2024-09-15 10:44:05,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=274584.0, ans=0.125 2024-09-15 10:44:16,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=274584.0, ans=0.125 2024-09-15 10:44:37,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=274640.6666666667, ans=0.125 2024-09-15 10:45:05,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=274697.3333333333, ans=0.125 2024-09-15 10:45:21,813 INFO [train.py:1198] (0/2) Epoch 16, batch 1100, loss[loss=0.2705, ctc_loss=0.1846, cr_loss=0.4295, over 20960.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1713, cr_loss=0.3909, over 4062517.04 frames. 
], batch size: 64, lr: 5.30e-03, grad_scale: 16.0 2024-09-15 10:46:06,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=274810.6666666667, ans=0.125 2024-09-15 10:46:14,275 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.064e+02 2.211e+02 2.356e+02 4.647e+02, threshold=4.423e+02, percent-clipped=1.0 2024-09-15 10:46:37,234 INFO [train.py:1198] (0/2) Epoch 16, batch 1150, loss[loss=0.2528, ctc_loss=0.1719, cr_loss=0.4044, over 20652.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1701, cr_loss=0.3894, over 4068112.81 frames. ], batch size: 66, lr: 5.29e-03, grad_scale: 16.0 2024-09-15 10:46:44,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=274867.3333333333, ans=0.125 2024-09-15 10:47:04,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=274895.6666666667, ans=0.125 2024-09-15 10:47:28,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=274952.3333333333, ans=0.1 2024-09-15 10:47:33,437 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 10:47:50,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=22.5 2024-09-15 10:47:55,697 INFO [train.py:1198] (0/2) Epoch 16, batch 1200, loss[loss=0.2604, ctc_loss=0.1814, cr_loss=0.3952, over 20704.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1694, cr_loss=0.3885, over 4081542.86 frames. 
], batch size: 71, lr: 5.29e-03, grad_scale: 32.0 2024-09-15 10:47:59,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=275009.0, ans=0.1 2024-09-15 10:48:16,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=12.0 2024-09-15 10:48:48,510 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.711e+02 2.017e+02 2.151e+02 2.331e+02 3.562e+02, threshold=4.302e+02, percent-clipped=0.0 2024-09-15 10:48:54,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=275122.3333333333, ans=0.125 2024-09-15 10:49:05,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=275122.3333333333, ans=0.125 2024-09-15 10:49:11,227 INFO [train.py:1198] (0/2) Epoch 16, batch 1250, loss[loss=0.2306, ctc_loss=0.1607, cr_loss=0.3495, over 21008.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1687, cr_loss=0.387, over 4094595.17 frames. ], batch size: 61, lr: 5.29e-03, grad_scale: 32.0 2024-09-15 10:49:29,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=275179.0, ans=0.0 2024-09-15 10:50:07,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=275235.6666666667, ans=0.125 2024-09-15 10:50:10,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=275264.0, ans=0.125 2024-09-15 10:50:27,025 INFO [train.py:1198] (0/2) Epoch 16, batch 1300, loss[loss=0.2131, ctc_loss=0.1443, cr_loss=0.344, over 20892.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1685, cr_loss=0.3865, over 4098555.40 frames. 
], batch size: 54, lr: 5.29e-03, grad_scale: 32.0 2024-09-15 10:50:59,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=275349.0, ans=15.0 2024-09-15 10:51:22,984 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.146e+02 2.311e+02 2.483e+02 3.541e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-15 10:51:26,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=275377.3333333333, ans=0.125 2024-09-15 10:51:27,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=275377.3333333333, ans=0.2 2024-09-15 10:51:30,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=275405.6666666667, ans=0.2 2024-09-15 10:51:45,665 INFO [train.py:1198] (0/2) Epoch 16, batch 1350, loss[loss=0.2577, ctc_loss=0.178, cr_loss=0.3986, over 20222.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1694, cr_loss=0.3876, over 4087127.90 frames. ], batch size: 74, lr: 5.29e-03, grad_scale: 32.0 2024-09-15 10:51:48,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=275434.0, ans=0.2 2024-09-15 10:52:22,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=275490.6666666667, ans=0.0 2024-09-15 10:52:35,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=275519.0, ans=0.125 2024-09-15 10:52:42,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=275519.0, ans=0.125 2024-09-15 10:52:47,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.02 vs. 
limit=10.0
2024-09-15 10:53:01,498 INFO [train.py:1198] (0/2) Epoch 16, batch 1400, loss[loss=0.2286, ctc_loss=0.1575, cr_loss=0.3556, over 21010.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1699, cr_loss=0.3889, over 4096816.65 frames. ], batch size: 61, lr: 5.29e-03, grad_scale: 32.0
2024-09-15 10:53:42,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.00 vs. limit=22.5
2024-09-15 10:53:56,900 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.136e+02 2.287e+02 2.701e+02 4.168e+02, threshold=4.574e+02, percent-clipped=0.0
2024-09-15 10:54:08,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=275689.0, ans=0.07
2024-09-15 10:54:19,469 INFO [train.py:1198] (0/2) Epoch 16, batch 1450, loss[loss=0.217, ctc_loss=0.1446, cr_loss=0.3619, over 21052.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1694, cr_loss=0.3887, over 4103280.43 frames. ], batch size: 53, lr: 5.29e-03, grad_scale: 32.0
2024-09-15 10:54:35,459 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.02 vs. limit=10.0
2024-09-15 10:54:39,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=275745.6666666667, ans=0.04949747468305833
2024-09-15 10:55:06,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0
2024-09-15 10:55:09,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=12.0
2024-09-15 10:55:15,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=275802.3333333333, ans=0.0
2024-09-15 10:55:34,519 INFO [train.py:1198] (0/2) Epoch 16, batch 1500, loss[loss=0.2393, ctc_loss=0.1647, cr_loss=0.3734, over 20839.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1692, cr_loss=0.3879, over 4103351.39 frames. ], batch size: 59, lr: 5.28e-03, grad_scale: 32.0
2024-09-15 10:55:38,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=22.5
2024-09-15 10:56:02,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=275887.3333333333, ans=0.05
2024-09-15 10:56:14,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.92 vs. limit=15.0
2024-09-15 10:56:27,157 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.040e+02 2.152e+02 2.370e+02 3.364e+02, threshold=4.304e+02, percent-clipped=0.0
2024-09-15 10:56:27,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=275944.0, ans=0.0
2024-09-15 10:56:52,714 INFO [train.py:1198] (0/2) Epoch 16, batch 1550, loss[loss=0.2762, ctc_loss=0.1928, cr_loss=0.4173, over 20693.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.169, cr_loss=0.3873, over 4104556.73 frames. ], batch size: 66, lr: 5.28e-03, grad_scale: 32.0
2024-09-15 10:57:35,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=276057.3333333333, ans=0.125
2024-09-15 10:57:36,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=276085.6666666667, ans=0.125
2024-09-15 10:58:08,069 INFO [train.py:1198] (0/2) Epoch 16, batch 1600, loss[loss=0.2504, ctc_loss=0.1726, cr_loss=0.389, over 20676.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1699, cr_loss=0.388, over 4097181.46 frames. ], batch size: 68, lr: 5.28e-03, grad_scale: 32.0
2024-09-15 10:58:20,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=276142.3333333333, ans=0.025
2024-09-15 10:59:05,763 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.010e+02 2.161e+02 2.414e+02 4.031e+02, threshold=4.322e+02, percent-clipped=0.0
2024-09-15 10:59:09,520 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0
2024-09-15 10:59:26,801 INFO [train.py:1198] (0/2) Epoch 16, batch 1650, loss[loss=0.2416, ctc_loss=0.1658, cr_loss=0.3789, over 20666.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1685, cr_loss=0.3861, over 4114159.60 frames. ], batch size: 71, lr: 5.28e-03, grad_scale: 16.0
2024-09-15 10:59:30,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=276284.0, ans=0.1
2024-09-15 11:00:07,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=276340.6666666667, ans=0.125
2024-09-15 11:00:41,681 INFO [train.py:1198] (0/2) Epoch 16, batch 1700, loss[loss=0.274, ctc_loss=0.1935, cr_loss=0.4026, over 20008.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1693, cr_loss=0.387, over 4105701.93 frames. ], batch size: 80, lr: 5.28e-03, grad_scale: 16.0
2024-09-15 11:00:57,102 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 11:01:26,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=276510.6666666667, ans=0.125
2024-09-15 11:01:36,308 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.054e+02 2.208e+02 2.376e+02 3.024e+02, threshold=4.416e+02, percent-clipped=0.0
2024-09-15 11:01:44,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0
2024-09-15 11:01:47,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=276539.0, ans=0.0
2024-09-15 11:01:49,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=276539.0, ans=0.0
2024-09-15 11:01:53,268 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.90 vs. limit=6.0
2024-09-15 11:01:57,117 INFO [train.py:1198] (0/2) Epoch 16, batch 1750, loss[loss=0.2904, ctc_loss=0.2004, cr_loss=0.4502, over 21054.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1705, cr_loss=0.3888, over 4098095.26 frames. ], batch size: 62, lr: 5.28e-03, grad_scale: 16.0
2024-09-15 11:01:57,883 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0
2024-09-15 11:02:04,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=276567.3333333333, ans=0.125
2024-09-15 11:02:12,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=276595.6666666667, ans=0.125
2024-09-15 11:02:22,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=276595.6666666667, ans=0.0
2024-09-15 11:03:15,010 INFO [train.py:1198] (0/2) Epoch 16, batch 1800, loss[loss=0.2328, ctc_loss=0.157, cr_loss=0.3792, over 20785.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1701, cr_loss=0.3884, over 4100807.39 frames. ], batch size: 53, lr: 5.28e-03, grad_scale: 16.0
2024-09-15 11:03:30,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=276737.3333333333, ans=0.125
2024-09-15 11:03:37,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=276737.3333333333, ans=0.125
2024-09-15 11:03:54,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=276765.6666666667, ans=0.1
2024-09-15 11:04:09,302 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.749e+02 2.054e+02 2.158e+02 2.340e+02 3.233e+02, threshold=4.315e+02, percent-clipped=0.0
2024-09-15 11:04:17,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=276822.3333333333, ans=0.0
2024-09-15 11:04:20,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=276822.3333333333, ans=0.0
2024-09-15 11:04:30,374 INFO [train.py:1198] (0/2) Epoch 16, batch 1850, loss[loss=0.2547, ctc_loss=0.1754, cr_loss=0.3965, over 20677.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1703, cr_loss=0.3883, over 4089750.82 frames. ], batch size: 66, lr: 5.27e-03, grad_scale: 16.0
2024-09-15 11:04:30,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=276850.6666666667, ans=0.125
2024-09-15 11:04:54,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=276879.0, ans=0.125
2024-09-15 11:05:28,520 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.06 vs. limit=15.0
2024-09-15 11:05:41,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0
2024-09-15 11:05:48,762 INFO [train.py:1198] (0/2) Epoch 16, batch 1900, loss[loss=0.2369, ctc_loss=0.1612, cr_loss=0.3785, over 20829.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1705, cr_loss=0.3886, over 4089256.54 frames. ], batch size: 59, lr: 5.27e-03, grad_scale: 16.0
2024-09-15 11:05:55,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=276992.3333333333, ans=0.1
2024-09-15 11:05:55,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=276992.3333333333, ans=0.125
2024-09-15 11:06:21,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.08 vs. limit=15.0
2024-09-15 11:06:28,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=277049.0, ans=0.125
2024-09-15 11:06:43,220 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.018e+02 2.183e+02 2.379e+02 4.800e+02, threshold=4.365e+02, percent-clipped=1.0
2024-09-15 11:07:04,355 INFO [train.py:1198] (0/2) Epoch 16, batch 1950, loss[loss=0.2645, ctc_loss=0.1791, cr_loss=0.4266, over 20930.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1706, cr_loss=0.389, over 4092390.34 frames. ], batch size: 60, lr: 5.27e-03, grad_scale: 16.0
2024-09-15 11:07:16,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=277134.0, ans=0.125
2024-09-15 11:07:46,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=277190.6666666667, ans=0.0
2024-09-15 11:08:22,183 INFO [train.py:1198] (0/2) Epoch 16, batch 2000, loss[loss=0.2477, ctc_loss=0.1694, cr_loss=0.3917, over 20137.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.17, cr_loss=0.3888, over 4087708.91 frames. ], batch size: 80, lr: 5.27e-03, grad_scale: 32.0
2024-09-15 11:08:33,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277275.6666666667, ans=0.1
2024-09-15 11:08:42,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=277304.0, ans=0.125
2024-09-15 11:09:16,386 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.127e+02 2.263e+02 2.565e+02 4.364e+02, threshold=4.527e+02, percent-clipped=0.0
2024-09-15 11:09:33,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=277389.0, ans=0.125
2024-09-15 11:09:37,735 INFO [train.py:1198] (0/2) Epoch 16, batch 2050, loss[loss=0.2238, ctc_loss=0.1512, cr_loss=0.3632, over 20803.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1694, cr_loss=0.3874, over 4095146.67 frames. ], batch size: 53, lr: 5.27e-03, grad_scale: 32.0
2024-09-15 11:09:41,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=277417.3333333333, ans=0.125
2024-09-15 11:09:47,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=277417.3333333333, ans=0.025
2024-09-15 11:09:52,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=277445.6666666667, ans=0.125
2024-09-15 11:09:56,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=277445.6666666667, ans=0.125
2024-09-15 11:10:05,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=277445.6666666667, ans=0.2
2024-09-15 11:10:08,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=277474.0, ans=0.125
2024-09-15 11:10:27,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=277502.3333333333, ans=0.125
2024-09-15 11:10:29,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=277502.3333333333, ans=0.125
2024-09-15 11:10:39,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=277530.6666666667, ans=0.125
2024-09-15 11:10:55,895 INFO [train.py:1198] (0/2) Epoch 16, batch 2100, loss[loss=0.2375, ctc_loss=0.1607, cr_loss=0.3844, over 20813.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1694, cr_loss=0.3878, over 4098997.58 frames. ], batch size: 53, lr: 5.27e-03, grad_scale: 32.0
2024-09-15 11:11:02,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=277559.0, ans=0.035
2024-09-15 11:11:09,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=277587.3333333333, ans=0.125
2024-09-15 11:11:12,990 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0
2024-09-15 11:11:46,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0
2024-09-15 11:11:50,195 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.070e+02 2.230e+02 2.466e+02 3.838e+02, threshold=4.460e+02, percent-clipped=0.0
2024-09-15 11:12:06,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=277672.3333333333, ans=0.125
2024-09-15 11:12:11,200 INFO [train.py:1198] (0/2) Epoch 16, batch 2150, loss[loss=0.2444, ctc_loss=0.1667, cr_loss=0.3887, over 21069.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1703, cr_loss=0.3885, over 4091307.79 frames. ], batch size: 56, lr: 5.27e-03, grad_scale: 32.0
2024-09-15 11:13:07,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=277785.6666666667, ans=0.0
2024-09-15 11:13:23,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277814.0, ans=0.1
2024-09-15 11:13:26,454 INFO [train.py:1198] (0/2) Epoch 16, batch 2200, loss[loss=0.2577, ctc_loss=0.1755, cr_loss=0.4108, over 20956.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1698, cr_loss=0.3873, over 4086390.56 frames. ], batch size: 64, lr: 5.27e-03, grad_scale: 32.0
2024-09-15 11:13:26,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=277842.3333333333, ans=0.125
2024-09-15 11:14:23,376 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.754e+02 2.076e+02 2.237e+02 2.356e+02 2.843e+02, threshold=4.474e+02, percent-clipped=0.0
2024-09-15 11:14:25,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=277927.3333333333, ans=0.125
2024-09-15 11:14:44,523 INFO [train.py:1198] (0/2) Epoch 16, batch 2250, loss[loss=0.2191, ctc_loss=0.1494, cr_loss=0.3482, over 19945.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1698, cr_loss=0.3881, over 4090982.39 frames. ], batch size: 44, lr: 5.26e-03, grad_scale: 32.0
2024-09-15 11:14:44,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=277984.0, ans=0.2
2024-09-15 11:14:49,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0
2024-09-15 11:14:51,208 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0
2024-09-15 11:14:54,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.36 vs. limit=8.0
2024-09-15 11:15:04,897 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=12.0
2024-09-15 11:15:21,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=278040.6666666667, ans=0.0
2024-09-15 11:15:23,061 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0
2024-09-15 11:15:59,980 INFO [train.py:1198] (0/2) Epoch 16, batch 2300, loss[loss=0.2221, ctc_loss=0.152, cr_loss=0.3506, over 20963.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1686, cr_loss=0.3869, over 4092055.49 frames. ], batch size: 50, lr: 5.26e-03, grad_scale: 32.0
2024-09-15 11:16:21,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=12.0
2024-09-15 11:16:52,002 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=12.0
2024-09-15 11:16:57,153 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.037e+02 2.171e+02 2.368e+02 3.330e+02, threshold=4.342e+02, percent-clipped=0.0
2024-09-15 11:17:17,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=278267.3333333333, ans=0.125
2024-09-15 11:17:17,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=278267.3333333333, ans=0.0
2024-09-15 11:17:18,525 INFO [train.py:1198] (0/2) Epoch 16, batch 2350, loss[loss=0.2962, ctc_loss=0.2172, cr_loss=0.3948, over 14345.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.169, cr_loss=0.3872, over 4082948.18 frames. ], batch size: 149, lr: 5.26e-03, grad_scale: 32.0
2024-09-15 11:17:38,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=278295.6666666667, ans=0.2
2024-09-15 11:18:28,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=22.5
2024-09-15 11:18:33,298 INFO [train.py:1198] (0/2) Epoch 16, batch 2400, loss[loss=0.2617, ctc_loss=0.1797, cr_loss=0.4102, over 20700.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1689, cr_loss=0.387, over 4092387.90 frames. ], batch size: 68, lr: 5.26e-03, grad_scale: 32.0
2024-09-15 11:18:48,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=278437.3333333333, ans=0.025
2024-09-15 11:19:06,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=278465.6666666667, ans=0.125
2024-09-15 11:19:30,342 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.054e+02 2.170e+02 2.281e+02 3.481e+02, threshold=4.340e+02, percent-clipped=0.0
2024-09-15 11:19:51,608 INFO [train.py:1198] (0/2) Epoch 16, batch 2450, loss[loss=0.246, ctc_loss=0.1698, cr_loss=0.3811, over 21062.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1688, cr_loss=0.3869, over 4094008.46 frames. ], batch size: 59, lr: 5.26e-03, grad_scale: 32.0
2024-09-15 11:20:02,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=278550.6666666667, ans=0.125
2024-09-15 11:20:14,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=278579.0, ans=0.0
2024-09-15 11:20:22,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=278607.3333333333, ans=0.125
2024-09-15 11:20:35,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=278635.6666666667, ans=0.0
2024-09-15 11:20:46,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=278635.6666666667, ans=0.2
2024-09-15 11:20:49,647 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 11:20:52,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=278664.0, ans=0.05
2024-09-15 11:21:03,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=278664.0, ans=0.125
2024-09-15 11:21:07,631 INFO [train.py:1198] (0/2) Epoch 16, batch 2500, loss[loss=0.2703, ctc_loss=0.1866, cr_loss=0.4185, over 18292.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1686, cr_loss=0.3862, over 4088035.28 frames. ], batch size: 108, lr: 5.26e-03, grad_scale: 32.0
2024-09-15 11:21:49,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=278749.0, ans=0.0
2024-09-15 11:21:55,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=278777.3333333333, ans=0.07
2024-09-15 11:22:01,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=22.5
2024-09-15 11:22:03,998 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.114e+02 2.246e+02 2.453e+02 4.101e+02, threshold=4.493e+02, percent-clipped=0.0
2024-09-15 11:22:20,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=278805.6666666667, ans=15.0
2024-09-15 11:22:25,247 INFO [train.py:1198] (0/2) Epoch 16, batch 2550, loss[loss=0.3014, ctc_loss=0.2117, cr_loss=0.4488, over 18103.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1696, cr_loss=0.3874, over 4086352.67 frames. ], batch size: 108, lr: 5.26e-03, grad_scale: 32.0
2024-09-15 11:22:28,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0
2024-09-15 11:22:43,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.69 vs. limit=5.0
2024-09-15 11:23:15,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=278919.0, ans=0.2
2024-09-15 11:23:39,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=278975.6666666667, ans=0.0
2024-09-15 11:23:40,307 INFO [train.py:1198] (0/2) Epoch 16, batch 2600, loss[loss=0.2419, ctc_loss=0.1679, cr_loss=0.3701, over 20979.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1699, cr_loss=0.3874, over 4091717.70 frames. ], batch size: 58, lr: 5.25e-03, grad_scale: 32.0
2024-09-15 11:23:57,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=15.0
2024-09-15 11:24:06,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279004.0, ans=0.1
2024-09-15 11:24:12,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279032.3333333333, ans=0.1
2024-09-15 11:24:24,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=279060.6666666667, ans=0.0
2024-09-15 11:24:34,894 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.072e+02 2.242e+02 2.418e+02 3.314e+02, threshold=4.483e+02, percent-clipped=0.0
2024-09-15 11:24:47,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=279089.0, ans=0.125
2024-09-15 11:24:56,227 INFO [train.py:1198] (0/2) Epoch 16, batch 2650, loss[loss=0.2162, ctc_loss=0.1441, cr_loss=0.3605, over 20967.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1695, cr_loss=0.387, over 4099844.85 frames. ], batch size: 51, lr: 5.25e-03, grad_scale: 32.0
2024-09-15 11:25:21,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=279145.6666666667, ans=0.125
2024-09-15 11:25:33,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=279174.0, ans=0.125
2024-09-15 11:25:41,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=279174.0, ans=0.0
2024-09-15 11:25:53,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=279202.3333333333, ans=0.0
2024-09-15 11:26:14,114 INFO [train.py:1198] (0/2) Epoch 16, batch 2700, loss[loss=0.2516, ctc_loss=0.1727, cr_loss=0.3949, over 20948.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.169, cr_loss=0.3861, over 4102216.63 frames. ], batch size: 60, lr: 5.25e-03, grad_scale: 32.0
2024-09-15 11:26:48,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=279315.6666666667, ans=0.0
2024-09-15 11:27:07,885 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.038e+02 2.202e+02 2.328e+02 3.031e+02, threshold=4.404e+02, percent-clipped=0.0
2024-09-15 11:27:14,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.00 vs. limit=6.0
2024-09-15 11:27:21,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=279372.3333333333, ans=0.0
2024-09-15 11:27:27,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=279372.3333333333, ans=0.025
2024-09-15 11:27:31,967 INFO [train.py:1198] (0/2) Epoch 16, batch 2750, loss[loss=0.224, ctc_loss=0.1531, cr_loss=0.3544, over 21016.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1688, cr_loss=0.3853, over 4100148.65 frames. ], batch size: 52, lr: 5.25e-03, grad_scale: 32.0
2024-09-15 11:28:05,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.80 vs. limit=12.0
2024-09-15 11:28:46,890 INFO [train.py:1198] (0/2) Epoch 16, batch 2800, loss[loss=0.2142, ctc_loss=0.1445, cr_loss=0.3487, over 21062.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1687, cr_loss=0.3851, over 4091092.72 frames. ], batch size: 53, lr: 5.25e-03, grad_scale: 32.0
2024-09-15 11:28:52,283 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.47 vs. limit=10.0
2024-09-15 11:29:02,980 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0
2024-09-15 11:29:20,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=279599.0, ans=0.015
2024-09-15 11:29:33,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=279627.3333333333, ans=0.125
2024-09-15 11:29:42,473 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.064e+02 2.174e+02 2.612e+02 9.987e+02, threshold=4.347e+02, percent-clipped=2.0
2024-09-15 11:29:44,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=279627.3333333333, ans=0.125
2024-09-15 11:29:51,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=279655.6666666667, ans=0.2
2024-09-15 11:30:01,850 INFO [train.py:1198] (0/2) Epoch 16, batch 2850, loss[loss=0.2449, ctc_loss=0.164, cr_loss=0.4046, over 20773.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1692, cr_loss=0.3863, over 4077337.59 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 16.0
2024-09-15 11:30:27,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=279712.3333333333, ans=0.0
2024-09-15 11:30:30,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=279740.6666666667, ans=0.07
2024-09-15 11:30:49,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=279769.0, ans=0.07
2024-09-15 11:31:02,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=279769.0, ans=0.1
2024-09-15 11:31:19,937 INFO [train.py:1198] (0/2) Epoch 16, batch 2900, loss[loss=0.2447, ctc_loss=0.1717, cr_loss=0.3654, over 20811.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1697, cr_loss=0.3875, over 4077933.52 frames. ], batch size: 65, lr: 5.25e-03, grad_scale: 16.0
2024-09-15 11:31:44,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=279854.0, ans=0.125
2024-09-15 11:31:50,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=279882.3333333333, ans=0.0
2024-09-15 11:31:51,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279882.3333333333, ans=0.1
2024-09-15 11:31:54,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=279882.3333333333, ans=0.125
2024-09-15 11:32:05,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=279910.6666666667, ans=15.0
2024-09-15 11:32:15,545 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.089e+02 2.192e+02 2.388e+02 7.769e+02, threshold=4.385e+02, percent-clipped=0.0
2024-09-15 11:32:19,548 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0
2024-09-15 11:32:26,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=279939.0, ans=0.0
2024-09-15 11:32:31,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279939.0, ans=0.1
2024-09-15 11:32:35,351 INFO [train.py:1198] (0/2) Epoch 16, batch 2950, loss[loss=0.2148, ctc_loss=0.1483, cr_loss=0.3325, over 20353.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1693, cr_loss=0.3867, over 4079838.60 frames. ], batch size: 45, lr: 5.25e-03, grad_scale: 16.0
2024-09-15 11:32:38,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=279967.3333333333, ans=0.025
2024-09-15 11:33:02,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=279995.6666666667, ans=0.125
2024-09-15 11:33:05,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=279995.6666666667, ans=0.0
2024-09-15 11:33:29,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=280052.3333333333, ans=0.1
2024-09-15 11:33:53,898 INFO [train.py:1198] (0/2) Epoch 16, batch 3000, loss[loss=0.2637, ctc_loss=0.1836, cr_loss=0.4006, over 19331.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.169, cr_loss=0.3862, over 4080195.58 frames. ], batch size: 90, lr: 5.24e-03, grad_scale: 16.0
2024-09-15 11:33:53,899 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-15 11:34:15,819 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.3240, 4.9348, 3.7544, 4.3662], device='cuda:0')
2024-09-15 11:34:20,145 INFO [train.py:1230] (0/2) Epoch 16, validation: loss=0.04642, ctc_loss=0.04642, cr_loss=1.036e-14, over 944034.00 frames.
2024-09-15 11:34:20,146 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-15 11:34:25,471 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0
2024-09-15 11:34:44,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=280137.3333333333, ans=0.125
2024-09-15 11:34:52,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=280165.6666666667, ans=0.025
2024-09-15 11:35:16,325 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.055e+02 2.187e+02 2.347e+02 3.238e+02, threshold=4.373e+02, percent-clipped=1.0
2024-09-15 11:35:30,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=280222.3333333333, ans=0.07
2024-09-15 11:35:36,197 INFO [train.py:1198] (0/2) Epoch 16, batch 3050, loss[loss=0.2347, ctc_loss=0.1582, cr_loss=0.3823, over 21059.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1688, cr_loss=0.386, over 4079043.56 frames. ], batch size: 53, lr: 5.24e-03, grad_scale: 16.0
2024-09-15 11:35:48,511 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 11:36:07,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=280307.3333333333, ans=0.125
2024-09-15 11:36:10,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=280307.3333333333, ans=0.0
2024-09-15 11:36:50,537 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.44 vs. limit=15.0
2024-09-15 11:36:53,871 INFO [train.py:1198] (0/2) Epoch 16, batch 3100, loss[loss=0.2552, ctc_loss=0.1758, cr_loss=0.3971, over 20983.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1692, cr_loss=0.3869, over 4079586.55 frames. ], batch size: 64, lr: 5.24e-03, grad_scale: 16.0
2024-09-15 11:37:10,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=280420.6666666667, ans=0.125
2024-09-15 11:37:16,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=280420.6666666667, ans=0.2
2024-09-15 11:37:27,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=280449.0, ans=0.2
2024-09-15 11:37:31,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=280449.0, ans=0.2
2024-09-15 11:37:49,571 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.067e+02 2.238e+02 2.486e+02 3.935e+02, threshold=4.475e+02, percent-clipped=0.0
2024-09-15 11:38:09,360 INFO [train.py:1198] (0/2) Epoch 16, batch 3150, loss[loss=0.2365, ctc_loss=0.1651, cr_loss=0.3569, over 20995.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1693, cr_loss=0.387, over 4082494.96 frames. ], batch size: 61, lr: 5.24e-03, grad_scale: 16.0
2024-09-15 11:38:12,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=280534.0, ans=0.0
2024-09-15 11:38:15,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=280534.0, ans=0.125
2024-09-15 11:38:32,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=280562.3333333333, ans=0.125
2024-09-15 11:39:26,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=280675.6666666667, ans=0.07
2024-09-15 11:39:27,311 INFO [train.py:1198] (0/2) Epoch 16, batch 3200, loss[loss=0.2085, ctc_loss=0.1422, cr_loss=0.3318, over 20958.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1692, cr_loss=0.387, over 4079291.99 frames. ], batch size: 49, lr: 5.24e-03, grad_scale: 32.0
2024-09-15 11:39:28,064 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0
2024-09-15 11:39:51,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=280704.0, ans=0.2
2024-09-15 11:40:23,309 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.086e+02 2.212e+02 2.432e+02 6.840e+02, threshold=4.425e+02, percent-clipped=1.0
2024-09-15 11:40:32,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=280789.0, ans=0.125
2024-09-15 11:40:43,326 INFO [train.py:1198] (0/2) Epoch 16, batch 3250, loss[loss=0.2672, ctc_loss=0.1821, cr_loss=0.4254, over 20019.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1697, cr_loss=0.3877, over 4079522.02 frames. ], batch size: 80, lr: 5.24e-03, grad_scale: 32.0
2024-09-15 11:41:55,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=280930.6666666667, ans=0.125
2024-09-15 11:41:59,168 INFO [train.py:1198] (0/2) Epoch 16, batch 3300, loss[loss=0.2547, ctc_loss=0.1765, cr_loss=0.3906, over 20665.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1716, cr_loss=0.3907, over 4076838.22 frames.
], batch size: 68, lr: 5.24e-03, grad_scale: 32.0 2024-09-15 11:42:32,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=281015.6666666667, ans=0.125 2024-09-15 11:42:35,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=281015.6666666667, ans=0.0 2024-09-15 11:42:43,488 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.37 vs. limit=10.0 2024-09-15 11:42:49,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281044.0, ans=0.1 2024-09-15 11:42:50,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=281044.0, ans=0.125 2024-09-15 11:42:58,074 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.080e+02 2.243e+02 2.473e+02 3.891e+02, threshold=4.487e+02, percent-clipped=0.0 2024-09-15 11:43:17,762 INFO [train.py:1198] (0/2) Epoch 16, batch 3350, loss[loss=0.2263, ctc_loss=0.1555, cr_loss=0.3538, over 20888.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1705, cr_loss=0.3889, over 4084189.45 frames. ], batch size: 54, lr: 5.24e-03, grad_scale: 32.0 2024-09-15 11:43:37,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=281129.0, ans=0.5 2024-09-15 11:44:27,367 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 11:44:36,158 INFO [train.py:1198] (0/2) Epoch 16, batch 3400, loss[loss=0.2351, ctc_loss=0.1599, cr_loss=0.376, over 21071.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1701, cr_loss=0.3881, over 4073060.71 frames. 
], batch size: 53, lr: 5.23e-03, grad_scale: 16.0 2024-09-15 11:44:47,264 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0 2024-09-15 11:45:05,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=281299.0, ans=0.125 2024-09-15 11:45:33,350 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.672e+02 2.022e+02 2.135e+02 2.409e+02 4.243e+02, threshold=4.269e+02, percent-clipped=0.0 2024-09-15 11:45:48,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=281355.6666666667, ans=0.125 2024-09-15 11:45:51,641 INFO [train.py:1198] (0/2) Epoch 16, batch 3450, loss[loss=0.2446, ctc_loss=0.1679, cr_loss=0.3831, over 21065.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.169, cr_loss=0.386, over 4073612.23 frames. ], batch size: 56, lr: 5.23e-03, grad_scale: 16.0 2024-09-15 11:46:06,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=281412.3333333333, ans=0.0 2024-09-15 11:46:11,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=281412.3333333333, ans=0.125 2024-09-15 11:46:46,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=281469.0, ans=0.125 2024-09-15 11:47:06,503 INFO [train.py:1198] (0/2) Epoch 16, batch 3500, loss[loss=0.245, ctc_loss=0.167, cr_loss=0.3901, over 20882.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1694, cr_loss=0.3869, over 4086346.23 frames. 
], batch size: 57, lr: 5.23e-03, grad_scale: 16.0 2024-09-15 11:47:24,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=281554.0, ans=0.2 2024-09-15 11:47:34,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=281554.0, ans=0.125 2024-09-15 11:48:06,731 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.022e+02 2.139e+02 2.292e+02 3.021e+02, threshold=4.278e+02, percent-clipped=0.0 2024-09-15 11:48:07,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=281610.6666666667, ans=0.125 2024-09-15 11:48:24,688 INFO [train.py:1198] (0/2) Epoch 16, batch 3550, loss[loss=0.2599, ctc_loss=0.178, cr_loss=0.4096, over 20286.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1693, cr_loss=0.3872, over 4074112.76 frames. ], batch size: 74, lr: 5.23e-03, grad_scale: 16.0 2024-09-15 11:48:29,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=281667.3333333333, ans=0.95 2024-09-15 11:49:07,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=281724.0, ans=0.0 2024-09-15 11:49:37,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281780.6666666667, ans=0.1 2024-09-15 11:49:39,808 INFO [train.py:1198] (0/2) Epoch 16, batch 3600, loss[loss=0.2767, ctc_loss=0.193, cr_loss=0.4188, over 21024.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1693, cr_loss=0.3874, over 4080092.98 frames. 
], batch size: 63, lr: 5.23e-03, grad_scale: 32.0 2024-09-15 11:49:49,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=281809.0, ans=0.5 2024-09-15 11:50:04,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=281837.3333333333, ans=0.2 2024-09-15 11:50:14,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281865.6666666667, ans=0.1 2024-09-15 11:50:24,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=281865.6666666667, ans=0.0 2024-09-15 11:50:39,626 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.694e+02 2.090e+02 2.191e+02 2.448e+02 6.893e+02, threshold=4.383e+02, percent-clipped=1.0 2024-09-15 11:50:56,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=281950.6666666667, ans=0.02 2024-09-15 11:50:57,519 INFO [train.py:1198] (0/2) Epoch 16, batch 3650, loss[loss=0.2373, ctc_loss=0.1611, cr_loss=0.3811, over 20795.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1688, cr_loss=0.3869, over 4084524.38 frames. ], batch size: 56, lr: 5.23e-03, grad_scale: 32.0 2024-09-15 11:52:12,904 INFO [train.py:1198] (0/2) Epoch 16, batch 3700, loss[loss=0.2551, ctc_loss=0.1771, cr_loss=0.3903, over 19388.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1678, cr_loss=0.3848, over 4093590.06 frames. ], batch size: 90, lr: 5.23e-03, grad_scale: 32.0 2024-09-15 11:52:30,518 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. 
limit=6.0 2024-09-15 11:52:39,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=282120.6666666667, ans=0.125 2024-09-15 11:52:51,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=282149.0, ans=0.125 2024-09-15 11:53:02,788 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.55 vs. limit=22.5 2024-09-15 11:53:10,814 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.822e+02 2.077e+02 2.266e+02 2.521e+02 3.872e+02, threshold=4.532e+02, percent-clipped=0.0 2024-09-15 11:53:20,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=282205.6666666667, ans=0.0 2024-09-15 11:53:31,736 INFO [train.py:1198] (0/2) Epoch 16, batch 3750, loss[loss=0.2441, ctc_loss=0.169, cr_loss=0.3755, over 20641.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1672, cr_loss=0.384, over 4091361.73 frames. 
], batch size: 68, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 11:53:45,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=282262.3333333333, ans=0.125 2024-09-15 11:53:47,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=282262.3333333333, ans=0.125 2024-09-15 11:53:57,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=282262.3333333333, ans=0.125 2024-09-15 11:54:18,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=282319.0, ans=0.025 2024-09-15 11:54:25,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.30 vs. limit=22.5 2024-09-15 11:54:34,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=282347.3333333333, ans=0.025 2024-09-15 11:54:35,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=282347.3333333333, ans=0.125 2024-09-15 11:54:47,127 INFO [train.py:1198] (0/2) Epoch 16, batch 3800, loss[loss=0.2686, ctc_loss=0.1855, cr_loss=0.4158, over 20651.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.168, cr_loss=0.3847, over 4090586.21 frames. 
], batch size: 66, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 11:54:56,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=282375.6666666667, ans=0.125 2024-09-15 11:55:25,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=282432.3333333333, ans=22.5 2024-09-15 11:55:34,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=282460.6666666667, ans=0.125 2024-09-15 11:55:47,378 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.071e+02 2.197e+02 2.383e+02 3.469e+02, threshold=4.394e+02, percent-clipped=0.0 2024-09-15 11:55:52,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=282489.0, ans=0.1 2024-09-15 11:56:02,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=282489.0, ans=0.1 2024-09-15 11:56:05,314 INFO [train.py:1198] (0/2) Epoch 16, batch 3850, loss[loss=0.2116, ctc_loss=0.1427, cr_loss=0.3445, over 20251.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1678, cr_loss=0.3846, over 4102687.59 frames. ], batch size: 45, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 11:56:14,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=282517.3333333333, ans=0.125 2024-09-15 11:56:39,573 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0 2024-09-15 11:56:44,122 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. 
limit=6.0 2024-09-15 11:57:15,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=282630.6666666667, ans=0.125 2024-09-15 11:57:20,666 INFO [train.py:1198] (0/2) Epoch 16, batch 3900, loss[loss=0.2754, ctc_loss=0.1948, cr_loss=0.403, over 20323.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1673, cr_loss=0.3847, over 4110634.96 frames. ], batch size: 74, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 11:58:09,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0 2024-09-15 11:58:15,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=282744.0, ans=0.1 2024-09-15 11:58:18,112 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.682e+02 2.087e+02 2.211e+02 2.371e+02 3.175e+02, threshold=4.423e+02, percent-clipped=0.0 2024-09-15 11:58:30,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.81 vs. limit=22.5 2024-09-15 11:58:33,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=282772.3333333333, ans=0.0 2024-09-15 11:58:36,143 INFO [train.py:1198] (0/2) Epoch 16, batch 3950, loss[loss=0.2239, ctc_loss=0.1547, cr_loss=0.3457, over 20881.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1672, cr_loss=0.3841, over 4108145.18 frames. ], batch size: 54, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 11:58:53,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. 
limit=15.0 2024-09-15 11:59:14,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=282857.3333333333, ans=0.0 2024-09-15 11:59:32,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=282885.6666666667, ans=0.125 2024-09-15 11:59:33,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=282885.6666666667, ans=0.0 2024-09-15 11:59:40,127 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=15.0 2024-09-15 11:59:54,476 INFO [train.py:1198] (0/2) Epoch 16, batch 4000, loss[loss=0.2694, ctc_loss=0.1836, cr_loss=0.4292, over 19451.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1669, cr_loss=0.3843, over 4114882.98 frames. ], batch size: 90, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 12:00:32,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=282999.0, ans=15.0 2024-09-15 12:00:39,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=283027.3333333333, ans=0.2 2024-09-15 12:00:44,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.20 vs. 
limit=15.0 2024-09-15 12:00:51,511 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 2.048e+02 2.185e+02 2.313e+02 3.020e+02, threshold=4.371e+02, percent-clipped=0.0 2024-09-15 12:01:02,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=283055.6666666667, ans=0.5 2024-09-15 12:01:09,764 INFO [train.py:1198] (0/2) Epoch 16, batch 4050, loss[loss=0.2421, ctc_loss=0.1672, cr_loss=0.3743, over 21065.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1666, cr_loss=0.3845, over 4116183.04 frames. ], batch size: 56, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 12:01:17,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=283084.0, ans=0.125 2024-09-15 12:01:19,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0 2024-09-15 12:01:56,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=283169.0, ans=0.125 2024-09-15 12:02:28,566 INFO [train.py:1198] (0/2) Epoch 16, batch 4100, loss[loss=0.2571, ctc_loss=0.1736, cr_loss=0.4175, over 20970.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1663, cr_loss=0.3842, over 4105006.68 frames. ], batch size: 64, lr: 5.22e-03, grad_scale: 32.0 2024-09-15 12:02:44,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. 
limit=6.0 2024-09-15 12:03:21,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=283310.6666666667, ans=0.0 2024-09-15 12:03:24,227 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-100000.pt 2024-09-15 12:03:26,523 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.020e+02 2.185e+02 2.399e+02 4.031e+02, threshold=4.371e+02, percent-clipped=0.0 2024-09-15 12:03:44,670 INFO [train.py:1198] (0/2) Epoch 16, batch 4150, loss[loss=0.2723, ctc_loss=0.1889, cr_loss=0.4172, over 20330.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.167, cr_loss=0.3846, over 4098640.02 frames. ], batch size: 74, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:04:04,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=283395.6666666667, ans=0.025 2024-09-15 12:04:34,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=283452.3333333333, ans=0.0 2024-09-15 12:04:44,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=283480.6666666667, ans=0.1 2024-09-15 12:04:50,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=283480.6666666667, ans=0.0 2024-09-15 12:05:03,755 INFO [train.py:1198] (0/2) Epoch 16, batch 4200, loss[loss=0.2156, ctc_loss=0.1456, cr_loss=0.3499, over 20964.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1671, cr_loss=0.3849, over 4099972.01 frames. 
], batch size: 48, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:05:16,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=283509.0, ans=0.125 2024-09-15 12:05:31,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=283537.3333333333, ans=0.125 2024-09-15 12:05:47,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=283594.0, ans=0.2 2024-09-15 12:05:55,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=283594.0, ans=0.0 2024-09-15 12:06:01,456 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.043e+02 2.182e+02 2.317e+02 3.070e+02, threshold=4.364e+02, percent-clipped=0.0 2024-09-15 12:06:19,170 INFO [train.py:1198] (0/2) Epoch 16, batch 4250, loss[loss=0.2111, ctc_loss=0.1424, cr_loss=0.3438, over 21045.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.168, cr_loss=0.3864, over 4095451.76 frames. 
], batch size: 56, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:06:42,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=283679.0, ans=0.125 2024-09-15 12:06:42,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=283679.0, ans=0.0 2024-09-15 12:06:45,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=283679.0, ans=0.2 2024-09-15 12:06:50,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=283707.3333333333, ans=0.125 2024-09-15 12:06:52,017 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 12:06:54,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=283707.3333333333, ans=0.125 2024-09-15 12:06:54,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=283707.3333333333, ans=0.125 2024-09-15 12:07:08,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=283735.6666666667, ans=0.05 2024-09-15 12:07:34,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=283764.0, ans=0.0 2024-09-15 12:07:38,281 INFO [train.py:1198] (0/2) Epoch 16, batch 4300, loss[loss=0.2351, ctc_loss=0.1597, cr_loss=0.3767, over 20963.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1674, cr_loss=0.3859, over 4103790.33 frames. 
], batch size: 58, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:07:38,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283792.3333333333, ans=0.1 2024-09-15 12:07:41,588 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 12:07:50,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=283792.3333333333, ans=0.125 2024-09-15 12:08:36,130 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.731e+02 2.016e+02 2.106e+02 2.323e+02 3.174e+02, threshold=4.213e+02, percent-clipped=0.0 2024-09-15 12:08:36,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=283877.3333333333, ans=0.125 2024-09-15 12:08:42,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=283905.6666666667, ans=0.0 2024-09-15 12:08:53,884 INFO [train.py:1198] (0/2) Epoch 16, batch 4350, loss[loss=0.2479, ctc_loss=0.1663, cr_loss=0.4077, over 20794.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1658, cr_loss=0.3837, over 4113542.17 frames. 
], batch size: 56, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:09:00,527 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 12:09:03,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=283934.0, ans=0.0 2024-09-15 12:09:09,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=283962.3333333333, ans=0.125 2024-09-15 12:09:10,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=283962.3333333333, ans=0.2 2024-09-15 12:09:14,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=283962.3333333333, ans=0.0 2024-09-15 12:09:37,189 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.66 vs. limit=22.5 2024-09-15 12:09:44,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=284019.0, ans=0.025 2024-09-15 12:09:45,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2024-09-15 12:09:49,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=284019.0, ans=0.125 2024-09-15 12:10:09,235 INFO [train.py:1198] (0/2) Epoch 16, batch 4400, loss[loss=0.2383, ctc_loss=0.1629, cr_loss=0.377, over 20781.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1665, cr_loss=0.385, over 4117319.60 frames. 
], batch size: 56, lr: 5.21e-03, grad_scale: 32.0 2024-09-15 12:10:20,592 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0 2024-09-15 12:10:28,102 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=22.5 2024-09-15 12:11:03,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=284160.6666666667, ans=0.125 2024-09-15 12:11:09,325 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.015e+02 2.169e+02 2.418e+02 3.055e+02, threshold=4.338e+02, percent-clipped=0.0 2024-09-15 12:11:12,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=284189.0, ans=0.025 2024-09-15 12:11:27,270 INFO [train.py:1198] (0/2) Epoch 16, batch 4450, loss[loss=0.2609, ctc_loss=0.1783, cr_loss=0.4127, over 20966.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1664, cr_loss=0.385, over 4113281.60 frames. 
], batch size: 64, lr: 5.21e-03, grad_scale: 32.0
2024-09-15 12:11:43,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=284245.6666666667, ans=0.0
2024-09-15 12:11:48,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=284245.6666666667, ans=0.0
2024-09-15 12:12:16,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=284302.3333333333, ans=0.125
2024-09-15 12:12:24,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=284302.3333333333, ans=0.2
2024-09-15 12:12:41,892 INFO [train.py:1198] (0/2) Epoch 16, batch 4500, loss[loss=0.2459, ctc_loss=0.1698, cr_loss=0.3804, over 20903.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.167, cr_loss=0.3852, over 4108718.46 frames. ], batch size: 54, lr: 5.21e-03, grad_scale: 32.0
2024-09-15 12:12:46,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=284359.0, ans=0.125
2024-09-15 12:12:57,634 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 12:13:05,507 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=12.0
2024-09-15 12:13:24,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=284415.6666666667, ans=0.1
2024-09-15 12:13:29,216 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 12:13:42,889 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.751e+02 2.048e+02 2.234e+02 2.459e+02 4.069e+02, threshold=4.467e+02, percent-clipped=0.0
2024-09-15 12:14:00,737 INFO [train.py:1198] (0/2) Epoch 16, batch 4550, loss[loss=0.2644, ctc_loss=0.1834, cr_loss=0.4051, over 20293.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1671, cr_loss=0.3852, over 4100161.00 frames. ], batch size: 74, lr: 5.20e-03, grad_scale: 32.0
2024-09-15 12:14:08,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=284500.6666666667, ans=0.1
2024-09-15 12:14:25,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=284529.0, ans=0.025
2024-09-15 12:14:57,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=284585.6666666667, ans=0.125
2024-09-15 12:15:16,409 INFO [train.py:1198] (0/2) Epoch 16, batch 4600, loss[loss=0.2607, ctc_loss=0.178, cr_loss=0.4138, over 21078.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1684, cr_loss=0.3876, over 4104904.17 frames. ], batch size: 59, lr: 5.20e-03, grad_scale: 32.0
2024-09-15 12:15:37,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=284670.6666666667, ans=0.125
2024-09-15 12:16:03,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=284727.3333333333, ans=0.125
2024-09-15 12:16:15,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=284727.3333333333, ans=0.2
2024-09-15 12:16:16,191 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.086e+02 2.286e+02 2.593e+02 4.063e+02, threshold=4.572e+02, percent-clipped=0.0
2024-09-15 12:16:34,392 INFO [train.py:1198] (0/2) Epoch 16, batch 4650, loss[loss=0.2569, ctc_loss=0.1742, cr_loss=0.4134, over 20696.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1686, cr_loss=0.3881, over 4099434.51 frames. ], batch size: 68, lr: 5.20e-03, grad_scale: 32.0
2024-09-15 12:16:37,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=284784.0, ans=0.1
2024-09-15 12:17:08,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=284840.6666666667, ans=0.125
2024-09-15 12:17:47,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=284897.3333333333, ans=0.125
2024-09-15 12:17:49,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=284925.6666666667, ans=0.125
2024-09-15 12:17:50,392 INFO [train.py:1198] (0/2) Epoch 16, batch 4700, loss[loss=0.2267, ctc_loss=0.1527, cr_loss=0.3703, over 20963.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1694, cr_loss=0.389, over 4075506.75 frames. ], batch size: 51, lr: 5.20e-03, grad_scale: 32.0
2024-09-15 12:18:06,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=15.0
2024-09-15 12:18:26,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.23 vs. limit=15.0
2024-09-15 12:18:29,266 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0
2024-09-15 12:18:30,347 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 12:18:36,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=285010.6666666667, ans=0.2
2024-09-15 12:18:41,188 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=15.0
2024-09-15 12:18:48,393 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.074e+02 2.241e+02 2.500e+02 5.089e+02, threshold=4.482e+02, percent-clipped=1.0
2024-09-15 12:18:50,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=285039.0, ans=0.125
2024-09-15 12:19:07,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=285067.3333333333, ans=0.0
2024-09-15 12:19:09,025 INFO [train.py:1198] (0/2) Epoch 16, batch 4750, loss[loss=0.2277, ctc_loss=0.1528, cr_loss=0.3746, over 20931.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1683, cr_loss=0.3868, over 4094246.47 frames.
], batch size: 48, lr: 5.20e-03, grad_scale: 32.0
2024-09-15 12:19:15,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=285067.3333333333, ans=0.0
2024-09-15 12:19:20,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0
2024-09-15 12:19:42,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=285124.0, ans=0.035
2024-09-15 12:19:57,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=22.5
2024-09-15 12:20:24,312 INFO [train.py:1198] (0/2) Epoch 16, batch 4800, loss[loss=0.2196, ctc_loss=0.1493, cr_loss=0.3515, over 21073.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1681, cr_loss=0.3862, over 4093457.51 frames. ], batch size: 53, lr: 5.20e-03, grad_scale: 32.0
2024-09-15 12:21:21,282 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.095e+02 2.225e+02 2.467e+02 3.509e+02, threshold=4.449e+02, percent-clipped=0.0
2024-09-15 12:21:24,667 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 12:21:24,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=285322.3333333333, ans=0.125
2024-09-15 12:21:28,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=285322.3333333333, ans=0.125
2024-09-15 12:21:39,093 INFO [train.py:1198] (0/2) Epoch 16, batch 4850, loss[loss=0.246, ctc_loss=0.173, cr_loss=0.365, over 19352.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1677, cr_loss=0.3859, over 4101451.55 frames. ], batch size: 90, lr: 5.20e-03, grad_scale: 16.0
2024-09-15 12:22:10,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=285407.3333333333, ans=0.1
2024-09-15 12:22:57,596 INFO [train.py:1198] (0/2) Epoch 16, batch 4900, loss[loss=0.2013, ctc_loss=0.1346, cr_loss=0.3336, over 20947.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1679, cr_loss=0.3859, over 4086116.21 frames. ], batch size: 50, lr: 5.19e-03, grad_scale: 16.0
2024-09-15 12:23:08,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=285492.3333333333, ans=0.125
2024-09-15 12:23:25,625 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0
2024-09-15 12:23:26,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=285549.0, ans=0.2
2024-09-15 12:23:56,168 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.062e+02 2.171e+02 2.368e+02 4.130e+02, threshold=4.342e+02, percent-clipped=0.0
2024-09-15 12:24:12,400 INFO [train.py:1198] (0/2) Epoch 16, batch 4950, loss[loss=0.2085, ctc_loss=0.1396, cr_loss=0.3444, over 19831.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1679, cr_loss=0.385, over 4069667.17 frames. ], batch size: 44, lr: 5.19e-03, grad_scale: 16.0
2024-09-15 12:25:27,108 INFO [train.py:1198] (0/2) Epoch 16, batch 5000, loss[loss=0.2185, ctc_loss=0.1478, cr_loss=0.3534, over 20962.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1682, cr_loss=0.386, over 4074529.93 frames. ], batch size: 50, lr: 5.19e-03, grad_scale: 16.0
2024-09-15 12:25:50,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=285804.0, ans=0.0
2024-09-15 12:26:19,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=285860.6666666667, ans=0.2
2024-09-15 12:26:20,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=285860.6666666667, ans=0.02
2024-09-15 12:26:27,469 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.031e+02 2.174e+02 2.370e+02 4.597e+02, threshold=4.348e+02, percent-clipped=1.0
2024-09-15 12:26:44,031 INFO [train.py:1198] (0/2) Epoch 16, batch 5050, loss[loss=0.2703, ctc_loss=0.1866, cr_loss=0.4183, over 21011.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.169, cr_loss=0.3873, over 4079817.57 frames. ], batch size: 63, lr: 5.19e-03, grad_scale: 16.0
2024-09-15 12:26:45,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=285917.3333333333, ans=0.1
2024-09-15 12:27:00,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=285945.6666666667, ans=0.2
2024-09-15 12:27:13,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=285974.0, ans=0.04949747468305833
2024-09-15 12:27:18,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=285974.0, ans=0.025
2024-09-15 12:27:57,811 INFO [train.py:1198] (0/2) Epoch 16, batch 5100, loss[loss=0.2426, ctc_loss=0.1685, cr_loss=0.3707, over 21003.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.169, cr_loss=0.3874, over 4088979.37 frames. ], batch size: 61, lr: 5.19e-03, grad_scale: 16.0
2024-09-15 12:28:31,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=286115.6666666667, ans=0.0
2024-09-15 12:28:37,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=286115.6666666667, ans=0.125
2024-09-15 12:28:56,181 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.107e+02 2.230e+02 2.385e+02 4.635e+02, threshold=4.460e+02, percent-clipped=2.0
2024-09-15 12:28:59,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.47 vs. limit=10.0
2024-09-15 12:29:12,540 INFO [train.py:1198] (0/2) Epoch 16, batch 5150, loss[loss=0.2711, ctc_loss=0.1863, cr_loss=0.4242, over 20692.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1693, cr_loss=0.3881, over 4100631.33 frames. ], batch size: 68, lr: 5.19e-03, grad_scale: 16.0
2024-09-15 12:29:41,907 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=22.5
2024-09-15 12:30:27,317 INFO [train.py:1198] (0/2) Epoch 16, batch 5200, loss[loss=0.2529, ctc_loss=0.1752, cr_loss=0.3885, over 20359.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1683, cr_loss=0.3866, over 4107509.34 frames.
], batch size: 74, lr: 5.19e-03, grad_scale: 32.0
2024-09-15 12:30:35,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=286342.3333333333, ans=0.125
2024-09-15 12:31:17,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=286427.3333333333, ans=0.0
2024-09-15 12:31:27,525 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.069e+02 2.180e+02 2.413e+02 4.104e+02, threshold=4.359e+02, percent-clipped=0.0
2024-09-15 12:31:44,112 INFO [train.py:1198] (0/2) Epoch 16, batch 5250, loss[loss=0.244, ctc_loss=0.1675, cr_loss=0.3825, over 20728.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1675, cr_loss=0.3859, over 4106724.59 frames. ], batch size: 71, lr: 5.19e-03, grad_scale: 32.0
2024-09-15 12:31:51,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=286484.0, ans=0.0
2024-09-15 12:31:58,275 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0
2024-09-15 12:32:06,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=286512.3333333333, ans=0.0
2024-09-15 12:32:25,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=286540.6666666667, ans=0.0
2024-09-15 12:32:27,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.98 vs. limit=12.0
2024-09-15 12:32:43,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=286597.3333333333, ans=0.0
2024-09-15 12:32:52,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=286597.3333333333, ans=0.0
2024-09-15 12:32:58,242 INFO [train.py:1198] (0/2) Epoch 16, batch 5300, loss[loss=0.2597, ctc_loss=0.173, cr_loss=0.4334, over 20963.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1677, cr_loss=0.3858, over 4088481.97 frames. ], batch size: 58, lr: 5.18e-03, grad_scale: 32.0
2024-09-15 12:33:08,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=286625.6666666667, ans=0.2
2024-09-15 12:33:20,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=286654.0, ans=0.1
2024-09-15 12:33:24,513 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0
2024-09-15 12:33:36,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=286682.3333333333, ans=0.125
2024-09-15 12:33:41,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=286710.6666666667, ans=0.125
2024-09-15 12:33:56,199 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.073e+02 2.226e+02 2.425e+02 4.322e+02, threshold=4.453e+02, percent-clipped=0.0
2024-09-15 12:34:12,830 INFO [train.py:1198] (0/2) Epoch 16, batch 5350, loss[loss=0.27, ctc_loss=0.1868, cr_loss=0.4163, over 20988.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1675, cr_loss=0.3851, over 4092071.75 frames. ], batch size: 61, lr: 5.18e-03, grad_scale: 32.0
2024-09-15 12:34:30,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=286795.6666666667, ans=0.125
2024-09-15 12:34:57,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286852.3333333333, ans=0.1
2024-09-15 12:35:00,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=286852.3333333333, ans=0.125
2024-09-15 12:35:10,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=286880.6666666667, ans=0.2
2024-09-15 12:35:26,377 INFO [train.py:1198] (0/2) Epoch 16, batch 5400, loss[loss=0.2624, ctc_loss=0.1843, cr_loss=0.3906, over 19474.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1664, cr_loss=0.3839, over 4096508.68 frames. ], batch size: 90, lr: 5.18e-03, grad_scale: 32.0
2024-09-15 12:36:00,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286965.6666666667, ans=0.1
2024-09-15 12:36:21,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.22 vs. limit=22.5
2024-09-15 12:36:22,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=286994.0, ans=0.2
2024-09-15 12:36:26,256 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.051e+02 2.207e+02 2.374e+02 3.301e+02, threshold=4.414e+02, percent-clipped=0.0
2024-09-15 12:36:42,710 INFO [train.py:1198] (0/2) Epoch 16, batch 5450, loss[loss=0.2527, ctc_loss=0.1739, cr_loss=0.394, over 20840.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1673, cr_loss=0.3847, over 4078407.21 frames. ], batch size: 59, lr: 5.18e-03, grad_scale: 32.0
2024-09-15 12:37:17,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=287107.3333333333, ans=0.0
2024-09-15 12:37:20,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=287107.3333333333, ans=0.125
2024-09-15 12:37:35,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=287135.6666666667, ans=0.125
2024-09-15 12:37:47,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=287164.0, ans=0.5
2024-09-15 12:37:53,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=287164.0, ans=0.125
2024-09-15 12:37:57,210 INFO [train.py:1198] (0/2) Epoch 16, batch 5500, loss[loss=0.2361, ctc_loss=0.1591, cr_loss=0.3852, over 20132.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1673, cr_loss=0.3851, over 4079145.83 frames. ], batch size: 80, lr: 5.18e-03, grad_scale: 32.0
2024-09-15 12:38:12,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=287220.6666666667, ans=0.0
2024-09-15 12:38:20,814 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.59 vs.
limit=15.0
2024-09-15 12:38:45,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=287277.3333333333, ans=0.1
2024-09-15 12:38:55,738 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.109e+02 2.293e+02 2.538e+02 4.168e+02, threshold=4.586e+02, percent-clipped=0.0
2024-09-15 12:39:01,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=287305.6666666667, ans=0.125
2024-09-15 12:39:12,231 INFO [train.py:1198] (0/2) Epoch 16, batch 5550, loss[loss=0.2257, ctc_loss=0.155, cr_loss=0.3539, over 20806.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1671, cr_loss=0.3851, over 4090104.00 frames. ], batch size: 53, lr: 5.18e-03, grad_scale: 32.0
2024-09-15 12:39:26,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=287362.3333333333, ans=0.0
2024-09-15 12:39:53,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=287390.6666666667, ans=0.0
2024-09-15 12:40:25,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=287447.3333333333, ans=0.1
2024-09-15 12:40:28,472 INFO [train.py:1198] (0/2) Epoch 16, batch 5600, loss[loss=0.2373, ctc_loss=0.1627, cr_loss=0.3729, over 21078.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1675, cr_loss=0.3854, over 4085872.04 frames. ], batch size: 56, lr: 5.18e-03, grad_scale: 32.0
2024-09-15 12:40:42,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=287504.0, ans=0.0
2024-09-15 12:40:56,107 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.83 vs. limit=10.0
2024-09-15 12:41:07,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=287532.3333333333, ans=0.125
2024-09-15 12:41:25,973 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.132e+02 2.266e+02 2.445e+02 3.727e+02, threshold=4.532e+02, percent-clipped=0.0
2024-09-15 12:41:29,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=287589.0, ans=0.1
2024-09-15 12:41:42,381 INFO [train.py:1198] (0/2) Epoch 16, batch 5650, loss[loss=0.225, ctc_loss=0.153, cr_loss=0.3599, over 20890.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1683, cr_loss=0.3866, over 4083894.91 frames. ], batch size: 54, lr: 5.18e-03, grad_scale: 32.0
2024-09-15 12:41:49,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=287617.3333333333, ans=0.125
2024-09-15 12:42:29,924 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.558e-03
2024-09-15 12:42:40,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=287730.6666666667, ans=0.125
2024-09-15 12:42:43,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=287730.6666666667, ans=0.1
2024-09-15 12:42:44,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=287730.6666666667, ans=0.0
2024-09-15 12:42:46,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=287730.6666666667, ans=0.0
2024-09-15 12:42:53,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=287730.6666666667, ans=0.2
2024-09-15 12:42:56,522 INFO [train.py:1198] (0/2) Epoch 16, batch 5700, loss[loss=0.269, ctc_loss=0.1846, cr_loss=0.4222, over 21009.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1678, cr_loss=0.3855, over 4090683.93 frames. ], batch size: 61, lr: 5.17e-03, grad_scale: 32.0
2024-09-15 12:43:13,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0
2024-09-15 12:43:32,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=287815.6666666667, ans=0.125
2024-09-15 12:43:35,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=287815.6666666667, ans=0.1
2024-09-15 12:43:54,632 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.037e+02 2.209e+02 2.367e+02 3.816e+02, threshold=4.418e+02, percent-clipped=0.0
2024-09-15 12:44:02,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=287872.3333333333, ans=0.125
2024-09-15 12:44:10,839 INFO [train.py:1198] (0/2) Epoch 16, batch 5750, loss[loss=0.2386, ctc_loss=0.1606, cr_loss=0.3899, over 20903.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1684, cr_loss=0.3862, over 4080085.07 frames. ], batch size: 54, lr: 5.17e-03, grad_scale: 32.0
2024-09-15 12:44:52,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=287957.3333333333, ans=0.125
2024-09-15 12:45:13,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=288014.0, ans=0.0
2024-09-15 12:45:28,133 INFO [train.py:1198] (0/2) Epoch 16, batch 5800, loss[loss=0.2234, ctc_loss=0.1503, cr_loss=0.3658, over 20989.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1681, cr_loss=0.3862, over 4092306.98 frames. ], batch size: 48, lr: 5.17e-03, grad_scale: 16.0
2024-09-15 12:45:37,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=288042.3333333333, ans=0.125
2024-09-15 12:46:27,999 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.698e+02 2.017e+02 2.163e+02 2.358e+02 4.756e+02, threshold=4.326e+02, percent-clipped=1.0
2024-09-15 12:46:32,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=288155.6666666667, ans=0.125
2024-09-15 12:46:38,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288155.6666666667, ans=0.1
2024-09-15 12:46:43,096 INFO [train.py:1198] (0/2) Epoch 16, batch 5850, loss[loss=0.2477, ctc_loss=0.1695, cr_loss=0.3911, over 21040.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1687, cr_loss=0.3876, over 4099690.65 frames.
], batch size: 63, lr: 5.17e-03, grad_scale: 16.0
2024-09-15 12:46:56,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=288212.3333333333, ans=0.2
2024-09-15 12:47:11,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=288240.6666666667, ans=0.0
2024-09-15 12:47:11,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=288240.6666666667, ans=0.0
2024-09-15 12:47:29,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=288269.0, ans=0.0
2024-09-15 12:47:37,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=288269.0, ans=0.0
2024-09-15 12:47:53,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=288297.3333333333, ans=0.125
2024-09-15 12:47:57,536 INFO [train.py:1198] (0/2) Epoch 16, batch 5900, loss[loss=0.1959, ctc_loss=0.1294, cr_loss=0.3327, over 19940.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1685, cr_loss=0.3876, over 4094548.40 frames. ], batch size: 44, lr: 5.17e-03, grad_scale: 16.0
2024-09-15 12:48:02,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=288325.6666666667, ans=0.125
2024-09-15 12:48:06,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=288325.6666666667, ans=0.125
2024-09-15 12:48:17,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=288354.0, ans=0.125
2024-09-15 12:48:34,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=288382.3333333333, ans=0.0
2024-09-15 12:48:40,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288382.3333333333, ans=0.1
2024-09-15 12:48:57,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=288439.0, ans=0.125
2024-09-15 12:48:59,169 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.687e+02 2.052e+02 2.188e+02 2.422e+02 3.889e+02, threshold=4.376e+02, percent-clipped=0.0
2024-09-15 12:49:05,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288439.0, ans=0.1
2024-09-15 12:49:08,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=288439.0, ans=0.0
2024-09-15 12:49:12,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=288467.3333333333, ans=0.125
2024-09-15 12:49:14,160 INFO [train.py:1198] (0/2) Epoch 16, batch 5950, loss[loss=0.2074, ctc_loss=0.1401, cr_loss=0.3361, over 21010.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1678, cr_loss=0.387, over 4105457.61 frames. ], batch size: 52, lr: 5.17e-03, grad_scale: 16.0
2024-09-15 12:49:30,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=288495.6666666667, ans=0.125
2024-09-15 12:49:39,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=288495.6666666667, ans=0.125
2024-09-15 12:49:55,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=288524.0, ans=0.2
2024-09-15 12:50:00,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=288552.3333333333, ans=0.05
2024-09-15 12:50:11,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=288580.6666666667, ans=0.0
2024-09-15 12:50:16,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=288580.6666666667, ans=0.1
2024-09-15 12:50:19,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.77 vs. limit=15.0
2024-09-15 12:50:28,263 INFO [train.py:1198] (0/2) Epoch 16, batch 6000, loss[loss=0.2537, ctc_loss=0.1739, cr_loss=0.3989, over 21048.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.168, cr_loss=0.3874, over 4107655.35 frames. ], batch size: 53, lr: 5.17e-03, grad_scale: 32.0
2024-09-15 12:50:28,264 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-15 12:50:52,614 INFO [train.py:1230] (0/2) Epoch 16, validation: loss=0.04619, ctc_loss=0.04619, cr_loss=1.014e-14, over 944034.00 frames.
2024-09-15 12:50:52,614 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-15 12:51:20,188 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0
2024-09-15 12:51:21,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=288665.6666666667, ans=0.5
2024-09-15 12:51:36,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=288694.0, ans=0.025
2024-09-15 12:51:51,902 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.049e+02 2.226e+02 2.408e+02 3.127e+02, threshold=4.451e+02, percent-clipped=0.0
2024-09-15 12:52:06,477 INFO [train.py:1198] (0/2) Epoch 16, batch 6050, loss[loss=0.2543, ctc_loss=0.1758, cr_loss=0.3922, over 21016.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1688, cr_loss=0.3889, over 4114085.83 frames. ], batch size: 61, lr: 5.17e-03, grad_scale: 32.0
2024-09-15 12:52:24,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=288779.0, ans=0.125
2024-09-15 12:52:24,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=288779.0, ans=10.0
2024-09-15 12:52:34,389 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs.
limit=6.0
2024-09-15 12:52:46,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=288807.3333333333, ans=0.04949747468305833
2024-09-15 12:53:15,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=288864.0, ans=0.125
2024-09-15 12:53:22,811 INFO [train.py:1198] (0/2) Epoch 16, batch 6100, loss[loss=0.3002, ctc_loss=0.2183, cr_loss=0.4094, over 14589.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.17, cr_loss=0.3907, over 4097318.07 frames. ], batch size: 150, lr: 5.16e-03, grad_scale: 32.0
2024-09-15 12:53:23,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=288892.3333333333, ans=0.125
2024-09-15 12:54:22,464 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.023e+02 2.194e+02 2.376e+02 3.598e+02, threshold=4.388e+02, percent-clipped=0.0
2024-09-15 12:54:37,065 INFO [train.py:1198] (0/2) Epoch 16, batch 6150, loss[loss=0.2134, ctc_loss=0.1446, cr_loss=0.3443, over 20974.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1705, cr_loss=0.3903, over 4077312.76 frames. ], batch size: 52, lr: 5.16e-03, grad_scale: 32.0
2024-09-15 12:54:37,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=289034.0, ans=0.1
2024-09-15 12:54:43,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=289034.0, ans=0.0
2024-09-15 12:55:00,999 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 12:55:22,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=289119.0, ans=0.0
2024-09-15 12:55:47,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=289147.3333333333, ans=0.1
2024-09-15 12:55:50,304 INFO [train.py:1198] (0/2) Epoch 16, batch 6200, loss[loss=0.283, ctc_loss=0.1983, cr_loss=0.4238, over 21022.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1708, cr_loss=0.3895, over 4052356.42 frames. ], batch size: 62, lr: 5.16e-03, grad_scale: 32.0
2024-09-15 12:56:05,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=289204.0, ans=0.125
2024-09-15 12:56:07,367 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0
2024-09-15 12:56:50,265 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.012e+02 2.192e+02 2.343e+02 9.139e+02, threshold=4.384e+02, percent-clipped=1.0
2024-09-15 12:57:04,916 INFO [train.py:1198] (0/2) Epoch 16, batch 6250, loss[loss=0.224, ctc_loss=0.1519, cr_loss=0.3605, over 20784.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1716, cr_loss=0.3897, over 4039359.52 frames. ], batch size: 53, lr: 5.16e-03, grad_scale: 32.0
2024-09-15 12:57:13,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=289317.3333333333, ans=0.1
2024-09-15 12:57:25,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=289345.6666666667, ans=0.0
2024-09-15 12:58:17,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=289459.0, ans=0.05
2024-09-15 12:58:18,562 INFO [train.py:1198] (0/2) Epoch 16, batch 6300, loss[loss=0.2647, ctc_loss=0.1826, cr_loss=0.4106, over 19464.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.173, cr_loss=0.3904, over 4007857.42 frames. ], batch size: 90, lr: 5.16e-03, grad_scale: 32.0
2024-09-15 12:58:42,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=289487.3333333333, ans=0.125
2024-09-15 12:59:00,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.24 vs. limit=22.5
2024-09-15 12:59:01,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=289544.0, ans=0.125
2024-09-15 12:59:16,042 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.225e+02 2.457e+02 2.678e+02 4.874e+02, threshold=4.914e+02, percent-clipped=1.0
2024-09-15 12:59:26,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=289572.3333333333, ans=0.125
2024-09-15 12:59:30,116 INFO [train.py:1198] (0/2) Epoch 16, batch 6350, loss[loss=0.2672, ctc_loss=0.1894, cr_loss=0.3888, over 14127.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1793, cr_loss=0.3933, over 3799225.72 frames.
], batch size: 149, lr: 5.16e-03, grad_scale: 32.0 2024-09-15 12:59:53,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=289629.0, ans=0.125 2024-09-15 12:59:55,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=289629.0, ans=0.125 2024-09-15 13:00:15,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=289685.6666666667, ans=0.125 2024-09-15 13:00:16,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=289685.6666666667, ans=0.125 2024-09-15 13:00:28,526 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-16.pt 2024-09-15 13:01:18,115 INFO [train.py:1198] (0/2) Epoch 17, batch 0, loss[loss=0.252, ctc_loss=0.1765, cr_loss=0.3775, over 21044.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1765, cr_loss=0.3775, over 21044.00 frames. ], batch size: 62, lr: 5.00e-03, grad_scale: 32.0 2024-09-15 13:01:18,116 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 13:01:32,568 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.3513, 4.1183, 3.1023, 3.9779], device='cuda:0') 2024-09-15 13:01:36,323 INFO [train.py:1230] (0/2) Epoch 17, validation: loss=0.04638, ctc_loss=0.04638, cr_loss=9.883e-15, over 944034.00 frames. 2024-09-15 13:01:36,324 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 13:01:40,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.27 vs. 
limit=10.0 2024-09-15 13:01:56,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=289745.1666666667, ans=0.125 2024-09-15 13:02:33,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=289801.8333333333, ans=0.0 2024-09-15 13:02:53,268 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.178e+02 2.559e+02 2.883e+02 3.390e+02, threshold=5.119e+02, percent-clipped=0.0 2024-09-15 13:02:54,845 INFO [train.py:1198] (0/2) Epoch 17, batch 50, loss[loss=0.2733, ctc_loss=0.1906, cr_loss=0.4134, over 21081.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1734, cr_loss=0.3887, over 912940.28 frames. ], batch size: 59, lr: 5.00e-03, grad_scale: 32.0 2024-09-15 13:03:10,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=289886.8333333333, ans=0.0 2024-09-15 13:03:14,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=289886.8333333333, ans=0.0 2024-09-15 13:03:36,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=289915.1666666667, ans=0.125 2024-09-15 13:04:00,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=289971.8333333333, ans=0.2 2024-09-15 13:04:09,981 INFO [train.py:1198] (0/2) Epoch 17, batch 100, loss[loss=0.2164, ctc_loss=0.1457, cr_loss=0.3534, over 20961.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.172, cr_loss=0.3904, over 1612636.91 frames. ], batch size: 50, lr: 5.00e-03, grad_scale: 32.0 2024-09-15 13:04:16,609 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.12 vs. 
limit=15.0 2024-09-15 13:04:16,611 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0 2024-09-15 13:05:15,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=290113.5, ans=0.0 2024-09-15 13:05:27,371 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.842e+02 2.028e+02 2.174e+02 2.312e+02 4.055e+02, threshold=4.348e+02, percent-clipped=0.0 2024-09-15 13:05:27,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=290141.8333333333, ans=0.0 2024-09-15 13:05:28,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0 2024-09-15 13:05:28,937 INFO [train.py:1198] (0/2) Epoch 17, batch 150, loss[loss=0.2305, ctc_loss=0.1537, cr_loss=0.3841, over 20802.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1734, cr_loss=0.3935, over 2153313.08 frames. ], batch size: 53, lr: 5.00e-03, grad_scale: 32.0 2024-09-15 13:05:39,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=290141.8333333333, ans=0.0 2024-09-15 13:05:49,120 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.09 vs. 
limit=15.0 2024-09-15 13:06:03,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=290198.5, ans=0.1 2024-09-15 13:06:11,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=290198.5, ans=0.07 2024-09-15 13:06:33,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=290255.1666666667, ans=0.125 2024-09-15 13:06:43,744 INFO [train.py:1198] (0/2) Epoch 17, batch 200, loss[loss=0.3055, ctc_loss=0.2239, cr_loss=0.408, over 14286.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1708, cr_loss=0.3901, over 2582660.84 frames. ], batch size: 149, lr: 5.00e-03, grad_scale: 32.0 2024-09-15 13:07:00,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0 2024-09-15 13:07:04,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=290311.8333333333, ans=0.125 2024-09-15 13:07:18,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=290340.1666666667, ans=0.0 2024-09-15 13:07:33,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=290368.5, ans=0.125 2024-09-15 13:07:53,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=290396.8333333333, ans=0.125 2024-09-15 13:07:56,248 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.025e+02 2.195e+02 2.365e+02 4.015e+02, threshold=4.390e+02, percent-clipped=0.0 2024-09-15 13:07:57,831 INFO [train.py:1198] (0/2) Epoch 17, batch 250, loss[loss=0.211, ctc_loss=0.1446, cr_loss=0.332, over 
21085.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1703, cr_loss=0.3894, over 2915730.07 frames. ], batch size: 53, lr: 5.00e-03, grad_scale: 32.0 2024-09-15 13:07:58,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=290425.1666666667, ans=0.2 2024-09-15 13:08:04,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=290425.1666666667, ans=0.2 2024-09-15 13:08:38,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=290481.8333333333, ans=0.125 2024-09-15 13:08:59,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=290538.5, ans=0.125 2024-09-15 13:09:16,235 INFO [train.py:1198] (0/2) Epoch 17, batch 300, loss[loss=0.2718, ctc_loss=0.19, cr_loss=0.4092, over 20016.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1712, cr_loss=0.3893, over 3150513.07 frames. ], batch size: 80, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:09:38,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=290595.1666666667, ans=0.125 2024-09-15 13:09:39,361 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. 
limit=15.0 2024-09-15 13:09:42,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290595.1666666667, ans=0.1 2024-09-15 13:10:17,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=290680.1666666667, ans=0.0 2024-09-15 13:10:32,664 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.028e+02 2.166e+02 2.390e+02 6.262e+02, threshold=4.332e+02, percent-clipped=1.0 2024-09-15 13:10:34,259 INFO [train.py:1198] (0/2) Epoch 17, batch 350, loss[loss=0.2336, ctc_loss=0.1588, cr_loss=0.3743, over 20977.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1698, cr_loss=0.3885, over 3365893.48 frames. ], batch size: 55, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:10:34,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=290708.5, ans=0.125 2024-09-15 13:10:58,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=290736.8333333333, ans=0.025 2024-09-15 13:11:32,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=290821.8333333333, ans=0.2 2024-09-15 13:11:49,035 INFO [train.py:1198] (0/2) Epoch 17, batch 400, loss[loss=0.2488, ctc_loss=0.17, cr_loss=0.3939, over 21054.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1698, cr_loss=0.3898, over 3533921.48 frames. 
], batch size: 56, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:11:52,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=290850.1666666667, ans=0.0 2024-09-15 13:11:55,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=290850.1666666667, ans=0.2 2024-09-15 13:12:31,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=290906.8333333333, ans=0.125 2024-09-15 13:13:03,054 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.699e+02 2.081e+02 2.190e+02 2.341e+02 4.195e+02, threshold=4.381e+02, percent-clipped=0.0 2024-09-15 13:13:04,615 INFO [train.py:1198] (0/2) Epoch 17, batch 450, loss[loss=0.2421, ctc_loss=0.1627, cr_loss=0.3969, over 20861.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1688, cr_loss=0.388, over 3663258.62 frames. ], batch size: 57, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:13:22,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=291020.1666666667, ans=0.125 2024-09-15 13:13:24,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=291020.1666666667, ans=0.125 2024-09-15 13:14:22,412 INFO [train.py:1198] (0/2) Epoch 17, batch 500, loss[loss=0.2482, ctc_loss=0.1728, cr_loss=0.3771, over 21024.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1685, cr_loss=0.3884, over 3758043.61 frames. 
], batch size: 62, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:14:48,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=291161.8333333333, ans=0.09899494936611666 2024-09-15 13:14:52,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=291190.1666666667, ans=0.5 2024-09-15 13:15:37,111 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.671e+02 2.007e+02 2.130e+02 2.264e+02 2.791e+02, threshold=4.259e+02, percent-clipped=0.0 2024-09-15 13:15:37,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=291275.1666666667, ans=0.1 2024-09-15 13:15:38,649 INFO [train.py:1198] (0/2) Epoch 17, batch 550, loss[loss=0.2411, ctc_loss=0.1634, cr_loss=0.3884, over 20698.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1675, cr_loss=0.3878, over 3849515.46 frames. ], batch size: 68, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:15:44,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=291275.1666666667, ans=0.0 2024-09-15 13:15:52,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291303.5, ans=0.1 2024-09-15 13:16:14,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=291331.8333333333, ans=0.2 2024-09-15 13:16:43,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=291388.5, ans=0.125 2024-09-15 13:16:56,482 INFO [train.py:1198] (0/2) Epoch 17, batch 600, loss[loss=0.2767, ctc_loss=0.196, cr_loss=0.4031, over 20784.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1673, cr_loss=0.3873, over 3914337.41 frames. 
], batch size: 71, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:17:15,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=291445.1666666667, ans=0.0 2024-09-15 13:17:20,678 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.014e-02 2024-09-15 13:17:33,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=291473.5, ans=0.125 2024-09-15 13:17:38,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.37 vs. limit=22.5 2024-09-15 13:18:09,618 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.050e+02 2.229e+02 2.481e+02 3.764e+02, threshold=4.457e+02, percent-clipped=0.0 2024-09-15 13:18:11,138 INFO [train.py:1198] (0/2) Epoch 17, batch 650, loss[loss=0.25, ctc_loss=0.1712, cr_loss=0.3942, over 21035.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1672, cr_loss=0.3866, over 3946080.50 frames. ], batch size: 63, lr: 4.99e-03, grad_scale: 32.0 2024-09-15 13:18:14,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=291558.5, ans=0.0 2024-09-15 13:18:20,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=291558.5, ans=0.0 2024-09-15 13:19:02,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291643.5, ans=0.1 2024-09-15 13:19:06,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.44 vs. 
limit=15.0 2024-09-15 13:19:22,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=291671.8333333333, ans=0.0 2024-09-15 13:19:26,139 INFO [train.py:1198] (0/2) Epoch 17, batch 700, loss[loss=0.2607, ctc_loss=0.1755, cr_loss=0.4257, over 21083.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.167, cr_loss=0.3859, over 3989950.79 frames. ], batch size: 59, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:19:26,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291700.1666666667, ans=0.1 2024-09-15 13:19:32,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=291700.1666666667, ans=0.125 2024-09-15 13:20:08,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=291756.8333333333, ans=0.0 2024-09-15 13:20:42,328 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.063e+02 2.189e+02 2.376e+02 3.729e+02, threshold=4.378e+02, percent-clipped=0.0 2024-09-15 13:20:43,872 INFO [train.py:1198] (0/2) Epoch 17, batch 750, loss[loss=0.241, ctc_loss=0.1626, cr_loss=0.3919, over 20998.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.166, cr_loss=0.3841, over 4026325.50 frames. ], batch size: 55, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:21:17,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=291898.5, ans=0.1 2024-09-15 13:21:18,958 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=12.0 2024-09-15 13:22:02,438 INFO [train.py:1198] (0/2) Epoch 17, batch 800, loss[loss=0.2268, ctc_loss=0.1486, cr_loss=0.3907, over 20787.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.166, cr_loss=0.3845, over 4045293.46 frames. 
], batch size: 56, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:22:05,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.14 vs. limit=10.0 2024-09-15 13:22:19,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=292011.8333333333, ans=0.2 2024-09-15 13:22:23,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=292011.8333333333, ans=0.2 2024-09-15 13:23:01,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=292096.8333333333, ans=0.125 2024-09-15 13:23:16,026 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.069e+02 2.163e+02 2.357e+02 5.044e+02, threshold=4.326e+02, percent-clipped=1.0 2024-09-15 13:23:17,492 INFO [train.py:1198] (0/2) Epoch 17, batch 850, loss[loss=0.296, ctc_loss=0.2149, cr_loss=0.4058, over 14641.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1674, cr_loss=0.3865, over 4057292.05 frames. 
], batch size: 149, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:23:17,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=292125.1666666667, ans=0.0 2024-09-15 13:23:20,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=292125.1666666667, ans=0.0 2024-09-15 13:24:01,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=292210.1666666667, ans=0.125 2024-09-15 13:24:03,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=292210.1666666667, ans=0.0 2024-09-15 13:24:13,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=292210.1666666667, ans=0.125 2024-09-15 13:24:31,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=292266.8333333333, ans=0.125 2024-09-15 13:24:32,846 INFO [train.py:1198] (0/2) Epoch 17, batch 900, loss[loss=0.2699, ctc_loss=0.1885, cr_loss=0.4067, over 20676.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1683, cr_loss=0.3874, over 4047418.76 frames. ], batch size: 66, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:24:34,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=292266.8333333333, ans=0.025 2024-09-15 13:24:35,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.28 vs. 
limit=15.0 2024-09-15 13:24:39,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=292266.8333333333, ans=0.125 2024-09-15 13:24:40,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=292266.8333333333, ans=0.1 2024-09-15 13:24:48,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=292295.1666666667, ans=0.125 2024-09-15 13:24:48,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2024-09-15 13:24:49,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=292295.1666666667, ans=0.0 2024-09-15 13:24:57,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=292295.1666666667, ans=0.0 2024-09-15 13:25:10,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292323.5, ans=0.1 2024-09-15 13:25:21,900 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.58 vs. 
limit=15.0 2024-09-15 13:25:29,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=292351.8333333333, ans=0.0 2024-09-15 13:25:38,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=292380.1666666667, ans=0.125 2024-09-15 13:25:50,405 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.058e+02 2.193e+02 2.365e+02 3.432e+02, threshold=4.387e+02, percent-clipped=0.0 2024-09-15 13:25:50,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=292408.5, ans=0.125 2024-09-15 13:25:51,827 INFO [train.py:1198] (0/2) Epoch 17, batch 950, loss[loss=0.2494, ctc_loss=0.1693, cr_loss=0.4001, over 20784.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.168, cr_loss=0.3869, over 4057705.48 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:26:01,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.98 vs. limit=10.0 2024-09-15 13:26:04,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292408.5, ans=0.1 2024-09-15 13:26:12,101 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.60 vs. 
limit=15.0 2024-09-15 13:26:23,616 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 13:26:33,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=292465.1666666667, ans=0.125 2024-09-15 13:26:52,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.19 vs. limit=22.5 2024-09-15 13:27:06,564 INFO [train.py:1198] (0/2) Epoch 17, batch 1000, loss[loss=0.2533, ctc_loss=0.176, cr_loss=0.3865, over 20972.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1679, cr_loss=0.3874, over 4073754.64 frames. ], batch size: 55, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:27:11,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=292550.1666666667, ans=0.125 2024-09-15 13:27:40,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-09-15 13:28:23,167 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.095e+02 2.212e+02 2.441e+02 3.460e+02, threshold=4.424e+02, percent-clipped=0.0 2024-09-15 13:28:24,690 INFO [train.py:1198] (0/2) Epoch 17, batch 1050, loss[loss=0.2394, ctc_loss=0.1649, cr_loss=0.3721, over 20826.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1678, cr_loss=0.3867, over 4079463.85 frames. ], batch size: 59, lr: 4.98e-03, grad_scale: 32.0 2024-09-15 13:28:42,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.89 vs. 
limit=10.0
2024-09-15 13:28:58,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=292748.5, ans=0.1
2024-09-15 13:29:13,077 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.22 vs. limit=10.0
2024-09-15 13:29:25,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=292805.1666666667, ans=0.2
2024-09-15 13:29:31,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=292805.1666666667, ans=0.2
2024-09-15 13:29:40,494 INFO [train.py:1198] (0/2) Epoch 17, batch 1100, loss[loss=0.245, ctc_loss=0.1665, cr_loss=0.3926, over 21005.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1681, cr_loss=0.3865, over 4076177.64 frames. ], batch size: 63, lr: 4.97e-03, grad_scale: 32.0
2024-09-15 13:29:43,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=292833.5, ans=0.125
2024-09-15 13:30:00,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=292861.8333333333, ans=0.125
2024-09-15 13:30:13,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=292890.1666666667, ans=0.125
2024-09-15 13:30:15,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=292890.1666666667, ans=0.0
2024-09-15 13:30:16,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=292890.1666666667, ans=0.125
2024-09-15 13:30:53,636 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.097e+02 2.212e+02 2.363e+02 3.389e+02, threshold=4.425e+02, percent-clipped=0.0
2024-09-15 13:30:55,144 INFO [train.py:1198] (0/2) Epoch 17, batch 1150, loss[loss=0.2765, ctc_loss=0.1924, cr_loss=0.4203, over 20668.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.169, cr_loss=0.388, over 4079459.35 frames. ], batch size: 68, lr: 4.97e-03, grad_scale: 32.0
2024-09-15 13:31:07,410 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 13:31:08,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=292975.1666666667, ans=0.125
2024-09-15 13:31:09,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=292975.1666666667, ans=0.1
2024-09-15 13:31:20,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293003.5, ans=0.1
2024-09-15 13:32:11,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=293116.8333333333, ans=0.125
2024-09-15 13:32:13,106 INFO [train.py:1198] (0/2) Epoch 17, batch 1200, loss[loss=0.2019, ctc_loss=0.1351, cr_loss=0.3339, over 21006.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1689, cr_loss=0.3877, over 4073269.52 frames. ], batch size: 48, lr: 4.97e-03, grad_scale: 32.0
2024-09-15 13:32:28,867 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 13:32:36,587 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.75 vs. limit=15.0
2024-09-15 13:32:39,186 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 13:32:42,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=293173.5, ans=0.0
2024-09-15 13:32:45,258 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 13:33:06,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=293201.8333333333, ans=0.125
2024-09-15 13:33:30,669 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.094e+02 2.264e+02 2.504e+02 3.447e+02, threshold=4.527e+02, percent-clipped=0.0
2024-09-15 13:33:32,188 INFO [train.py:1198] (0/2) Epoch 17, batch 1250, loss[loss=0.2361, ctc_loss=0.1591, cr_loss=0.3852, over 21018.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1677, cr_loss=0.3856, over 4073013.47 frames. ], batch size: 61, lr: 4.97e-03, grad_scale: 32.0
2024-09-15 13:34:47,588 INFO [train.py:1198] (0/2) Epoch 17, batch 1300, loss[loss=0.2654, ctc_loss=0.1833, cr_loss=0.4101, over 20037.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.167, cr_loss=0.3852, over 4077510.50 frames. ], batch size: 80, lr: 4.97e-03, grad_scale: 32.0
2024-09-15 13:34:50,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=293400.1666666667, ans=0.0
2024-09-15 13:35:24,034 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 13:35:31,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0
2024-09-15 13:35:40,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=293485.1666666667, ans=0.04949747468305833
2024-09-15 13:36:00,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=293513.5, ans=0.125
2024-09-15 13:36:01,436 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.095e+02 2.247e+02 2.366e+02 3.947e+02, threshold=4.494e+02, percent-clipped=0.0
2024-09-15 13:36:02,962 INFO [train.py:1198] (0/2) Epoch 17, batch 1350, loss[loss=0.2143, ctc_loss=0.1434, cr_loss=0.3543, over 21028.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1665, cr_loss=0.3846, over 4084611.67 frames. ], batch size: 63, lr: 4.97e-03, grad_scale: 32.0
2024-09-15 13:36:25,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293570.1666666667, ans=0.1
2024-09-15 13:36:47,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=293626.8333333333, ans=0.1
2024-09-15 13:37:07,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.42 vs. limit=15.0
2024-09-15 13:37:11,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=293655.1666666667, ans=0.0
2024-09-15 13:37:14,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=293655.1666666667, ans=0.125
2024-09-15 13:37:18,187 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.29 vs. limit=8.0
2024-09-15 13:37:21,406 INFO [train.py:1198] (0/2) Epoch 17, batch 1400, loss[loss=0.2531, ctc_loss=0.1717, cr_loss=0.4073, over 21007.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1663, cr_loss=0.3847, over 4085474.26 frames. ], batch size: 61, lr: 4.97e-03, grad_scale: 64.0
2024-09-15 13:38:13,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0
2024-09-15 13:38:35,049 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.045e+02 2.188e+02 2.368e+02 3.462e+02, threshold=4.376e+02, percent-clipped=0.0
2024-09-15 13:38:36,659 INFO [train.py:1198] (0/2) Epoch 17, batch 1450, loss[loss=0.2643, ctc_loss=0.1936, cr_loss=0.3533, over 13961.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1666, cr_loss=0.3845, over 4071783.37 frames. ], batch size: 150, lr: 4.97e-03, grad_scale: 64.0
2024-09-15 13:39:23,025 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.20 vs. limit=12.0
2024-09-15 13:39:40,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=293938.5, ans=0.125
2024-09-15 13:39:43,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=293938.5, ans=0.0
2024-09-15 13:39:55,418 INFO [train.py:1198] (0/2) Epoch 17, batch 1500, loss[loss=0.2561, ctc_loss=0.175, cr_loss=0.4053, over 20962.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1669, cr_loss=0.3847, over 4077673.99 frames. ], batch size: 64, lr: 4.97e-03, grad_scale: 64.0
2024-09-15 13:40:23,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=293995.1666666667, ans=0.0
2024-09-15 13:40:53,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=294051.8333333333, ans=0.09899494936611666
2024-09-15 13:41:09,607 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.725e+02 2.010e+02 2.197e+02 2.365e+02 6.314e+02, threshold=4.393e+02, percent-clipped=1.0
2024-09-15 13:41:11,183 INFO [train.py:1198] (0/2) Epoch 17, batch 1550, loss[loss=0.2199, ctc_loss=0.1505, cr_loss=0.3467, over 20786.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.167, cr_loss=0.3849, over 4075293.17 frames. ], batch size: 53, lr: 4.96e-03, grad_scale: 64.0
2024-09-15 13:41:22,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=294108.5, ans=0.125
2024-09-15 13:41:42,314 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.20 vs. limit=22.5
2024-09-15 13:41:52,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=294165.1666666667, ans=0.125
2024-09-15 13:41:57,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=294193.5, ans=0.05
2024-09-15 13:41:58,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=294193.5, ans=0.025
2024-09-15 13:42:29,228 INFO [train.py:1198] (0/2) Epoch 17, batch 1600, loss[loss=0.2533, ctc_loss=0.1707, cr_loss=0.4127, over 20977.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.167, cr_loss=0.3855, over 4074407.78 frames. ], batch size: 58, lr: 4.96e-03, grad_scale: 32.0
2024-09-15 13:42:57,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.14 vs. limit=22.5
2024-09-15 13:42:59,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=294306.8333333333, ans=0.0
2024-09-15 13:43:02,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=294306.8333333333, ans=0.125
2024-09-15 13:43:12,415 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0
2024-09-15 13:43:37,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=294363.5, ans=0.2
2024-09-15 13:43:45,109 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.071e+02 2.192e+02 2.418e+02 4.463e+02, threshold=4.384e+02, percent-clipped=1.0
2024-09-15 13:43:45,131 INFO [train.py:1198] (0/2) Epoch 17, batch 1650, loss[loss=0.2185, ctc_loss=0.1452, cr_loss=0.3667, over 20957.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1673, cr_loss=0.3869, over 4085582.13 frames. ], batch size: 51, lr: 4.96e-03, grad_scale: 32.0
2024-09-15 13:43:57,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=294391.8333333333, ans=0.125
2024-09-15 13:44:27,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.46 vs. limit=15.0
2024-09-15 13:44:49,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=294505.1666666667, ans=0.0
2024-09-15 13:45:02,971 INFO [train.py:1198] (0/2) Epoch 17, batch 1700, loss[loss=0.2706, ctc_loss=0.1846, cr_loss=0.4299, over 21012.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.167, cr_loss=0.386, over 4085206.85 frames. ], batch size: 63, lr: 4.96e-03, grad_scale: 32.0
2024-09-15 13:45:13,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=294533.5, ans=0.0
2024-09-15 13:45:24,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=294561.8333333333, ans=0.125
2024-09-15 13:45:45,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=294590.1666666667, ans=0.0
2024-09-15 13:46:12,589 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-104000.pt
2024-09-15 13:46:13,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=294646.8333333333, ans=0.0
2024-09-15 13:46:19,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.710e+02 2.064e+02 2.202e+02 2.336e+02 7.186e+02, threshold=4.404e+02, percent-clipped=1.0
2024-09-15 13:46:19,491 INFO [train.py:1198] (0/2) Epoch 17, batch 1750, loss[loss=0.251, ctc_loss=0.1715, cr_loss=0.3975, over 20811.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1669, cr_loss=0.3858, over 4083515.50 frames. ], batch size: 53, lr: 4.96e-03, grad_scale: 32.0
2024-09-15 13:46:32,280 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0
2024-09-15 13:46:46,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294703.5, ans=0.1
2024-09-15 13:46:51,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0
2024-09-15 13:46:56,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=294731.8333333333, ans=0.1
2024-09-15 13:47:16,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=294760.1666666667, ans=0.025
2024-09-15 13:47:30,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=294788.5, ans=0.125
2024-09-15 13:47:33,888 INFO [train.py:1198] (0/2) Epoch 17, batch 1800, loss[loss=0.2442, ctc_loss=0.1687, cr_loss=0.3774, over 20844.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1673, cr_loss=0.3867, over 4093453.82 frames. ], batch size: 65, lr: 4.96e-03, grad_scale: 32.0
2024-09-15 13:47:43,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=294816.8333333333, ans=0.0
2024-09-15 13:48:01,332 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 13:48:01,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=15.0
2024-09-15 13:48:32,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=294901.8333333333, ans=0.0
2024-09-15 13:48:32,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=294901.8333333333, ans=0.125
2024-09-15 13:48:33,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=294901.8333333333, ans=0.125
2024-09-15 13:48:47,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=294930.1666666667, ans=0.125
2024-09-15 13:48:48,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=294930.1666666667, ans=0.125
2024-09-15 13:48:49,275 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=22.5
2024-09-15 13:48:51,342 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.766e+02 2.008e+02 2.119e+02 2.283e+02 3.531e+02, threshold=4.239e+02, percent-clipped=0.0
2024-09-15 13:48:51,363 INFO [train.py:1198] (0/2) Epoch 17, batch 1850, loss[loss=0.2421, ctc_loss=0.1635, cr_loss=0.3931, over 21074.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1662, cr_loss=0.3856, over 4101668.46 frames. ], batch size: 59, lr: 4.96e-03, grad_scale: 32.0
2024-09-15 13:49:02,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=294958.5, ans=0.025
2024-09-15 13:49:32,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=295015.1666666667, ans=0.125
2024-09-15 13:49:57,969 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 13:50:06,364 INFO [train.py:1198] (0/2) Epoch 17, batch 1900, loss[loss=0.2555, ctc_loss=0.175, cr_loss=0.4027, over 20588.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1667, cr_loss=0.3861, over 4099613.91 frames. ], batch size: 75, lr: 4.96e-03, grad_scale: 16.0
2024-09-15 13:50:24,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=295128.5, ans=0.125
2024-09-15 13:50:30,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=295128.5, ans=0.125
2024-09-15 13:50:44,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=295156.8333333333, ans=0.125
2024-09-15 13:50:51,837 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.53 vs. limit=15.0
2024-09-15 13:51:15,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=295213.5, ans=0.125
2024-09-15 13:51:24,534 INFO [train.py:1198] (0/2) Epoch 17, batch 1950, loss[loss=0.2781, ctc_loss=0.1934, cr_loss=0.4237, over 21066.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1668, cr_loss=0.3859, over 4106377.78 frames. ], batch size: 59, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 13:51:26,033 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.021e+02 2.155e+02 2.379e+02 3.372e+02, threshold=4.309e+02, percent-clipped=0.0
2024-09-15 13:51:50,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=295270.1666666667, ans=0.0
2024-09-15 13:52:04,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295298.5, ans=0.1
2024-09-15 13:52:20,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=295326.8333333333, ans=0.125
2024-09-15 13:52:39,746 INFO [train.py:1198] (0/2) Epoch 17, batch 2000, loss[loss=0.2437, ctc_loss=0.1691, cr_loss=0.3731, over 20941.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1676, cr_loss=0.3873, over 4102307.00 frames. ], batch size: 64, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 13:53:13,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=295440.1666666667, ans=0.125
2024-09-15 13:53:23,925 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 13:53:47,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295496.8333333333, ans=0.1
2024-09-15 13:53:49,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.97 vs. limit=22.5
2024-09-15 13:53:53,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=295525.1666666667, ans=0.125
2024-09-15 13:53:54,884 INFO [train.py:1198] (0/2) Epoch 17, batch 2050, loss[loss=0.2296, ctc_loss=0.1564, cr_loss=0.366, over 21020.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1667, cr_loss=0.3865, over 4112996.62 frames. ], batch size: 61, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 13:54:00,725 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.099e+02 2.233e+02 2.487e+02 4.359e+02, threshold=4.466e+02, percent-clipped=2.0
2024-09-15 13:54:09,128 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.03 vs. limit=15.0
2024-09-15 13:54:09,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=295525.1666666667, ans=0.125
2024-09-15 13:54:34,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=295581.8333333333, ans=0.025
2024-09-15 13:54:47,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=295610.1666666667, ans=0.07
2024-09-15 13:54:50,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=295610.1666666667, ans=0.125
2024-09-15 13:55:03,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=295638.5, ans=0.125
2024-09-15 13:55:12,304 INFO [train.py:1198] (0/2) Epoch 17, batch 2100, loss[loss=0.2413, ctc_loss=0.1626, cr_loss=0.3935, over 20842.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1676, cr_loss=0.388, over 4103128.37 frames. ], batch size: 59, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 13:55:13,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=22.5
2024-09-15 13:55:26,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295695.1666666667, ans=0.1
2024-09-15 13:55:47,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=295723.5, ans=0.2
2024-09-15 13:55:53,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=295723.5, ans=0.125
2024-09-15 13:56:19,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=295780.1666666667, ans=0.125
2024-09-15 13:56:29,972 INFO [train.py:1198] (0/2) Epoch 17, batch 2150, loss[loss=0.2308, ctc_loss=0.1553, cr_loss=0.3771, over 20999.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1669, cr_loss=0.3867, over 4107893.62 frames. ], batch size: 61, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 13:56:32,986 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.015e+02 2.167e+02 2.319e+02 3.201e+02, threshold=4.334e+02, percent-clipped=0.0
2024-09-15 13:57:06,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=295865.1666666667, ans=0.125
2024-09-15 13:57:17,003 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0
2024-09-15 13:57:27,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=295893.5, ans=0.0
2024-09-15 13:57:34,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=295921.8333333333, ans=0.125
2024-09-15 13:57:35,164 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=22.5
2024-09-15 13:57:45,069 INFO [train.py:1198] (0/2) Epoch 17, batch 2200, loss[loss=0.2455, ctc_loss=0.1641, cr_loss=0.4068, over 20814.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1665, cr_loss=0.3858, over 4108473.77 frames. ], batch size: 59, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 13:57:57,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=295950.1666666667, ans=0.95
2024-09-15 13:59:00,516 INFO [train.py:1198] (0/2) Epoch 17, batch 2250, loss[loss=0.2546, ctc_loss=0.1745, cr_loss=0.4003, over 20970.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1664, cr_loss=0.3858, over 4112345.99 frames. ], batch size: 67, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 13:59:03,330 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.051e+02 2.236e+02 2.524e+02 3.269e+02, threshold=4.473e+02, percent-clipped=0.0
2024-09-15 13:59:08,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=296091.8333333333, ans=0.0
2024-09-15 13:59:29,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=296148.5, ans=0.125
2024-09-15 13:59:32,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0
2024-09-15 13:59:54,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=296176.8333333333, ans=0.125
2024-09-15 14:00:04,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296205.1666666667, ans=0.1
2024-09-15 14:00:07,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=296205.1666666667, ans=0.125
2024-09-15 14:00:17,369 INFO [train.py:1198] (0/2) Epoch 17, batch 2300, loss[loss=0.202, ctc_loss=0.1359, cr_loss=0.3308, over 20926.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1663, cr_loss=0.3856, over 4115148.50 frames. ], batch size: 48, lr: 4.95e-03, grad_scale: 16.0
2024-09-15 14:00:35,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=296261.8333333333, ans=0.125
2024-09-15 14:00:56,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=296290.1666666667, ans=0.0
2024-09-15 14:01:06,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0
2024-09-15 14:01:14,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=296318.5, ans=0.125
2024-09-15 14:01:28,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=296346.8333333333, ans=0.125
2024-09-15 14:01:32,215 INFO [train.py:1198] (0/2) Epoch 17, batch 2350, loss[loss=0.257, ctc_loss=0.1742, cr_loss=0.4144, over 20883.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1671, cr_loss=0.3872, over 4108822.26 frames. ], batch size: 57, lr: 4.94e-03, grad_scale: 16.0
2024-09-15 14:01:35,175 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.035e+02 2.207e+02 2.444e+02 3.534e+02, threshold=4.415e+02, percent-clipped=0.0
2024-09-15 14:01:44,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=296375.1666666667, ans=15.0
2024-09-15 14:02:21,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=296460.1666666667, ans=0.125
2024-09-15 14:02:38,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296488.5, ans=0.1
2024-09-15 14:02:49,866 INFO [train.py:1198] (0/2) Epoch 17, batch 2400, loss[loss=0.2134, ctc_loss=0.1445, cr_loss=0.3448, over 20956.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1675, cr_loss=0.3876, over 4096330.24 frames. ], batch size: 51, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:03:08,226 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0
2024-09-15 14:03:10,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=296545.1666666667, ans=0.125
2024-09-15 14:03:34,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=296601.8333333333, ans=0.125
2024-09-15 14:03:36,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=296601.8333333333, ans=0.04949747468305833
2024-09-15 14:03:40,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=296601.8333333333, ans=0.125
2024-09-15 14:04:04,798 INFO [train.py:1198] (0/2) Epoch 17, batch 2450, loss[loss=0.2118, ctc_loss=0.1443, cr_loss=0.3376, over 20967.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1667, cr_loss=0.386, over 4100846.29 frames. ], batch size: 51, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:04:07,772 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.676e+02 2.073e+02 2.225e+02 2.492e+02 4.389e+02, threshold=4.450e+02, percent-clipped=0.0
2024-09-15 14:05:16,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=296771.8333333333, ans=0.0
2024-09-15 14:05:23,775 INFO [train.py:1198] (0/2) Epoch 17, batch 2500, loss[loss=0.2561, ctc_loss=0.1775, cr_loss=0.3932, over 20675.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1672, cr_loss=0.386, over 4099992.13 frames. ], batch size: 71, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:06:01,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=296856.8333333333, ans=0.0
2024-09-15 14:06:15,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=296885.1666666667, ans=0.5
2024-09-15 14:06:22,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0
2024-09-15 14:06:25,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=296913.5, ans=0.125
2024-09-15 14:06:31,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=296913.5, ans=0.025
2024-09-15 14:06:38,416 INFO [train.py:1198] (0/2) Epoch 17, batch 2550, loss[loss=0.2567, ctc_loss=0.1742, cr_loss=0.4123, over 20695.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1661, cr_loss=0.3846, over 4111262.22 frames. ], batch size: 66, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:06:41,553 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.099e+02 2.244e+02 2.410e+02 3.985e+02, threshold=4.488e+02, percent-clipped=0.0
2024-09-15 14:07:37,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.68 vs. limit=10.0
2024-09-15 14:07:51,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=297055.1666666667, ans=0.125
2024-09-15 14:07:56,079 INFO [train.py:1198] (0/2) Epoch 17, batch 2600, loss[loss=0.2356, ctc_loss=0.1597, cr_loss=0.3793, over 21024.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1649, cr_loss=0.3825, over 4107254.06 frames. ], batch size: 63, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:08:05,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=297083.5, ans=0.0
2024-09-15 14:08:12,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297111.8333333333, ans=0.1
2024-09-15 14:08:44,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=297168.5, ans=0.2
2024-09-15 14:09:10,704 INFO [train.py:1198] (0/2) Epoch 17, batch 2650, loss[loss=0.2591, ctc_loss=0.1763, cr_loss=0.4138, over 20978.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.165, cr_loss=0.3828, over 4105185.03 frames. ], batch size: 58, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:09:13,921 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.064e+02 2.236e+02 2.470e+02 3.144e+02, threshold=4.471e+02, percent-clipped=0.0
2024-09-15 14:09:24,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=297253.5, ans=0.0
2024-09-15 14:09:27,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=297253.5, ans=0.2
2024-09-15 14:10:16,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297338.5, ans=0.1
2024-09-15 14:10:21,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=297338.5, ans=0.125
2024-09-15 14:10:25,552 INFO [train.py:1198] (0/2) Epoch 17, batch 2700, loss[loss=0.2286, ctc_loss=0.1535, cr_loss=0.3758, over 20660.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1645, cr_loss=0.3824, over 4106334.17 frames. ], batch size: 68, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:10:56,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297423.5, ans=0.1
2024-09-15 14:10:59,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=297423.5, ans=0.0
2024-09-15 14:11:18,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297451.8333333333, ans=0.1
2024-09-15 14:11:38,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=297480.1666666667, ans=0.0
2024-09-15 14:11:44,311 INFO [train.py:1198] (0/2) Epoch 17, batch 2750, loss[loss=0.2539, ctc_loss=0.1755, cr_loss=0.3923, over 21006.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.165, cr_loss=0.3831, over 4096738.06 frames. ], batch size: 63, lr: 4.94e-03, grad_scale: 32.0
2024-09-15 14:11:47,250 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.036e+02 2.144e+02 2.342e+02 3.425e+02, threshold=4.287e+02, percent-clipped=0.0
2024-09-15 14:11:47,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=297508.5, ans=0.0
2024-09-15 14:11:56,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0
2024-09-15 14:12:01,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=297536.8333333333, ans=0.125
2024-09-15 14:12:17,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=297565.1666666667, ans=0.04949747468305833
2024-09-15 14:12:59,311 INFO [train.py:1198] (0/2) Epoch 17, batch 2800, loss[loss=0.2443, ctc_loss=0.1661, cr_loss=0.391, over 20795.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.164, cr_loss=0.3819, over 4112125.77 frames. ], batch size: 56, lr: 4.93e-03, grad_scale: 32.0
2024-09-15 14:13:00,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=22.5
2024-09-15 14:13:01,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=297650.1666666667, ans=0.125
2024-09-15 14:13:17,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297678.5, ans=0.1
2024-09-15 14:13:24,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.89 vs. limit=22.5
2024-09-15 14:13:24,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=297678.5, ans=0.0
2024-09-15 14:13:33,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=297706.8333333333, ans=0.125
2024-09-15 14:13:36,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.48 vs. limit=8.0
2024-09-15 14:14:03,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=297763.5, ans=0.0
2024-09-15 14:14:05,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=297763.5, ans=0.09899494936611666
2024-09-15 14:14:15,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=297791.8333333333, ans=0.2
2024-09-15 14:14:16,949 INFO [train.py:1198] (0/2) Epoch 17, batch 2850, loss[loss=0.2305, ctc_loss=0.1576, cr_loss=0.3646, over 20764.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1659, cr_loss=0.3853, over 4106712.47 frames. ], batch size: 56, lr: 4.93e-03, grad_scale: 32.0
2024-09-15 14:14:19,829 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.046e+02 2.142e+02 2.377e+02 3.016e+02, threshold=4.283e+02, percent-clipped=0.0
2024-09-15 14:14:23,775 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0
2024-09-15 14:14:29,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.96 vs.
limit=15.0 2024-09-15 14:14:30,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=297820.1666666667, ans=0.125 2024-09-15 14:14:42,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=297820.1666666667, ans=0.125 2024-09-15 14:14:51,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=297848.5, ans=0.125 2024-09-15 14:14:54,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=297848.5, ans=0.125 2024-09-15 14:15:08,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=297876.8333333333, ans=0.125 2024-09-15 14:15:11,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=297876.8333333333, ans=0.125 2024-09-15 14:15:26,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=297905.1666666667, ans=0.125 2024-09-15 14:15:32,261 INFO [train.py:1198] (0/2) Epoch 17, batch 2900, loss[loss=0.2397, ctc_loss=0.1645, cr_loss=0.3762, over 20816.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1665, cr_loss=0.3859, over 4107090.17 frames. ], batch size: 59, lr: 4.93e-03, grad_scale: 32.0 2024-09-15 14:16:50,974 INFO [train.py:1198] (0/2) Epoch 17, batch 2950, loss[loss=0.2137, ctc_loss=0.1467, cr_loss=0.3351, over 20982.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1659, cr_loss=0.3844, over 4109670.17 frames. 
], batch size: 51, lr: 4.93e-03, grad_scale: 32.0 2024-09-15 14:16:53,920 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.050e+02 2.156e+02 2.312e+02 2.932e+02, threshold=4.311e+02, percent-clipped=0.0 2024-09-15 14:17:06,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=298103.5, ans=0.035 2024-09-15 14:17:14,167 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2024-09-15 14:17:15,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=298103.5, ans=0.125 2024-09-15 14:17:18,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298103.5, ans=0.1 2024-09-15 14:17:45,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=298160.1666666667, ans=0.125 2024-09-15 14:17:51,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=298188.5, ans=0.125 2024-09-15 14:17:51,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=298188.5, ans=0.125 2024-09-15 14:18:05,984 INFO [train.py:1198] (0/2) Epoch 17, batch 3000, loss[loss=0.2011, ctc_loss=0.1347, cr_loss=0.332, over 20237.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1662, cr_loss=0.3844, over 4098229.88 frames. ], batch size: 45, lr: 4.93e-03, grad_scale: 32.0 2024-09-15 14:18:05,985 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 14:18:25,257 INFO [train.py:1230] (0/2) Epoch 17, validation: loss=0.04582, ctc_loss=0.04582, cr_loss=1.047e-14, over 944034.00 frames. 
2024-09-15 14:18:25,257 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 14:18:25,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=298216.8333333333, ans=0.0 2024-09-15 14:18:37,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=298216.8333333333, ans=0.125 2024-09-15 14:18:52,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=298245.1666666667, ans=0.2 2024-09-15 14:19:10,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=298273.5, ans=0.025 2024-09-15 14:19:42,957 INFO [train.py:1198] (0/2) Epoch 17, batch 3050, loss[loss=0.292, ctc_loss=0.2111, cr_loss=0.4046, over 14234.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1674, cr_loss=0.3864, over 4101044.54 frames. ], batch size: 149, lr: 4.93e-03, grad_scale: 32.0 2024-09-15 14:19:45,773 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.109e+02 2.235e+02 2.405e+02 3.205e+02, threshold=4.469e+02, percent-clipped=0.0 2024-09-15 14:20:33,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=298443.5, ans=0.0 2024-09-15 14:20:36,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=298443.5, ans=0.0 2024-09-15 14:20:56,429 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=22.5 2024-09-15 14:20:58,659 INFO [train.py:1198] (0/2) Epoch 17, batch 3100, loss[loss=0.2675, ctc_loss=0.1827, cr_loss=0.424, over 20995.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1675, cr_loss=0.3873, over 4099090.62 frames. 
], batch size: 61, lr: 4.93e-03, grad_scale: 32.0 2024-09-15 14:21:02,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=298500.1666666667, ans=0.125 2024-09-15 14:21:14,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=12.0 2024-09-15 14:21:53,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=298585.1666666667, ans=0.0 2024-09-15 14:22:17,351 INFO [train.py:1198] (0/2) Epoch 17, batch 3150, loss[loss=0.2172, ctc_loss=0.146, cr_loss=0.3559, over 20978.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1676, cr_loss=0.3875, over 4088654.89 frames. ], batch size: 52, lr: 4.93e-03, grad_scale: 32.0 2024-09-15 14:22:20,403 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.066e+02 2.183e+02 2.374e+02 4.735e+02, threshold=4.367e+02, percent-clipped=2.0 2024-09-15 14:22:23,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=298641.8333333333, ans=0.0 2024-09-15 14:22:28,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=298641.8333333333, ans=0.0 2024-09-15 14:22:37,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=298670.1666666667, ans=0.025 2024-09-15 14:22:38,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=298670.1666666667, ans=0.0 2024-09-15 14:23:14,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=298726.8333333333, ans=0.125 2024-09-15 14:23:22,377 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=298755.1666666667, ans=0.2 2024-09-15 14:23:26,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=298755.1666666667, ans=0.1 2024-09-15 14:23:31,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298783.5, ans=0.1 2024-09-15 14:23:32,701 INFO [train.py:1198] (0/2) Epoch 17, batch 3200, loss[loss=0.277, ctc_loss=0.1937, cr_loss=0.4163, over 20059.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.168, cr_loss=0.3881, over 4095506.53 frames. ], batch size: 80, lr: 4.93e-03, grad_scale: 32.0 2024-09-15 14:23:36,447 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0 2024-09-15 14:23:46,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=298811.8333333333, ans=0.2 2024-09-15 14:23:59,730 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0 2024-09-15 14:24:08,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=298840.1666666667, ans=0.0 2024-09-15 14:24:43,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298896.8333333333, ans=0.1 2024-09-15 14:24:50,508 INFO [train.py:1198] (0/2) Epoch 17, batch 3250, loss[loss=0.2351, ctc_loss=0.1628, cr_loss=0.3614, over 20646.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1672, cr_loss=0.3869, over 4108893.53 frames. 
], batch size: 68, lr: 4.92e-03, grad_scale: 32.0 2024-09-15 14:24:54,971 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.089e+02 2.223e+02 2.425e+02 3.392e+02, threshold=4.446e+02, percent-clipped=0.0 2024-09-15 14:25:02,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-09-15 14:25:13,567 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2024-09-15 14:25:29,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=298981.8333333333, ans=0.0 2024-09-15 14:26:05,594 INFO [train.py:1198] (0/2) Epoch 17, batch 3300, loss[loss=0.2466, ctc_loss=0.1674, cr_loss=0.3963, over 20975.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1667, cr_loss=0.3863, over 4108500.94 frames. ], batch size: 61, lr: 4.92e-03, grad_scale: 32.0 2024-09-15 14:27:20,655 INFO [train.py:1198] (0/2) Epoch 17, batch 3350, loss[loss=0.2403, ctc_loss=0.164, cr_loss=0.3816, over 20850.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.167, cr_loss=0.3863, over 4088181.14 frames. 
], batch size: 65, lr: 4.92e-03, grad_scale: 32.0 2024-09-15 14:27:21,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299208.5, ans=0.1 2024-09-15 14:27:25,272 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.045e+02 2.190e+02 2.326e+02 4.997e+02, threshold=4.380e+02, percent-clipped=1.0 2024-09-15 14:27:51,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=299265.1666666667, ans=0.025 2024-09-15 14:28:27,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2024-09-15 14:28:34,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.21 vs. limit=15.0 2024-09-15 14:28:39,381 INFO [train.py:1198] (0/2) Epoch 17, batch 3400, loss[loss=0.2424, ctc_loss=0.1634, cr_loss=0.3952, over 20858.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1662, cr_loss=0.3852, over 4102954.81 frames. ], batch size: 57, lr: 4.92e-03, grad_scale: 32.0 2024-09-15 14:28:48,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=299350.1666666667, ans=0.125 2024-09-15 14:29:23,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=299435.1666666667, ans=0.025 2024-09-15 14:29:54,513 INFO [train.py:1198] (0/2) Epoch 17, batch 3450, loss[loss=0.2118, ctc_loss=0.1398, cr_loss=0.3598, over 20984.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1651, cr_loss=0.3838, over 4107014.87 frames. 
], batch size: 58, lr: 4.92e-03, grad_scale: 32.0 2024-09-15 14:29:56,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299491.8333333333, ans=0.1 2024-09-15 14:29:58,880 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.743e+02 2.049e+02 2.191e+02 2.372e+02 3.489e+02, threshold=4.383e+02, percent-clipped=0.0 2024-09-15 14:30:00,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=299491.8333333333, ans=0.125 2024-09-15 14:30:27,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=299548.5, ans=0.025 2024-09-15 14:30:46,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=299576.8333333333, ans=0.035 2024-09-15 14:30:51,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=299576.8333333333, ans=0.125 2024-09-15 14:30:57,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299605.1666666667, ans=0.1 2024-09-15 14:31:12,344 INFO [train.py:1198] (0/2) Epoch 17, batch 3500, loss[loss=0.2494, ctc_loss=0.171, cr_loss=0.3919, over 20777.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1652, cr_loss=0.3842, over 4106343.04 frames. 
], batch size: 56, lr: 4.92e-03, grad_scale: 32.0 2024-09-15 14:31:20,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=299633.5, ans=0.0 2024-09-15 14:31:42,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299690.1666666667, ans=0.1 2024-09-15 14:31:43,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=299690.1666666667, ans=0.0 2024-09-15 14:31:49,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=299690.1666666667, ans=0.025 2024-09-15 14:31:54,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=22.5 2024-09-15 14:31:55,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=299690.1666666667, ans=0.0 2024-09-15 14:32:12,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=12.0 2024-09-15 14:32:29,026 INFO [train.py:1198] (0/2) Epoch 17, batch 3550, loss[loss=0.2835, ctc_loss=0.1965, cr_loss=0.4346, over 21069.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1642, cr_loss=0.3828, over 4117865.94 frames. 
], batch size: 59, lr: 4.92e-03, grad_scale: 32.0 2024-09-15 14:32:33,407 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.715e+02 2.077e+02 2.267e+02 2.390e+02 4.371e+02, threshold=4.534e+02, percent-clipped=0.0 2024-09-15 14:32:38,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=299775.1666666667, ans=10.0 2024-09-15 14:32:39,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=299775.1666666667, ans=0.0 2024-09-15 14:32:41,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299775.1666666667, ans=0.1 2024-09-15 14:32:45,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=299803.5, ans=0.125 2024-09-15 14:32:47,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=299803.5, ans=0.125 2024-09-15 14:32:54,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=299803.5, ans=0.025 2024-09-15 14:33:17,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.24 vs. limit=10.0 2024-09-15 14:33:46,974 INFO [train.py:1198] (0/2) Epoch 17, batch 3600, loss[loss=0.2327, ctc_loss=0.1553, cr_loss=0.3869, over 20948.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1649, cr_loss=0.3838, over 4104975.36 frames. 
], batch size: 49, lr: 4.92e-03, grad_scale: 32.0 2024-09-15 14:34:23,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=299973.5, ans=0.025 2024-09-15 14:34:25,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=299973.5, ans=0.0 2024-09-15 14:34:55,550 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.62 vs. limit=22.5 2024-09-15 14:35:02,595 INFO [train.py:1198] (0/2) Epoch 17, batch 3650, loss[loss=0.2166, ctc_loss=0.1454, cr_loss=0.3559, over 20781.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1647, cr_loss=0.3837, over 4107759.41 frames. ], batch size: 56, lr: 4.91e-03, grad_scale: 32.0 2024-09-15 14:35:06,848 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.709e+02 2.087e+02 2.181e+02 2.335e+02 3.955e+02, threshold=4.362e+02, percent-clipped=0.0 2024-09-15 14:35:28,493 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.38 vs. limit=15.0 2024-09-15 14:35:31,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. 
limit=15.0 2024-09-15 14:35:34,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=300115.1666666667, ans=0.125 2024-09-15 14:35:53,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=300143.5, ans=0.125 2024-09-15 14:36:09,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=300171.8333333333, ans=0.025 2024-09-15 14:36:14,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=300171.8333333333, ans=0.2 2024-09-15 14:36:20,301 INFO [train.py:1198] (0/2) Epoch 17, batch 3700, loss[loss=0.2188, ctc_loss=0.1489, cr_loss=0.3494, over 20979.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1657, cr_loss=0.385, over 4093338.45 frames. ], batch size: 49, lr: 4.91e-03, grad_scale: 32.0 2024-09-15 14:36:28,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=300200.1666666667, ans=0.125 2024-09-15 14:36:57,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=300256.8333333333, ans=0.1 2024-09-15 14:37:35,944 INFO [train.py:1198] (0/2) Epoch 17, batch 3750, loss[loss=0.2807, ctc_loss=0.1951, cr_loss=0.4279, over 20985.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1658, cr_loss=0.3855, over 4103069.29 frames. ], batch size: 64, lr: 4.91e-03, grad_scale: 32.0 2024-09-15 14:37:40,544 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.107e+02 2.281e+02 2.521e+02 4.388e+02, threshold=4.561e+02, percent-clipped=1.0 2024-09-15 14:37:41,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.89 vs. 
limit=15.0 2024-09-15 14:38:12,656 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0 2024-09-15 14:38:32,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=300426.8333333333, ans=0.125 2024-09-15 14:38:38,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=300455.1666666667, ans=0.2 2024-09-15 14:38:45,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.61 vs. limit=10.0 2024-09-15 14:38:51,438 INFO [train.py:1198] (0/2) Epoch 17, batch 3800, loss[loss=0.2357, ctc_loss=0.1555, cr_loss=0.4013, over 21063.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1653, cr_loss=0.385, over 4109416.92 frames. ], batch size: 56, lr: 4.91e-03, grad_scale: 32.0 2024-09-15 14:39:48,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300568.5, ans=0.1 2024-09-15 14:40:08,986 INFO [train.py:1198] (0/2) Epoch 17, batch 3850, loss[loss=0.2119, ctc_loss=0.1409, cr_loss=0.3551, over 20997.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1661, cr_loss=0.3858, over 4107460.42 frames. ], batch size: 52, lr: 4.91e-03, grad_scale: 16.0 2024-09-15 14:40:14,842 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.070e+02 2.233e+02 2.370e+02 4.826e+02, threshold=4.467e+02, percent-clipped=1.0 2024-09-15 14:40:16,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=300625.1666666667, ans=0.125 2024-09-15 14:41:23,837 INFO [train.py:1198] (0/2) Epoch 17, batch 3900, loss[loss=0.2786, ctc_loss=0.1947, cr_loss=0.4197, over 19527.00 frames. 
], tot_loss[loss=0.2432, ctc_loss=0.1661, cr_loss=0.3855, over 4104241.61 frames. ], batch size: 90, lr: 4.91e-03, grad_scale: 16.0 2024-09-15 14:42:27,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=300880.1666666667, ans=0.0 2024-09-15 14:42:42,035 INFO [train.py:1198] (0/2) Epoch 17, batch 3950, loss[loss=0.2751, ctc_loss=0.1891, cr_loss=0.4302, over 19415.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.166, cr_loss=0.385, over 4107820.62 frames. ], batch size: 90, lr: 4.91e-03, grad_scale: 16.0 2024-09-15 14:42:42,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=300908.5, ans=0.0 2024-09-15 14:42:48,157 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.106e+02 2.234e+02 2.468e+02 5.301e+02, threshold=4.468e+02, percent-clipped=1.0 2024-09-15 14:42:57,833 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=22.5 2024-09-15 14:43:10,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=300965.1666666667, ans=0.025 2024-09-15 14:43:34,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=300993.5, ans=0.07 2024-09-15 14:43:54,003 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-09-15 14:43:57,770 INFO [train.py:1198] (0/2) Epoch 17, batch 4000, loss[loss=0.2566, ctc_loss=0.1748, cr_loss=0.4092, over 19379.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1659, cr_loss=0.3842, over 4094440.76 frames. 
], batch size: 90, lr: 4.91e-03, grad_scale: 32.0 2024-09-15 14:44:13,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=301078.5, ans=0.125 2024-09-15 14:44:21,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.52 vs. limit=22.5 2024-09-15 14:44:43,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=301135.1666666667, ans=0.025 2024-09-15 14:44:44,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.01 vs. limit=15.0 2024-09-15 14:44:46,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0 2024-09-15 14:45:15,374 INFO [train.py:1198] (0/2) Epoch 17, batch 4050, loss[loss=0.1965, ctc_loss=0.1273, cr_loss=0.346, over 21086.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1657, cr_loss=0.3839, over 4080320.55 frames. 
], batch size: 53, lr: 4.91e-03, grad_scale: 32.0 2024-09-15 14:45:18,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=301191.8333333333, ans=0.125 2024-09-15 14:45:21,235 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.106e+02 2.228e+02 2.386e+02 4.544e+02, threshold=4.456e+02, percent-clipped=1.0 2024-09-15 14:45:43,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=301248.5, ans=0.0 2024-09-15 14:45:51,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=301248.5, ans=0.0 2024-09-15 14:45:57,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=301248.5, ans=0.2 2024-09-15 14:46:17,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=301305.1666666667, ans=0.0 2024-09-15 14:46:30,136 INFO [train.py:1198] (0/2) Epoch 17, batch 4100, loss[loss=0.2476, ctc_loss=0.1679, cr_loss=0.3984, over 20639.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.165, cr_loss=0.3836, over 4095473.42 frames. ], batch size: 68, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:46:52,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=301361.8333333333, ans=0.1 2024-09-15 14:47:21,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0 2024-09-15 14:47:48,004 INFO [train.py:1198] (0/2) Epoch 17, batch 4150, loss[loss=0.2623, ctc_loss=0.1806, cr_loss=0.4085, over 21078.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1651, cr_loss=0.3838, over 4099232.38 frames. 
], batch size: 59, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:47:53,851 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.680e+02 2.056e+02 2.176e+02 2.386e+02 3.783e+02, threshold=4.352e+02, percent-clipped=0.0 2024-09-15 14:47:54,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=301475.1666666667, ans=0.125 2024-09-15 14:48:28,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=301531.8333333333, ans=0.125 2024-09-15 14:49:02,741 INFO [train.py:1198] (0/2) Epoch 17, batch 4200, loss[loss=0.249, ctc_loss=0.1691, cr_loss=0.3998, over 21070.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1659, cr_loss=0.3855, over 4092365.99 frames. ], batch size: 59, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:49:31,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=301673.5, ans=0.125 2024-09-15 14:49:40,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=301673.5, ans=0.125 2024-09-15 14:49:43,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0 2024-09-15 14:49:47,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=301701.8333333333, ans=0.07 2024-09-15 14:49:57,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=12.0 2024-09-15 14:50:17,616 INFO [train.py:1198] (0/2) Epoch 17, batch 4250, loss[loss=0.2422, ctc_loss=0.1691, cr_loss=0.3655, over 20996.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1669, cr_loss=0.3855, over 4075887.35 frames. 
], batch size: 61, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:50:23,476 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.062e+02 2.250e+02 2.443e+02 4.732e+02, threshold=4.501e+02, percent-clipped=1.0 2024-09-15 14:50:25,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=301758.5, ans=0.0 2024-09-15 14:51:20,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=301871.8333333333, ans=0.125 2024-09-15 14:51:32,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301871.8333333333, ans=0.1 2024-09-15 14:51:35,478 INFO [train.py:1198] (0/2) Epoch 17, batch 4300, loss[loss=0.2334, ctc_loss=0.1577, cr_loss=0.3786, over 20800.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1663, cr_loss=0.3845, over 4091710.99 frames. ], batch size: 53, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:51:40,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=301900.1666666667, ans=0.5 2024-09-15 14:51:53,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301928.5, ans=0.1 2024-09-15 14:52:42,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=302013.5, ans=0.125 2024-09-15 14:52:49,693 INFO [train.py:1198] (0/2) Epoch 17, batch 4350, loss[loss=0.2083, ctc_loss=0.1419, cr_loss=0.3323, over 20777.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1666, cr_loss=0.3858, over 4098455.87 frames. 
], batch size: 53, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:52:55,792 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.097e+02 2.181e+02 2.314e+02 2.892e+02, threshold=4.362e+02, percent-clipped=0.0 2024-09-15 14:53:56,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302155.1666666667, ans=0.1 2024-09-15 14:54:07,500 INFO [train.py:1198] (0/2) Epoch 17, batch 4400, loss[loss=0.2217, ctc_loss=0.1485, cr_loss=0.3661, over 19958.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1666, cr_loss=0.3856, over 4095440.00 frames. ], batch size: 44, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:54:25,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302211.8333333333, ans=0.1 2024-09-15 14:54:39,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=302240.1666666667, ans=0.125 2024-09-15 14:54:46,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=302240.1666666667, ans=0.95 2024-09-15 14:54:48,730 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.52 vs. limit=22.5 2024-09-15 14:54:49,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=302240.1666666667, ans=0.2 2024-09-15 14:54:50,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.83 vs. 
limit=15.0 2024-09-15 14:54:54,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=302268.5, ans=0.0 2024-09-15 14:55:22,733 INFO [train.py:1198] (0/2) Epoch 17, batch 4450, loss[loss=0.2283, ctc_loss=0.1573, cr_loss=0.3549, over 20634.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1656, cr_loss=0.3839, over 4105940.96 frames. ], batch size: 71, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:55:28,630 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.075e+02 2.240e+02 2.483e+02 6.482e+02, threshold=4.480e+02, percent-clipped=1.0 2024-09-15 14:55:52,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=302381.8333333333, ans=0.125 2024-09-15 14:55:52,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=302381.8333333333, ans=0.125 2024-09-15 14:56:14,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=302410.1666666667, ans=0.125 2024-09-15 14:56:40,625 INFO [train.py:1198] (0/2) Epoch 17, batch 4500, loss[loss=0.2591, ctc_loss=0.1779, cr_loss=0.4058, over 20975.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1669, cr_loss=0.3859, over 4097380.82 frames. ], batch size: 58, lr: 4.90e-03, grad_scale: 32.0 2024-09-15 14:56:54,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=302495.1666666667, ans=0.125 2024-09-15 14:57:28,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=302551.8333333333, ans=0.1 2024-09-15 14:57:55,596 INFO [train.py:1198] (0/2) Epoch 17, batch 4550, loss[loss=0.2348, ctc_loss=0.1575, cr_loss=0.3861, over 20022.00 frames. 
], tot_loss[loss=0.2439, ctc_loss=0.1668, cr_loss=0.3855, over 4085749.54 frames. ], batch size: 80, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 14:57:59,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=302608.5, ans=0.2 2024-09-15 14:58:01,623 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.070e+02 2.214e+02 2.404e+02 5.625e+02, threshold=4.427e+02, percent-clipped=1.0 2024-09-15 14:58:12,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302636.8333333333, ans=0.1 2024-09-15 14:58:33,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=302665.1666666667, ans=0.0 2024-09-15 14:58:37,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=302665.1666666667, ans=0.125 2024-09-15 14:58:48,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=302693.5, ans=0.125 2024-09-15 14:58:55,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=302693.5, ans=0.0 2024-09-15 14:59:13,619 INFO [train.py:1198] (0/2) Epoch 17, batch 4600, loss[loss=0.27, ctc_loss=0.1878, cr_loss=0.4109, over 20954.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1667, cr_loss=0.386, over 4096576.48 frames. 
], batch size: 64, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 14:59:18,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=302750.1666666667, ans=0.2 2024-09-15 14:59:23,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=302750.1666666667, ans=0.025 2024-09-15 14:59:41,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=302778.5, ans=0.125 2024-09-15 15:00:28,625 INFO [train.py:1198] (0/2) Epoch 17, batch 4650, loss[loss=0.2811, ctc_loss=0.196, cr_loss=0.4257, over 18448.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1673, cr_loss=0.387, over 4098070.87 frames. ], batch size: 108, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 15:00:34,726 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.052e+02 2.194e+02 2.380e+02 3.361e+02, threshold=4.387e+02, percent-clipped=0.0 2024-09-15 15:00:44,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=302920.1666666667, ans=0.0 2024-09-15 15:00:57,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=302948.5, ans=0.125 2024-09-15 15:01:07,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=302948.5, ans=0.1 2024-09-15 15:01:16,057 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.54 vs. 
limit=15.0 2024-09-15 15:01:20,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=302976.8333333333, ans=0.125 2024-09-15 15:01:44,090 INFO [train.py:1198] (0/2) Epoch 17, batch 4700, loss[loss=0.2678, ctc_loss=0.1844, cr_loss=0.4167, over 20988.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1675, cr_loss=0.387, over 4102514.48 frames. ], batch size: 64, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 15:02:32,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=303118.5, ans=0.125 2024-09-15 15:02:44,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=303118.5, ans=0.125 2024-09-15 15:02:53,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2024-09-15 15:02:57,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=303146.8333333333, ans=0.125 2024-09-15 15:03:02,019 INFO [train.py:1198] (0/2) Epoch 17, batch 4750, loss[loss=0.2054, ctc_loss=0.1391, cr_loss=0.3317, over 20795.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1676, cr_loss=0.3872, over 4094487.52 frames. ], batch size: 53, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 15:03:08,008 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.666e+02 2.045e+02 2.168e+02 2.315e+02 2.890e+02, threshold=4.337e+02, percent-clipped=0.0 2024-09-15 15:03:22,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.42 vs. 
limit=22.5 2024-09-15 15:03:43,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=303231.8333333333, ans=0.1 2024-09-15 15:04:16,705 INFO [train.py:1198] (0/2) Epoch 17, batch 4800, loss[loss=0.2557, ctc_loss=0.1779, cr_loss=0.3889, over 19499.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.168, cr_loss=0.3875, over 4098818.33 frames. ], batch size: 90, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 15:04:21,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=303316.8333333333, ans=0.125 2024-09-15 15:05:33,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=303458.5, ans=0.125 2024-09-15 15:05:34,739 INFO [train.py:1198] (0/2) Epoch 17, batch 4850, loss[loss=0.2171, ctc_loss=0.1465, cr_loss=0.3531, over 20954.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1688, cr_loss=0.3882, over 4096166.21 frames. ], batch size: 50, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 15:05:40,693 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.079e+02 2.185e+02 2.356e+02 3.985e+02, threshold=4.370e+02, percent-clipped=0.0 2024-09-15 15:05:48,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=303486.8333333333, ans=0.2 2024-09-15 15:05:51,980 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. 
limit=15.0 2024-09-15 15:05:54,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=303486.8333333333, ans=0.2 2024-09-15 15:05:56,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=303486.8333333333, ans=0.125 2024-09-15 15:06:00,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=303486.8333333333, ans=0.125 2024-09-15 15:06:36,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=303571.8333333333, ans=0.125 2024-09-15 15:06:42,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=303571.8333333333, ans=0.95 2024-09-15 15:06:49,735 INFO [train.py:1198] (0/2) Epoch 17, batch 4900, loss[loss=0.2744, ctc_loss=0.1925, cr_loss=0.4099, over 20977.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1682, cr_loss=0.3879, over 4100184.91 frames. ], batch size: 67, lr: 4.89e-03, grad_scale: 32.0 2024-09-15 15:08:06,991 INFO [train.py:1198] (0/2) Epoch 17, batch 4950, loss[loss=0.2758, ctc_loss=0.1938, cr_loss=0.4101, over 20714.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.167, cr_loss=0.3864, over 4097311.59 frames. ], batch size: 71, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:08:12,734 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.056e+02 2.198e+02 2.367e+02 3.342e+02, threshold=4.395e+02, percent-clipped=0.0 2024-09-15 15:08:21,030 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.03 vs. 
limit=10.0 2024-09-15 15:08:29,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=303770.1666666667, ans=0.0 2024-09-15 15:08:43,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.34 vs. limit=10.0 2024-09-15 15:09:21,191 INFO [train.py:1198] (0/2) Epoch 17, batch 5000, loss[loss=0.2564, ctc_loss=0.1707, cr_loss=0.4284, over 20723.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1666, cr_loss=0.3861, over 4095232.40 frames. ], batch size: 71, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:09:31,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=303883.5, ans=0.0 2024-09-15 15:09:49,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.61 vs. limit=22.5 2024-09-15 15:10:10,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=303968.5, ans=0.2 2024-09-15 15:10:17,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=303968.5, ans=0.125 2024-09-15 15:10:28,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=303996.8333333333, ans=0.0 2024-09-15 15:10:28,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=303996.8333333333, ans=0.125 2024-09-15 15:10:35,442 INFO [train.py:1198] (0/2) Epoch 17, batch 5050, loss[loss=0.214, ctc_loss=0.1474, cr_loss=0.3328, over 20982.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1664, cr_loss=0.3863, over 4105002.49 frames. 
], batch size: 48, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:10:41,521 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.082e+02 2.226e+02 2.461e+02 3.152e+02, threshold=4.452e+02, percent-clipped=0.0 2024-09-15 15:11:02,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=304053.5, ans=0.125 2024-09-15 15:11:35,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=304138.5, ans=0.95 2024-09-15 15:11:42,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=304138.5, ans=0.125 2024-09-15 15:11:49,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=304166.8333333333, ans=0.125 2024-09-15 15:11:50,189 INFO [train.py:1198] (0/2) Epoch 17, batch 5100, loss[loss=0.2335, ctc_loss=0.1576, cr_loss=0.3793, over 21073.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1662, cr_loss=0.3852, over 4096403.23 frames. 
], batch size: 53, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:12:05,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=304195.1666666667, ans=0.125 2024-09-15 15:12:17,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304195.1666666667, ans=0.1 2024-09-15 15:12:30,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=304223.5, ans=0.125 2024-09-15 15:12:51,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304280.1666666667, ans=0.1 2024-09-15 15:12:54,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=304280.1666666667, ans=0.125 2024-09-15 15:12:56,737 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0 2024-09-15 15:12:59,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304280.1666666667, ans=0.1 2024-09-15 15:13:06,609 INFO [train.py:1198] (0/2) Epoch 17, batch 5150, loss[loss=0.2401, ctc_loss=0.1649, cr_loss=0.3763, over 21016.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1659, cr_loss=0.3852, over 4099437.64 frames. 
], batch size: 63, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:13:12,282 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.104e+02 2.325e+02 2.609e+02 4.325e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-15 15:13:20,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=304336.8333333333, ans=0.125 2024-09-15 15:13:20,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.96 vs. limit=22.5 2024-09-15 15:13:21,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=304336.8333333333, ans=0.035 2024-09-15 15:13:23,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=304336.8333333333, ans=0.125 2024-09-15 15:13:31,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=304336.8333333333, ans=0.125 2024-09-15 15:13:46,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=304365.1666666667, ans=0.125 2024-09-15 15:14:20,509 INFO [train.py:1198] (0/2) Epoch 17, batch 5200, loss[loss=0.2656, ctc_loss=0.1839, cr_loss=0.4089, over 19290.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1652, cr_loss=0.3843, over 4096719.93 frames. 
], batch size: 90, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:14:55,407 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 15:15:05,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304535.1666666667, ans=0.1 2024-09-15 15:15:10,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=304535.1666666667, ans=15.0 2024-09-15 15:15:21,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=304563.5, ans=0.125 2024-09-15 15:15:34,859 INFO [train.py:1198] (0/2) Epoch 17, batch 5250, loss[loss=0.2275, ctc_loss=0.1578, cr_loss=0.3485, over 20973.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1645, cr_loss=0.3822, over 4096961.13 frames. ], batch size: 51, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:15:40,791 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.688e+02 2.000e+02 2.130e+02 2.333e+02 2.745e+02, threshold=4.260e+02, percent-clipped=0.0 2024-09-15 15:16:16,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=304648.5, ans=0.0 2024-09-15 15:16:37,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=304705.1666666667, ans=0.125 2024-09-15 15:16:38,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=304705.1666666667, ans=0.0 2024-09-15 15:16:41,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=304705.1666666667, ans=0.125 2024-09-15 15:16:46,017 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=304705.1666666667, ans=0.125 2024-09-15 15:16:48,826 INFO [train.py:1198] (0/2) Epoch 17, batch 5300, loss[loss=0.2287, ctc_loss=0.1541, cr_loss=0.373, over 21028.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1647, cr_loss=0.3821, over 4096934.19 frames. ], batch size: 62, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:17:12,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=304761.8333333333, ans=0.125 2024-09-15 15:17:25,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=304790.1666666667, ans=0.125 2024-09-15 15:18:05,676 INFO [train.py:1198] (0/2) Epoch 17, batch 5350, loss[loss=0.2302, ctc_loss=0.1554, cr_loss=0.3743, over 21076.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1655, cr_loss=0.3832, over 4094239.30 frames. ], batch size: 59, lr: 4.88e-03, grad_scale: 32.0 2024-09-15 15:18:11,486 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.104e+02 2.238e+02 2.500e+02 4.002e+02, threshold=4.476e+02, percent-clipped=0.0 2024-09-15 15:18:30,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304903.5, ans=0.1 2024-09-15 15:19:18,446 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 15:19:19,545 INFO [train.py:1198] (0/2) Epoch 17, batch 5400, loss[loss=0.2293, ctc_loss=0.1558, cr_loss=0.3674, over 20899.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1668, cr_loss=0.386, over 4091318.09 frames. 
], batch size: 54, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:19:31,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=305016.8333333333, ans=0.2 2024-09-15 15:20:02,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.43 vs. limit=15.0 2024-09-15 15:20:04,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=305101.8333333333, ans=0.1 2024-09-15 15:20:23,396 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 15:20:23,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=305130.1666666667, ans=0.125 2024-09-15 15:20:33,466 INFO [train.py:1198] (0/2) Epoch 17, batch 5450, loss[loss=0.2371, ctc_loss=0.1625, cr_loss=0.3728, over 20743.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1659, cr_loss=0.385, over 4100089.47 frames. 
], batch size: 71, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:20:39,476 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.064e+02 2.229e+02 2.388e+02 4.472e+02, threshold=4.457e+02, percent-clipped=0.0 2024-09-15 15:20:41,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=305158.5, ans=0.125 2024-09-15 15:20:45,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=305158.5, ans=0.0 2024-09-15 15:21:07,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=305215.1666666667, ans=0.125 2024-09-15 15:21:12,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=305215.1666666667, ans=0.125 2024-09-15 15:21:31,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=12.0 2024-09-15 15:21:34,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=305271.8333333333, ans=10.0 2024-09-15 15:21:47,593 INFO [train.py:1198] (0/2) Epoch 17, batch 5500, loss[loss=0.2273, ctc_loss=0.1561, cr_loss=0.3558, over 20773.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.166, cr_loss=0.3848, over 4099958.01 frames. 
], batch size: 56, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:21:58,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=305300.1666666667, ans=0.025 2024-09-15 15:22:02,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=305328.5, ans=0.125 2024-09-15 15:22:09,086 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 15:22:20,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=305356.8333333333, ans=0.0 2024-09-15 15:22:26,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=305356.8333333333, ans=0.0 2024-09-15 15:22:40,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=305385.1666666667, ans=0.125 2024-09-15 15:22:40,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=305385.1666666667, ans=0.0 2024-09-15 15:22:41,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=305385.1666666667, ans=0.0 2024-09-15 15:23:03,722 INFO [train.py:1198] (0/2) Epoch 17, batch 5550, loss[loss=0.2567, ctc_loss=0.1786, cr_loss=0.3908, over 20985.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1663, cr_loss=0.3856, over 4098786.13 frames. 
], batch size: 64, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:23:09,671 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 1.985e+02 2.152e+02 2.317e+02 7.450e+02, threshold=4.304e+02, percent-clipped=1.0 2024-09-15 15:23:35,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=305498.5, ans=0.025 2024-09-15 15:23:56,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=305526.8333333333, ans=0.0 2024-09-15 15:23:58,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-09-15 15:24:01,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=305555.1666666667, ans=0.0 2024-09-15 15:24:02,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=305555.1666666667, ans=0.025 2024-09-15 15:24:14,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=305555.1666666667, ans=0.0 2024-09-15 15:24:17,562 INFO [train.py:1198] (0/2) Epoch 17, batch 5600, loss[loss=0.2461, ctc_loss=0.1687, cr_loss=0.3872, over 20991.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1655, cr_loss=0.3843, over 4105946.03 frames. 
], batch size: 55, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:24:54,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=305640.1666666667, ans=0.125 2024-09-15 15:25:15,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=305696.8333333333, ans=0.1 2024-09-15 15:25:30,831 INFO [train.py:1198] (0/2) Epoch 17, batch 5650, loss[loss=0.244, ctc_loss=0.1681, cr_loss=0.3794, over 20896.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1655, cr_loss=0.3848, over 4102276.10 frames. ], batch size: 54, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:25:34,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=305725.1666666667, ans=0.125 2024-09-15 15:25:36,882 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.098e+02 2.192e+02 2.395e+02 3.412e+02, threshold=4.383e+02, percent-clipped=0.0 2024-09-15 15:25:37,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=305725.1666666667, ans=0.125 2024-09-15 15:26:18,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=305810.1666666667, ans=0.125 2024-09-15 15:26:18,551 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 15:26:26,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=15.0 2024-09-15 15:26:45,018 INFO [train.py:1198] (0/2) Epoch 17, batch 5700, loss[loss=0.195, ctc_loss=0.1308, cr_loss=0.3213, over 20940.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1655, cr_loss=0.3855, over 4110395.28 frames. 
], batch size: 51, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:27:32,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=305951.8333333333, ans=0.125 2024-09-15 15:27:33,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=305951.8333333333, ans=0.2 2024-09-15 15:27:37,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=305951.8333333333, ans=0.5 2024-09-15 15:27:55,697 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-108000.pt 2024-09-15 15:28:02,382 INFO [train.py:1198] (0/2) Epoch 17, batch 5750, loss[loss=0.2118, ctc_loss=0.1421, cr_loss=0.3485, over 20778.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1663, cr_loss=0.3863, over 4107614.61 frames. ], batch size: 53, lr: 4.87e-03, grad_scale: 32.0 2024-09-15 15:28:08,165 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.087e+02 2.200e+02 2.353e+02 3.014e+02, threshold=4.401e+02, percent-clipped=0.0 2024-09-15 15:28:11,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.70 vs. 
limit=22.5
2024-09-15 15:28:32,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306065.1666666667, ans=0.1
2024-09-15 15:28:35,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=306065.1666666667, ans=0.0
2024-09-15 15:28:39,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=306065.1666666667, ans=0.09899494936611666
2024-09-15 15:28:42,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=306065.1666666667, ans=0.04949747468305833
2024-09-15 15:29:16,304 INFO [train.py:1198] (0/2) Epoch 17, batch 5800, loss[loss=0.2593, ctc_loss=0.1779, cr_loss=0.407, over 19389.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1666, cr_loss=0.3866, over 4098541.74 frames. ], batch size: 90, lr: 4.87e-03, grad_scale: 32.0
2024-09-15 15:29:56,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=306206.8333333333, ans=0.125
2024-09-15 15:29:57,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=306206.8333333333, ans=0.0
2024-09-15 15:30:22,236 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=15.0
2024-09-15 15:30:30,304 INFO [train.py:1198] (0/2) Epoch 17, batch 5850, loss[loss=0.224, ctc_loss=0.1512, cr_loss=0.3637, over 20073.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1668, cr_loss=0.3872, over 4095131.97 frames.
], batch size: 44, lr: 4.86e-03, grad_scale: 64.0
2024-09-15 15:30:36,199 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.073e+02 2.235e+02 2.582e+02 3.439e+02, threshold=4.470e+02, percent-clipped=0.0
2024-09-15 15:30:37,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306291.8333333333, ans=0.1
2024-09-15 15:30:59,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=306320.1666666667, ans=0.035
2024-09-15 15:31:33,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=306405.1666666667, ans=0.07
2024-09-15 15:31:42,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=306405.1666666667, ans=0.2
2024-09-15 15:31:46,450 INFO [train.py:1198] (0/2) Epoch 17, batch 5900, loss[loss=0.2621, ctc_loss=0.1839, cr_loss=0.3912, over 19486.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1663, cr_loss=0.3865, over 4098613.64 frames. ], batch size: 90, lr: 4.86e-03, grad_scale: 64.0
2024-09-15 15:31:47,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=22.5
2024-09-15 15:31:52,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=306433.5, ans=0.0
2024-09-15 15:32:24,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=306490.1666666667, ans=0.125
2024-09-15 15:32:24,431 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.93 vs.
limit=15.0
2024-09-15 15:32:42,253 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=15.0
2024-09-15 15:33:00,701 INFO [train.py:1198] (0/2) Epoch 17, batch 5950, loss[loss=0.2001, ctc_loss=0.1337, cr_loss=0.3322, over 20946.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1667, cr_loss=0.3866, over 4091941.11 frames. ], batch size: 50, lr: 4.86e-03, grad_scale: 64.0
2024-09-15 15:33:05,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306575.1666666667, ans=0.1
2024-09-15 15:33:06,495 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.030e+02 2.193e+02 2.359e+02 3.386e+02, threshold=4.387e+02, percent-clipped=0.0
2024-09-15 15:33:48,356 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 15:33:58,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=306688.5, ans=0.125
2024-09-15 15:34:14,898 INFO [train.py:1198] (0/2) Epoch 17, batch 6000, loss[loss=0.2571, ctc_loss=0.1773, cr_loss=0.3986, over 21055.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1671, cr_loss=0.3866, over 4069342.71 frames. ], batch size: 56, lr: 4.86e-03, grad_scale: 32.0
2024-09-15 15:34:14,899 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-15 15:34:35,275 INFO [train.py:1230] (0/2) Epoch 17, validation: loss=0.0454, ctc_loss=0.0454, cr_loss=1.047e-14, over 944034.00 frames.
2024-09-15 15:34:35,276 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-15 15:34:38,767 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.21 vs.
limit=15.0
2024-09-15 15:35:32,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306801.8333333333, ans=0.1
2024-09-15 15:35:51,720 INFO [train.py:1198] (0/2) Epoch 17, batch 6050, loss[loss=0.2392, ctc_loss=0.1644, cr_loss=0.374, over 20683.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.167, cr_loss=0.3854, over 4054693.00 frames. ], batch size: 71, lr: 4.86e-03, grad_scale: 32.0
2024-09-15 15:35:59,137 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.702e+02 2.077e+02 2.264e+02 2.512e+02 4.569e+02, threshold=4.528e+02, percent-clipped=1.0
2024-09-15 15:36:02,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=306858.5, ans=0.0
2024-09-15 15:36:05,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=306886.8333333333, ans=0.0
2024-09-15 15:36:49,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=306971.8333333333, ans=0.125
2024-09-15 15:36:57,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.92 vs. limit=10.0
2024-09-15 15:37:05,858 INFO [train.py:1198] (0/2) Epoch 17, batch 6100, loss[loss=0.2628, ctc_loss=0.1814, cr_loss=0.4069, over 20858.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1682, cr_loss=0.3868, over 4053097.98 frames.
], batch size: 65, lr: 4.86e-03, grad_scale: 32.0
2024-09-15 15:37:12,081 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 15:37:16,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=307000.1666666667, ans=10.0
2024-09-15 15:37:26,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=307028.5, ans=0.125
2024-09-15 15:37:59,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=307085.1666666667, ans=0.025
2024-09-15 15:38:20,265 INFO [train.py:1198] (0/2) Epoch 17, batch 6150, loss[loss=0.1901, ctc_loss=0.1233, cr_loss=0.334, over 20977.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1682, cr_loss=0.3877, over 4058936.16 frames. ], batch size: 52, lr: 4.86e-03, grad_scale: 32.0
2024-09-15 15:38:20,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307141.8333333333, ans=0.1
2024-09-15 15:38:22,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.29 vs.
limit=15.0
2024-09-15 15:38:27,361 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.113e+02 2.324e+02 2.544e+02 3.191e+02, threshold=4.649e+02, percent-clipped=0.0
2024-09-15 15:38:30,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=307141.8333333333, ans=0.125
2024-09-15 15:39:31,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=307255.1666666667, ans=0.2
2024-09-15 15:39:34,450 INFO [train.py:1198] (0/2) Epoch 17, batch 6200, loss[loss=0.216, ctc_loss=0.1489, cr_loss=0.3359, over 19983.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1685, cr_loss=0.3877, over 4053752.79 frames. ], batch size: 44, lr: 4.86e-03, grad_scale: 32.0
2024-09-15 15:40:00,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=307311.8333333333, ans=0.0
2024-09-15 15:40:31,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=307368.5, ans=0.2
2024-09-15 15:40:47,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=307425.1666666667, ans=0.07
2024-09-15 15:40:48,460 INFO [train.py:1198] (0/2) Epoch 17, batch 6250, loss[loss=0.1924, ctc_loss=0.1324, cr_loss=0.2999, over 20028.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1686, cr_loss=0.3874, over 4038135.53 frames. ], batch size: 44, lr: 4.86e-03, grad_scale: 16.0
2024-09-15 15:40:53,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.42 vs.
limit=22.5
2024-09-15 15:40:57,547 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.078e+02 2.212e+02 2.395e+02 3.360e+02, threshold=4.424e+02, percent-clipped=0.0
2024-09-15 15:41:28,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=307481.8333333333, ans=0.125
2024-09-15 15:41:30,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=307481.8333333333, ans=10.0
2024-09-15 15:41:57,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=307538.5, ans=0.125
2024-09-15 15:42:01,393 INFO [train.py:1198] (0/2) Epoch 17, batch 6300, loss[loss=0.2876, ctc_loss=0.2092, cr_loss=0.392, over 13761.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1691, cr_loss=0.3857, over 3988769.00 frames. ], batch size: 150, lr: 4.85e-03, grad_scale: 16.0
2024-09-15 15:42:30,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=307623.5, ans=0.125
2024-09-15 15:43:01,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=307680.1666666667, ans=0.125
2024-09-15 15:43:06,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.14 vs. limit=5.0
2024-09-15 15:43:08,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=307680.1666666667, ans=0.2
2024-09-15 15:43:13,996 INFO [train.py:1198] (0/2) Epoch 17, batch 6350, loss[loss=0.2827, ctc_loss=0.2057, cr_loss=0.3852, over 14692.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1702, cr_loss=0.3843, over 3882193.17 frames.
], batch size: 149, lr: 4.85e-03, grad_scale: 16.0
2024-09-15 15:43:22,977 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.182e+02 2.341e+02 2.627e+02 4.942e+02, threshold=4.681e+02, percent-clipped=1.0
2024-09-15 15:43:24,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307708.5, ans=0.1
2024-09-15 15:43:41,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=307765.1666666667, ans=0.125
2024-09-15 15:43:46,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=307765.1666666667, ans=0.125
2024-09-15 15:43:47,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=307765.1666666667, ans=0.125
2024-09-15 15:44:05,949 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0
2024-09-15 15:44:12,093 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-17.pt
2024-09-15 15:44:57,828 INFO [train.py:1198] (0/2) Epoch 18, batch 0, loss[loss=0.2365, ctc_loss=0.1613, cr_loss=0.3763, over 20982.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1613, cr_loss=0.3763, over 20982.00 frames. ], batch size: 58, lr: 4.71e-03, grad_scale: 32.0
2024-09-15 15:44:57,829 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-15 15:45:16,134 INFO [train.py:1230] (0/2) Epoch 18, validation: loss=0.04502, ctc_loss=0.04502, cr_loss=1.051e-14, over 944034.00 frames.
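[Editor's note: the `loss[...]` and `tot_loss[...]` values in the train.py records above appear to satisfy loss = ctc_loss + cr_loss_scale * cr_loss, with cr_loss_scale=0.2 taken from this run's config; a minimal sketch checking that relation against two logged records (the function `total_loss` is an illustrative name, not a function in icefall):]

```python
# Hypothetical reconstruction of the combined loss appearing in these
# train.py log records, assuming loss = ctc_loss + cr_loss_scale * cr_loss
# with cr_loss_scale=0.2 from this run's config. Not icefall code.
def total_loss(ctc_loss: float, cr_loss: float, cr_loss_scale: float = 0.2) -> float:
    return ctc_loss + cr_loss_scale * cr_loss

# Epoch 17, batch 5650: loss=0.244, ctc_loss=0.1681, cr_loss=0.3794
assert abs(total_loss(0.1681, 0.3794) - 0.244) < 5e-4
# Epoch 18, batch 0: loss=0.2365, ctc_loss=0.1613, cr_loss=0.3763
assert abs(total_loss(0.1613, 0.3763) - 0.2365) < 5e-4
```

[This is consistent with the validation records, where cr_loss is ~1e-14 (no time masking at validation) and loss equals ctc_loss.]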
2024-09-15 15:45:16,135 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-15 15:45:22,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=307824.6666666667, ans=0.125
2024-09-15 15:45:25,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=307824.6666666667, ans=0.0
2024-09-15 15:46:01,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=307881.3333333333, ans=0.0
2024-09-15 15:46:11,352 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.15 vs. limit=10.0
2024-09-15 15:46:18,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=307909.6666666667, ans=0.125
2024-09-15 15:46:31,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=307938.0, ans=0.0
2024-09-15 15:46:37,543 INFO [train.py:1198] (0/2) Epoch 18, batch 50, loss[loss=0.2045, ctc_loss=0.1363, cr_loss=0.341, over 21068.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1664, cr_loss=0.3881, over 921306.64 frames. ], batch size: 53, lr: 4.71e-03, grad_scale: 32.0
2024-09-15 15:46:48,863 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.88 vs.
limit=15.0
2024-09-15 15:46:59,889 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.704e+02 2.121e+02 2.402e+02 2.678e+02 4.542e+02, threshold=4.805e+02, percent-clipped=0.0
2024-09-15 15:47:00,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=307994.6666666667, ans=0.2
2024-09-15 15:47:07,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=308023.0, ans=0.125
2024-09-15 15:47:52,394 INFO [train.py:1198] (0/2) Epoch 18, batch 100, loss[loss=0.1988, ctc_loss=0.1311, cr_loss=0.3387, over 21082.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1675, cr_loss=0.3901, over 1635132.78 frames. ], batch size: 53, lr: 4.71e-03, grad_scale: 32.0
2024-09-15 15:47:57,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=308108.0, ans=0.5
2024-09-15 15:48:23,454 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0
2024-09-15 15:48:28,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=308164.6666666667, ans=0.0
2024-09-15 15:48:37,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=308193.0, ans=0.2
2024-09-15 15:49:06,919 INFO [train.py:1198] (0/2) Epoch 18, batch 150, loss[loss=0.2655, ctc_loss=0.183, cr_loss=0.4124, over 20067.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1694, cr_loss=0.3906, over 2158351.24 frames.
], batch size: 80, lr: 4.71e-03, grad_scale: 32.0
2024-09-15 15:49:25,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=308278.0, ans=0.125
2024-09-15 15:49:29,670 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.042e+02 2.190e+02 2.337e+02 5.128e+02, threshold=4.380e+02, percent-clipped=1.0
2024-09-15 15:49:34,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=308278.0, ans=0.125
2024-09-15 15:49:48,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=308306.3333333333, ans=0.2
2024-09-15 15:50:00,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0
2024-09-15 15:50:06,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.09 vs. limit=22.5
2024-09-15 15:50:08,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308363.0, ans=0.1
2024-09-15 15:50:21,906 INFO [train.py:1198] (0/2) Epoch 18, batch 200, loss[loss=0.2359, ctc_loss=0.1572, cr_loss=0.3932, over 19833.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1665, cr_loss=0.3861, over 2584090.55 frames.
], batch size: 44, lr: 4.71e-03, grad_scale: 32.0
2024-09-15 15:51:05,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=308476.3333333333, ans=0.05
2024-09-15 15:51:11,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=308476.3333333333, ans=0.125
2024-09-15 15:51:15,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=308476.3333333333, ans=0.125
2024-09-15 15:51:33,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.90 vs. limit=22.5
2024-09-15 15:51:39,665 INFO [train.py:1198] (0/2) Epoch 18, batch 250, loss[loss=0.2478, ctc_loss=0.1702, cr_loss=0.3883, over 21014.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1662, cr_loss=0.3859, over 2931240.06 frames. ], batch size: 63, lr: 4.71e-03, grad_scale: 32.0
2024-09-15 15:51:42,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.34 vs.
limit=8.0
2024-09-15 15:52:04,957 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.696e+02 2.067e+02 2.174e+02 2.368e+02 3.453e+02, threshold=4.349e+02, percent-clipped=0.0
2024-09-15 15:52:08,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=308561.3333333333, ans=0.95
2024-09-15 15:52:35,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=308618.0, ans=0.125
2024-09-15 15:52:49,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=308646.3333333333, ans=0.2
2024-09-15 15:52:57,535 INFO [train.py:1198] (0/2) Epoch 18, batch 300, loss[loss=0.2933, ctc_loss=0.2086, cr_loss=0.4235, over 18169.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1666, cr_loss=0.386, over 3197914.77 frames. ], batch size: 108, lr: 4.71e-03, grad_scale: 32.0
2024-09-15 15:53:05,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=308674.6666666667, ans=0.125
2024-09-15 15:53:30,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=308731.3333333333, ans=0.025
2024-09-15 15:54:12,482 INFO [train.py:1198] (0/2) Epoch 18, batch 350, loss[loss=0.2419, ctc_loss=0.165, cr_loss=0.3845, over 20933.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1665, cr_loss=0.3862, over 3398509.37 frames.
], batch size: 60, lr: 4.71e-03, grad_scale: 32.0
2024-09-15 15:54:26,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=308844.6666666667, ans=0.0
2024-09-15 15:54:35,067 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.073e+02 2.207e+02 2.411e+02 3.269e+02, threshold=4.415e+02, percent-clipped=0.0
2024-09-15 15:54:45,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=308873.0, ans=0.2
2024-09-15 15:55:24,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=308929.6666666667, ans=0.125
2024-09-15 15:55:24,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=308929.6666666667, ans=0.2
2024-09-15 15:55:26,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0
2024-09-15 15:55:27,586 INFO [train.py:1198] (0/2) Epoch 18, batch 400, loss[loss=0.2489, ctc_loss=0.1682, cr_loss=0.4035, over 20920.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1651, cr_loss=0.3843, over 3562385.13 frames. ], batch size: 60, lr: 4.71e-03, grad_scale: 32.0
2024-09-15 15:55:32,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=308958.0, ans=0.125
2024-09-15 15:55:57,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=309014.6666666667, ans=0.0
2024-09-15 15:55:59,572 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.71 vs.
limit=12.0
2024-09-15 15:56:08,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=22.5
2024-09-15 15:56:42,493 INFO [train.py:1198] (0/2) Epoch 18, batch 450, loss[loss=0.2661, ctc_loss=0.1832, cr_loss=0.4146, over 20969.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1665, cr_loss=0.3864, over 3670049.24 frames. ], batch size: 64, lr: 4.70e-03, grad_scale: 32.0
2024-09-15 15:56:47,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0
2024-09-15 15:56:50,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.80 vs. limit=10.0
2024-09-15 15:57:00,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=309128.0, ans=0.025
2024-09-15 15:57:01,208 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.87 vs. limit=6.0
2024-09-15 15:57:04,836 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.013e+02 2.139e+02 2.345e+02 2.699e+02, threshold=4.277e+02, percent-clipped=0.0
2024-09-15 15:57:35,295 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0
2024-09-15 15:57:58,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=309213.0, ans=0.125
2024-09-15 15:58:03,045 INFO [train.py:1198] (0/2) Epoch 18, batch 500, loss[loss=0.2447, ctc_loss=0.1652, cr_loss=0.3975, over 20649.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1656, cr_loss=0.3849, over 3759645.31 frames.
], batch size: 68, lr: 4.70e-03, grad_scale: 32.0
2024-09-15 15:58:25,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=309269.6666666667, ans=0.0
2024-09-15 15:58:34,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=309298.0, ans=0.2
2024-09-15 15:58:38,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=309298.0, ans=0.125
2024-09-15 15:58:44,523 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.76 vs. limit=15.0
2024-09-15 15:58:51,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=309326.3333333333, ans=0.125
2024-09-15 15:58:59,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.78 vs. limit=10.0
2024-09-15 15:59:09,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309354.6666666667, ans=0.1
2024-09-15 15:59:18,605 INFO [train.py:1198] (0/2) Epoch 18, batch 550, loss[loss=0.2554, ctc_loss=0.1755, cr_loss=0.3997, over 20704.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1656, cr_loss=0.3849, over 3839553.77 frames. ], batch size: 71, lr: 4.70e-03, grad_scale: 32.0
2024-09-15 15:59:30,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.89 vs.
limit=22.5
2024-09-15 15:59:41,171 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.062e+02 2.174e+02 2.381e+02 3.894e+02, threshold=4.348e+02, percent-clipped=0.0
2024-09-15 16:00:23,919 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0
2024-09-15 16:00:33,721 INFO [train.py:1198] (0/2) Epoch 18, batch 600, loss[loss=0.2119, ctc_loss=0.1411, cr_loss=0.3538, over 20990.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1659, cr_loss=0.3856, over 3894669.06 frames. ], batch size: 52, lr: 4.70e-03, grad_scale: 32.0
2024-09-15 16:01:05,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=309581.3333333333, ans=0.0
2024-09-15 16:01:47,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=309666.3333333333, ans=0.125
2024-09-15 16:01:48,733 INFO [train.py:1198] (0/2) Epoch 18, batch 650, loss[loss=0.2604, ctc_loss=0.1841, cr_loss=0.3818, over 20542.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1653, cr_loss=0.3845, over 3931862.64 frames.
], batch size: 75, lr: 4.70e-03, grad_scale: 32.0
2024-09-15 16:02:10,783 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.077e+02 2.206e+02 2.355e+02 2.917e+02, threshold=4.411e+02, percent-clipped=0.0
2024-09-15 16:02:20,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=309723.0, ans=0.0
2024-09-15 16:02:22,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=309723.0, ans=0.125
2024-09-15 16:02:29,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=309723.0, ans=0.025
2024-09-15 16:02:32,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=12.0
2024-09-15 16:02:45,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=309751.3333333333, ans=0.2
2024-09-15 16:03:03,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=309779.6666666667, ans=0.125
2024-09-15 16:03:05,974 INFO [train.py:1198] (0/2) Epoch 18, batch 700, loss[loss=0.2385, ctc_loss=0.1607, cr_loss=0.389, over 21029.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1655, cr_loss=0.3853, over 3974506.14 frames.
], batch size: 62, lr: 4.70e-03, grad_scale: 32.0
2024-09-15 16:03:25,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=309836.3333333333, ans=0.125
2024-09-15 16:03:55,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309893.0, ans=0.1
2024-09-15 16:04:10,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=309921.3333333333, ans=0.125
2024-09-15 16:04:23,082 INFO [train.py:1198] (0/2) Epoch 18, batch 750, loss[loss=0.2774, ctc_loss=0.1904, cr_loss=0.4352, over 20716.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1659, cr_loss=0.386, over 3981360.18 frames. ], batch size: 71, lr: 4.70e-03, grad_scale: 32.0
2024-09-15 16:04:39,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0
2024-09-15 16:04:45,824 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.063e+02 2.207e+02 2.386e+02 3.897e+02, threshold=4.413e+02, percent-clipped=0.0
2024-09-15 16:04:56,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=310006.3333333333, ans=0.2
2024-09-15 16:05:05,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=310006.3333333333, ans=0.025
2024-09-15 16:05:10,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.89 vs.
limit=12.0
2024-09-15 16:05:16,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=310034.6666666667, ans=0.0
2024-09-15 16:05:38,503 INFO [train.py:1198] (0/2) Epoch 18, batch 800, loss[loss=0.2294, ctc_loss=0.153, cr_loss=0.3823, over 20777.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1655, cr_loss=0.3856, over 4019365.35 frames. ], batch size: 56, lr: 4.70e-03, grad_scale: 32.0
2024-09-15 16:05:49,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.66 vs. limit=15.0
2024-09-15 16:06:11,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=310148.0, ans=0.125
2024-09-15 16:06:49,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0
2024-09-15 16:06:53,618 INFO [train.py:1198] (0/2) Epoch 18, batch 850, loss[loss=0.2195, ctc_loss=0.1482, cr_loss=0.3566, over 20891.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1661, cr_loss=0.3862, over 4037949.88 frames. ], batch size: 54, lr: 4.70e-03, grad_scale: 32.0
2024-09-15 16:07:16,150 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.068e+02 2.198e+02 2.441e+02 5.210e+02, threshold=4.396e+02, percent-clipped=2.0
2024-09-15 16:08:08,623 INFO [train.py:1198] (0/2) Epoch 18, batch 900, loss[loss=0.2467, ctc_loss=0.1691, cr_loss=0.3878, over 20971.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1661, cr_loss=0.3868, over 4050775.28 frames. ], batch size: 58, lr: 4.69e-03, grad_scale: 16.0
2024-09-15 16:08:11,179 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.88 vs.
limit=10.0 2024-09-15 16:08:20,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=310374.6666666667, ans=0.04949747468305833 2024-09-15 16:08:33,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=310403.0, ans=0.1 2024-09-15 16:08:34,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=310403.0, ans=0.1 2024-09-15 16:08:44,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.36 vs. limit=15.0 2024-09-15 16:08:46,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=310431.3333333333, ans=0.05 2024-09-15 16:08:55,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=310431.3333333333, ans=0.125 2024-09-15 16:09:29,113 INFO [train.py:1198] (0/2) Epoch 18, batch 950, loss[loss=0.2215, ctc_loss=0.1515, cr_loss=0.3497, over 20936.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1666, cr_loss=0.387, over 4051489.63 frames. ], batch size: 60, lr: 4.69e-03, grad_scale: 16.0 2024-09-15 16:09:50,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=310544.6666666667, ans=15.0 2024-09-15 16:09:53,066 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.094e+02 2.208e+02 2.373e+02 3.927e+02, threshold=4.417e+02, percent-clipped=0.0 2024-09-15 16:10:01,308 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.48 vs. 
limit=15.0 2024-09-15 16:10:02,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=310573.0, ans=0.2 2024-09-15 16:10:26,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=310601.3333333333, ans=0.1 2024-09-15 16:10:44,163 INFO [train.py:1198] (0/2) Epoch 18, batch 1000, loss[loss=0.2973, ctc_loss=0.2108, cr_loss=0.4329, over 14699.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1665, cr_loss=0.3871, over 4048438.82 frames. ], batch size: 150, lr: 4.69e-03, grad_scale: 16.0 2024-09-15 16:10:52,729 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0 2024-09-15 16:11:22,671 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.16 vs. limit=22.5 2024-09-15 16:11:32,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=310743.0, ans=0.125 2024-09-15 16:11:48,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=310771.3333333333, ans=0.2 2024-09-15 16:11:58,492 INFO [train.py:1198] (0/2) Epoch 18, batch 1050, loss[loss=0.269, ctc_loss=0.1876, cr_loss=0.4071, over 19377.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1666, cr_loss=0.3865, over 4038315.33 frames. 
], batch size: 90, lr: 4.69e-03, grad_scale: 16.0 2024-09-15 16:12:12,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=310828.0, ans=0.125 2024-09-15 16:12:22,481 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.080e+02 2.212e+02 2.362e+02 3.415e+02, threshold=4.425e+02, percent-clipped=0.0 2024-09-15 16:12:30,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=310856.3333333333, ans=0.125 2024-09-15 16:13:01,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2024-09-15 16:13:14,277 INFO [train.py:1198] (0/2) Epoch 18, batch 1100, loss[loss=0.213, ctc_loss=0.1434, cr_loss=0.3478, over 20282.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1673, cr_loss=0.3871, over 4049074.13 frames. ], batch size: 45, lr: 4.69e-03, grad_scale: 16.0 2024-09-15 16:13:43,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=12.0 2024-09-15 16:14:03,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=311026.3333333333, ans=0.2 2024-09-15 16:14:14,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=311026.3333333333, ans=0.0 2024-09-15 16:14:35,167 INFO [train.py:1198] (0/2) Epoch 18, batch 1150, loss[loss=0.2242, ctc_loss=0.1497, cr_loss=0.3726, over 21078.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1665, cr_loss=0.3862, over 4067494.13 frames. 
], batch size: 59, lr: 4.69e-03, grad_scale: 16.0 2024-09-15 16:14:59,153 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.724e+02 2.051e+02 2.187e+02 2.330e+02 3.028e+02, threshold=4.375e+02, percent-clipped=0.0 2024-09-15 16:15:50,840 INFO [train.py:1198] (0/2) Epoch 18, batch 1200, loss[loss=0.2598, ctc_loss=0.1795, cr_loss=0.4015, over 18387.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.165, cr_loss=0.3837, over 4074088.48 frames. ], batch size: 108, lr: 4.69e-03, grad_scale: 32.0 2024-09-15 16:15:57,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=311224.6666666667, ans=0.125 2024-09-15 16:16:08,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=311253.0, ans=0.2 2024-09-15 16:17:06,276 INFO [train.py:1198] (0/2) Epoch 18, batch 1250, loss[loss=0.2304, ctc_loss=0.158, cr_loss=0.3619, over 21051.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.165, cr_loss=0.3839, over 4078956.59 frames. ], batch size: 62, lr: 4.69e-03, grad_scale: 32.0 2024-09-15 16:17:26,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=311394.6666666667, ans=0.2 2024-09-15 16:17:30,507 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.106e+02 2.222e+02 2.491e+02 4.661e+02, threshold=4.444e+02, percent-clipped=1.0 2024-09-15 16:17:45,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=311423.0, ans=0.125 2024-09-15 16:17:47,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=311423.0, ans=0.0 2024-09-15 16:17:49,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.93 vs. 
limit=15.0 2024-09-15 16:18:21,539 INFO [train.py:1198] (0/2) Epoch 18, batch 1300, loss[loss=0.2258, ctc_loss=0.1534, cr_loss=0.3621, over 19860.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1656, cr_loss=0.385, over 4082783.81 frames. ], batch size: 44, lr: 4.69e-03, grad_scale: 32.0 2024-09-15 16:18:41,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=311536.3333333333, ans=0.5 2024-09-15 16:19:36,699 INFO [train.py:1198] (0/2) Epoch 18, batch 1350, loss[loss=0.2521, ctc_loss=0.1737, cr_loss=0.3922, over 20337.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1654, cr_loss=0.3838, over 4076753.25 frames. ], batch size: 74, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:20:03,699 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.043e+02 2.226e+02 2.470e+02 3.858e+02, threshold=4.451e+02, percent-clipped=0.0 2024-09-15 16:20:42,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=311763.0, ans=0.07 2024-09-15 16:20:58,327 INFO [train.py:1198] (0/2) Epoch 18, batch 1400, loss[loss=0.2864, ctc_loss=0.2004, cr_loss=0.4298, over 18639.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1659, cr_loss=0.3839, over 4062549.19 frames. ], batch size: 108, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:22:02,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2024-09-15 16:22:13,526 INFO [train.py:1198] (0/2) Epoch 18, batch 1450, loss[loss=0.2374, ctc_loss=0.1628, cr_loss=0.3733, over 21069.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1652, cr_loss=0.3836, over 4076712.44 frames. 
], batch size: 59, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:22:15,739 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=22.5 2024-09-15 16:22:19,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=311933.0, ans=0.125 2024-09-15 16:22:21,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=311933.0, ans=0.1 2024-09-15 16:22:37,793 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.678e+02 2.072e+02 2.254e+02 2.499e+02 3.931e+02, threshold=4.508e+02, percent-clipped=0.0 2024-09-15 16:23:11,258 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.19 vs. limit=22.5 2024-09-15 16:23:28,487 INFO [train.py:1198] (0/2) Epoch 18, batch 1500, loss[loss=0.2452, ctc_loss=0.1645, cr_loss=0.4037, over 20959.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1666, cr_loss=0.3856, over 4069658.60 frames. ], batch size: 55, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:23:28,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=312074.6666666667, ans=0.1 2024-09-15 16:23:39,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0 2024-09-15 16:23:40,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=312074.6666666667, ans=0.125 2024-09-15 16:23:41,307 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.02 vs. 
limit=15.0 2024-09-15 16:23:43,107 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.78 vs. limit=15.0 2024-09-15 16:23:48,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=312103.0, ans=0.125 2024-09-15 16:23:55,231 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.60 vs. limit=15.0 2024-09-15 16:24:17,904 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.99 vs. limit=12.0 2024-09-15 16:24:21,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=312159.6666666667, ans=0.0 2024-09-15 16:24:27,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=312188.0, ans=0.2 2024-09-15 16:24:29,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=312188.0, ans=0.0 2024-09-15 16:24:43,767 INFO [train.py:1198] (0/2) Epoch 18, batch 1550, loss[loss=0.2652, ctc_loss=0.1842, cr_loss=0.4052, over 19938.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1663, cr_loss=0.3855, over 4077408.88 frames. 
], batch size: 80, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:24:51,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=312216.3333333333, ans=0.125 2024-09-15 16:25:03,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=312244.6666666667, ans=0.125 2024-09-15 16:25:07,415 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.060e+02 2.181e+02 2.357e+02 4.136e+02, threshold=4.362e+02, percent-clipped=0.0 2024-09-15 16:25:21,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=312273.0, ans=0.2 2024-09-15 16:25:50,328 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=22.5 2024-09-15 16:26:01,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=312329.6666666667, ans=0.09899494936611666 2024-09-15 16:26:04,473 INFO [train.py:1198] (0/2) Epoch 18, batch 1600, loss[loss=0.2447, ctc_loss=0.1689, cr_loss=0.3787, over 20811.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1662, cr_loss=0.3848, over 4079139.57 frames. 
], batch size: 65, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:26:15,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=312358.0, ans=0.04949747468305833 2024-09-15 16:26:24,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=312386.3333333333, ans=0.2 2024-09-15 16:26:25,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=312386.3333333333, ans=0.0 2024-09-15 16:26:37,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=312414.6666666667, ans=0.0 2024-09-15 16:26:48,789 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.49 vs. limit=22.5 2024-09-15 16:27:19,985 INFO [train.py:1198] (0/2) Epoch 18, batch 1650, loss[loss=0.2292, ctc_loss=0.1557, cr_loss=0.3672, over 21033.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1654, cr_loss=0.3847, over 4094235.30 frames. 
], batch size: 62, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:27:27,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=312499.6666666667, ans=0.125 2024-09-15 16:27:43,815 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.724e+02 2.047e+02 2.189e+02 2.321e+02 6.898e+02, threshold=4.378e+02, percent-clipped=1.0 2024-09-15 16:27:56,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=312556.3333333333, ans=0.035 2024-09-15 16:28:17,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=312584.6666666667, ans=0.125 2024-09-15 16:28:28,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=312613.0, ans=0.2 2024-09-15 16:28:35,064 INFO [train.py:1198] (0/2) Epoch 18, batch 1700, loss[loss=0.2348, ctc_loss=0.1562, cr_loss=0.3933, over 20759.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1648, cr_loss=0.3844, over 4097656.43 frames. 
], batch size: 56, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:28:35,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=312641.3333333333, ans=0.2 2024-09-15 16:28:50,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=312669.6666666667, ans=0.125 2024-09-15 16:28:50,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=312669.6666666667, ans=0.035 2024-09-15 16:29:17,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312698.0, ans=0.1 2024-09-15 16:29:41,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=312754.6666666667, ans=0.125 2024-09-15 16:29:50,088 INFO [train.py:1198] (0/2) Epoch 18, batch 1750, loss[loss=0.2424, ctc_loss=0.1676, cr_loss=0.3739, over 21033.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1649, cr_loss=0.3843, over 4101349.34 frames. 
], batch size: 61, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:30:02,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=312783.0, ans=0.0 2024-09-15 16:30:14,273 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.047e+02 2.172e+02 2.427e+02 3.153e+02, threshold=4.345e+02, percent-clipped=0.0 2024-09-15 16:30:24,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=312839.6666666667, ans=0.125 2024-09-15 16:30:26,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=312839.6666666667, ans=0.0 2024-09-15 16:30:28,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=312839.6666666667, ans=0.125 2024-09-15 16:30:46,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312868.0, ans=0.1 2024-09-15 16:30:52,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=312896.3333333333, ans=0.125 2024-09-15 16:30:55,232 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 16:31:05,122 INFO [train.py:1198] (0/2) Epoch 18, batch 1800, loss[loss=0.2094, ctc_loss=0.1403, cr_loss=0.3459, over 20946.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1649, cr_loss=0.3843, over 4106094.69 frames. 
], batch size: 49, lr: 4.68e-03, grad_scale: 32.0 2024-09-15 16:31:26,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=312953.0, ans=0.0 2024-09-15 16:31:31,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=312953.0, ans=0.2 2024-09-15 16:31:46,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=312981.3333333333, ans=0.0 2024-09-15 16:32:19,819 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 16:32:25,480 INFO [train.py:1198] (0/2) Epoch 18, batch 1850, loss[loss=0.2418, ctc_loss=0.1642, cr_loss=0.3881, over 20771.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1651, cr_loss=0.3847, over 4101504.24 frames. ], batch size: 56, lr: 4.67e-03, grad_scale: 16.0 2024-09-15 16:32:36,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.64 vs. 
limit=15.0 2024-09-15 16:32:50,907 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.044e+02 2.184e+02 2.333e+02 4.056e+02, threshold=4.367e+02, percent-clipped=0.0 2024-09-15 16:32:51,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=313094.6666666667, ans=0.0 2024-09-15 16:32:51,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=313094.6666666667, ans=0.125 2024-09-15 16:32:57,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313123.0, ans=0.1 2024-09-15 16:33:06,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=313123.0, ans=0.125 2024-09-15 16:33:18,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=313151.3333333333, ans=0.0 2024-09-15 16:33:26,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=313179.6666666667, ans=0.1 2024-09-15 16:33:35,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2024-09-15 16:33:40,757 INFO [train.py:1198] (0/2) Epoch 18, batch 1900, loss[loss=0.2431, ctc_loss=0.1659, cr_loss=0.3856, over 20837.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1653, cr_loss=0.3849, over 4094661.82 frames. 
], batch size: 65, lr: 4.67e-03, grad_scale: 16.0 2024-09-15 16:34:03,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=313236.3333333333, ans=0.025 2024-09-15 16:34:19,042 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.78 vs. limit=12.0 2024-09-15 16:34:35,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=313293.0, ans=0.125 2024-09-15 16:34:41,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=313321.3333333333, ans=0.0 2024-09-15 16:34:55,950 INFO [train.py:1198] (0/2) Epoch 18, batch 1950, loss[loss=0.253, ctc_loss=0.1759, cr_loss=0.3855, over 21023.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1656, cr_loss=0.3848, over 4088421.47 frames. ], batch size: 63, lr: 4.67e-03, grad_scale: 16.0 2024-09-15 16:35:18,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=313378.0, ans=0.025 2024-09-15 16:35:21,625 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.121e+02 2.301e+02 2.467e+02 3.179e+02, threshold=4.602e+02, percent-clipped=0.0 2024-09-15 16:35:44,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=313434.6666666667, ans=0.125 2024-09-15 16:35:49,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=313434.6666666667, ans=0.5 2024-09-15 16:36:11,248 INFO [train.py:1198] (0/2) Epoch 18, batch 2000, loss[loss=0.225, ctc_loss=0.1518, cr_loss=0.3662, over 20939.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1645, cr_loss=0.3833, over 4095588.54 frames. 
], batch size: 50, lr: 4.67e-03, grad_scale: 32.0 2024-09-15 16:36:19,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=313491.3333333333, ans=0.125 2024-09-15 16:36:45,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-09-15 16:37:07,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=313576.3333333333, ans=0.125 2024-09-15 16:37:31,714 INFO [train.py:1198] (0/2) Epoch 18, batch 2050, loss[loss=0.3277, ctc_loss=0.2378, cr_loss=0.4494, over 14241.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1659, cr_loss=0.3855, over 4097436.75 frames. ], batch size: 150, lr: 4.67e-03, grad_scale: 32.0 2024-09-15 16:37:39,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=313633.0, ans=0.0 2024-09-15 16:37:57,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.080e+02 2.224e+02 2.431e+02 3.023e+02, threshold=4.448e+02, percent-clipped=0.0 2024-09-15 16:38:05,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2024-09-15 16:38:29,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=313718.0, ans=0.0 2024-09-15 16:38:47,196 INFO [train.py:1198] (0/2) Epoch 18, batch 2100, loss[loss=0.2191, ctc_loss=0.1467, cr_loss=0.3618, over 20945.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1643, cr_loss=0.3832, over 4112379.98 frames. 
], batch size: 50, lr: 4.67e-03, grad_scale: 32.0 2024-09-15 16:38:58,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=313774.6666666667, ans=0.125 2024-09-15 16:39:40,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=313859.6666666667, ans=0.2 2024-09-15 16:40:02,536 INFO [train.py:1198] (0/2) Epoch 18, batch 2150, loss[loss=0.207, ctc_loss=0.1424, cr_loss=0.323, over 20938.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1642, cr_loss=0.3826, over 4106346.24 frames. ], batch size: 49, lr: 4.67e-03, grad_scale: 32.0 2024-09-15 16:40:18,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=313944.6666666667, ans=0.125 2024-09-15 16:40:28,255 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.075e+02 2.196e+02 2.387e+02 4.503e+02, threshold=4.393e+02, percent-clipped=1.0 2024-09-15 16:40:31,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=313973.0, ans=0.0 2024-09-15 16:40:46,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=314001.3333333333, ans=0.2 2024-09-15 16:41:17,738 INFO [train.py:1198] (0/2) Epoch 18, batch 2200, loss[loss=0.2661, ctc_loss=0.1864, cr_loss=0.3983, over 20674.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1641, cr_loss=0.3827, over 4111329.22 frames. 
], batch size: 71, lr: 4.67e-03, grad_scale: 32.0 2024-09-15 16:41:27,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=314058.0, ans=0.04949747468305833 2024-09-15 16:41:39,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=314086.3333333333, ans=0.07 2024-09-15 16:41:42,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314086.3333333333, ans=0.1 2024-09-15 16:42:33,356 INFO [train.py:1198] (0/2) Epoch 18, batch 2250, loss[loss=0.1921, ctc_loss=0.1279, cr_loss=0.3211, over 20951.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.164, cr_loss=0.3829, over 4120326.91 frames. ], batch size: 50, lr: 4.67e-03, grad_scale: 32.0 2024-09-15 16:42:33,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=314199.6666666667, ans=0.0 2024-09-15 16:42:42,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=314199.6666666667, ans=0.125 2024-09-15 16:42:56,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=314228.0, ans=0.125 2024-09-15 16:43:01,889 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.087e+02 2.211e+02 2.436e+02 5.064e+02, threshold=4.422e+02, percent-clipped=1.0 2024-09-15 16:43:15,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.25 vs. limit=10.0 2024-09-15 16:43:39,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.02 vs. 
limit=15.0 2024-09-15 16:43:54,108 INFO [train.py:1198] (0/2) Epoch 18, batch 2300, loss[loss=0.2072, ctc_loss=0.139, cr_loss=0.341, over 21059.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1647, cr_loss=0.3842, over 4097342.57 frames. ], batch size: 56, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:44:23,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=22.5 2024-09-15 16:44:32,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.09 vs. limit=15.0 2024-09-15 16:45:09,216 INFO [train.py:1198] (0/2) Epoch 18, batch 2350, loss[loss=0.2252, ctc_loss=0.1534, cr_loss=0.3591, over 21046.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1643, cr_loss=0.3832, over 4088989.28 frames. ], batch size: 56, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:45:35,112 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.108e+02 2.281e+02 2.436e+02 3.272e+02, threshold=4.562e+02, percent-clipped=0.0 2024-09-15 16:45:53,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314568.0, ans=0.1 2024-09-15 16:45:57,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=314568.0, ans=0.0 2024-09-15 16:46:24,678 INFO [train.py:1198] (0/2) Epoch 18, batch 2400, loss[loss=0.2519, ctc_loss=0.1734, cr_loss=0.3929, over 21068.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1643, cr_loss=0.3832, over 4090843.58 frames. 
], batch size: 62, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:46:37,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=314624.6666666667, ans=0.125 2024-09-15 16:46:47,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=314653.0, ans=0.04949747468305833 2024-09-15 16:46:50,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=314653.0, ans=0.025 2024-09-15 16:47:39,954 INFO [train.py:1198] (0/2) Epoch 18, batch 2450, loss[loss=0.266, ctc_loss=0.1816, cr_loss=0.4222, over 19638.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1643, cr_loss=0.3833, over 4098560.54 frames. ], batch size: 90, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:47:41,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=314766.3333333333, ans=0.025 2024-09-15 16:48:00,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=314794.6666666667, ans=0.05 2024-09-15 16:48:05,798 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.686e+02 2.060e+02 2.224e+02 2.494e+02 4.847e+02, threshold=4.449e+02, percent-clipped=1.0 2024-09-15 16:49:01,454 INFO [train.py:1198] (0/2) Epoch 18, batch 2500, loss[loss=0.2326, ctc_loss=0.1538, cr_loss=0.3941, over 21006.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1631, cr_loss=0.3818, over 4113735.46 frames. 
], batch size: 48, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:49:39,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=314964.6666666667, ans=0.125 2024-09-15 16:49:52,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=314993.0, ans=0.0 2024-09-15 16:50:16,193 INFO [train.py:1198] (0/2) Epoch 18, batch 2550, loss[loss=0.2698, ctc_loss=0.1875, cr_loss=0.4114, over 20060.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1638, cr_loss=0.3836, over 4117877.97 frames. ], batch size: 80, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:50:27,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=315049.6666666667, ans=0.05 2024-09-15 16:50:31,929 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2024-09-15 16:50:32,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.96 vs. 
limit=15.0 2024-09-15 16:50:32,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=315078.0, ans=0.125 2024-09-15 16:50:41,734 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.119e+02 2.271e+02 2.541e+02 5.593e+02, threshold=4.541e+02, percent-clipped=2.0 2024-09-15 16:50:42,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=315078.0, ans=0.125 2024-09-15 16:51:00,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=315134.6666666667, ans=0.1 2024-09-15 16:51:27,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=315163.0, ans=0.125 2024-09-15 16:51:31,768 INFO [train.py:1198] (0/2) Epoch 18, batch 2600, loss[loss=0.2457, ctc_loss=0.1673, cr_loss=0.392, over 21001.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.164, cr_loss=0.3836, over 4121812.89 frames. 
], batch size: 63, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:52:02,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=315248.0, ans=0.025 2024-09-15 16:52:15,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=315276.3333333333, ans=0.125 2024-09-15 16:52:16,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=315276.3333333333, ans=0.0 2024-09-15 16:52:25,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=315276.3333333333, ans=0.1 2024-09-15 16:52:30,910 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=22.5 2024-09-15 16:52:46,968 INFO [train.py:1198] (0/2) Epoch 18, batch 2650, loss[loss=0.2709, ctc_loss=0.1862, cr_loss=0.4232, over 20058.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1643, cr_loss=0.3842, over 4116265.09 frames. ], batch size: 80, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:53:12,650 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.091e+02 2.227e+02 2.454e+02 3.694e+02, threshold=4.454e+02, percent-clipped=0.0 2024-09-15 16:53:23,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0 2024-09-15 16:53:50,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=315446.3333333333, ans=0.125 2024-09-15 16:54:02,259 INFO [train.py:1198] (0/2) Epoch 18, batch 2700, loss[loss=0.2163, ctc_loss=0.1441, cr_loss=0.3613, over 20952.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1644, cr_loss=0.3844, over 4117227.13 frames. 
], batch size: 48, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:54:08,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=315474.6666666667, ans=0.0 2024-09-15 16:54:48,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=315531.3333333333, ans=0.125 2024-09-15 16:55:16,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=315588.0, ans=0.0 2024-09-15 16:55:23,343 INFO [train.py:1198] (0/2) Epoch 18, batch 2750, loss[loss=0.2689, ctc_loss=0.1842, cr_loss=0.4234, over 20695.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1645, cr_loss=0.3845, over 4118806.01 frames. ], batch size: 71, lr: 4.66e-03, grad_scale: 32.0 2024-09-15 16:55:48,903 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.045e+02 2.157e+02 2.327e+02 3.592e+02, threshold=4.314e+02, percent-clipped=0.0 2024-09-15 16:56:12,236 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=15.0 2024-09-15 16:56:38,855 INFO [train.py:1198] (0/2) Epoch 18, batch 2800, loss[loss=0.2607, ctc_loss=0.181, cr_loss=0.3986, over 18352.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1646, cr_loss=0.3849, over 4119544.07 frames. ], batch size: 108, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 16:56:59,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.62 vs. 
limit=15.0 2024-09-15 16:57:01,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=315786.3333333333, ans=0.0 2024-09-15 16:57:03,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=315786.3333333333, ans=0.2 2024-09-15 16:57:40,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=315871.3333333333, ans=0.025 2024-09-15 16:57:43,631 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 16:57:53,783 INFO [train.py:1198] (0/2) Epoch 18, batch 2850, loss[loss=0.2319, ctc_loss=0.1571, cr_loss=0.3738, over 20970.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.165, cr_loss=0.3854, over 4117281.47 frames. ], batch size: 55, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 16:57:58,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=315899.6666666667, ans=0.125 2024-09-15 16:58:10,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=315928.0, ans=0.125 2024-09-15 16:58:18,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=315928.0, ans=0.025 2024-09-15 16:58:19,299 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.062e+02 2.184e+02 2.305e+02 4.302e+02, threshold=4.368e+02, percent-clipped=0.0 2024-09-15 16:58:19,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=315928.0, ans=0.1 2024-09-15 16:58:59,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=316013.0, ans=0.2 2024-09-15 16:59:09,221 INFO 
[train.py:1198] (0/2) Epoch 18, batch 2900, loss[loss=0.2397, ctc_loss=0.1592, cr_loss=0.4027, over 20987.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1655, cr_loss=0.3853, over 4113855.99 frames. ], batch size: 55, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 16:59:31,111 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-09-15 17:00:28,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=316183.0, ans=0.0 2024-09-15 17:00:29,913 INFO [train.py:1198] (0/2) Epoch 18, batch 2950, loss[loss=0.2172, ctc_loss=0.1447, cr_loss=0.3626, over 20979.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1654, cr_loss=0.3856, over 4114522.77 frames. ], batch size: 49, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:00:55,875 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.049e+02 2.209e+02 2.438e+02 3.333e+02, threshold=4.419e+02, percent-clipped=0.0 2024-09-15 17:01:05,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=316239.6666666667, ans=0.125 2024-09-15 17:01:05,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=316239.6666666667, ans=0.125 2024-09-15 17:01:45,439 INFO [train.py:1198] (0/2) Epoch 18, batch 3000, loss[loss=0.1771, ctc_loss=0.117, cr_loss=0.3008, over 19836.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1648, cr_loss=0.3851, over 4121213.25 frames. ], batch size: 44, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:01:45,440 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 17:02:10,939 INFO [train.py:1230] (0/2) Epoch 18, validation: loss=0.0454, ctc_loss=0.0454, cr_loss=1.042e-14, over 944034.00 frames. 
2024-09-15 17:02:10,940 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 17:02:32,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=316353.0, ans=0.125 2024-09-15 17:02:41,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=316381.3333333333, ans=0.05 2024-09-15 17:03:14,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=316438.0, ans=0.125 2024-09-15 17:03:15,920 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 17:03:17,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=316438.0, ans=0.0 2024-09-15 17:03:26,203 INFO [train.py:1198] (0/2) Epoch 18, batch 3050, loss[loss=0.2488, ctc_loss=0.1701, cr_loss=0.3933, over 19274.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1647, cr_loss=0.3851, over 4123717.06 frames. ], batch size: 90, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:03:50,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=316494.6666666667, ans=0.025 2024-09-15 17:03:51,727 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.654e+02 1.989e+02 2.136e+02 2.266e+02 3.019e+02, threshold=4.272e+02, percent-clipped=0.0 2024-09-15 17:03:52,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.02 vs. 
limit=10.0 2024-09-15 17:03:55,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=316523.0, ans=0.125 2024-09-15 17:04:07,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=316523.0, ans=0.025 2024-09-15 17:04:27,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=22.5 2024-09-15 17:04:32,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=316579.6666666667, ans=0.2 2024-09-15 17:04:41,476 INFO [train.py:1198] (0/2) Epoch 18, batch 3100, loss[loss=0.2214, ctc_loss=0.1449, cr_loss=0.3825, over 20933.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1648, cr_loss=0.3851, over 4115248.48 frames. ], batch size: 49, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:04:43,645 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=12.0 2024-09-15 17:05:03,003 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 17:05:05,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=316636.3333333333, ans=0.0 2024-09-15 17:05:10,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=316664.6666666667, ans=0.125 2024-09-15 17:05:51,899 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-09-15 17:06:00,322 INFO [train.py:1198] (0/2) Epoch 18, batch 3150, loss[loss=0.2564, ctc_loss=0.1772, cr_loss=0.396, over 21022.00 frames. 
], tot_loss[loss=0.2429, ctc_loss=0.1656, cr_loss=0.3862, over 4111654.22 frames. ], batch size: 62, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:06:28,893 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.103e+02 2.241e+02 2.472e+02 6.402e+02, threshold=4.482e+02, percent-clipped=1.0 2024-09-15 17:06:46,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=316806.3333333333, ans=0.0 2024-09-15 17:06:53,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=316834.6666666667, ans=0.0 2024-09-15 17:07:19,004 INFO [train.py:1198] (0/2) Epoch 18, batch 3200, loss[loss=0.2045, ctc_loss=0.133, cr_loss=0.3574, over 19009.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1655, cr_loss=0.3858, over 4107896.83 frames. ], batch size: 42, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:07:52,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=316948.0, ans=0.1 2024-09-15 17:08:05,482 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.21 vs. limit=8.0 2024-09-15 17:08:26,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=317004.6666666667, ans=0.125 2024-09-15 17:08:34,679 INFO [train.py:1198] (0/2) Epoch 18, batch 3250, loss[loss=0.2406, ctc_loss=0.164, cr_loss=0.383, over 20888.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1656, cr_loss=0.3861, over 4111922.66 frames. 
], batch size: 54, lr: 4.65e-03, grad_scale: 32.0 2024-09-15 17:08:51,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=317061.3333333333, ans=0.125 2024-09-15 17:09:00,096 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.008e+02 2.115e+02 2.277e+02 3.221e+02, threshold=4.229e+02, percent-clipped=0.0 2024-09-15 17:09:49,973 INFO [train.py:1198] (0/2) Epoch 18, batch 3300, loss[loss=0.2418, ctc_loss=0.1625, cr_loss=0.3965, over 20589.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1658, cr_loss=0.3865, over 4100070.39 frames. ], batch size: 75, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:10:22,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=317231.3333333333, ans=0.2 2024-09-15 17:10:50,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=317288.0, ans=0.0 2024-09-15 17:10:58,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=317288.0, ans=0.0 2024-09-15 17:11:02,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=317288.0, ans=0.125 2024-09-15 17:11:04,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=317316.3333333333, ans=0.0 2024-09-15 17:11:05,392 INFO [train.py:1198] (0/2) Epoch 18, batch 3350, loss[loss=0.2351, ctc_loss=0.1572, cr_loss=0.3893, over 20978.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.166, cr_loss=0.3861, over 4088635.28 frames. 
], batch size: 48, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:11:08,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=317316.3333333333, ans=0.125 2024-09-15 17:11:13,258 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-112000.pt 2024-09-15 17:11:32,304 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.047e+02 2.168e+02 2.320e+02 5.001e+02, threshold=4.336e+02, percent-clipped=1.0 2024-09-15 17:11:39,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.33 vs. limit=10.0 2024-09-15 17:11:40,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=317373.0, ans=0.125 2024-09-15 17:11:58,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=317401.3333333333, ans=0.0 2024-09-15 17:12:01,502 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 17:12:02,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317401.3333333333, ans=0.1 2024-09-15 17:12:28,330 INFO [train.py:1198] (0/2) Epoch 18, batch 3400, loss[loss=0.2607, ctc_loss=0.1823, cr_loss=0.3921, over 20696.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1665, cr_loss=0.387, over 4097054.15 frames. ], batch size: 71, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:13:44,005 INFO [train.py:1198] (0/2) Epoch 18, batch 3450, loss[loss=0.236, ctc_loss=0.1585, cr_loss=0.3875, over 21010.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1662, cr_loss=0.3863, over 4102659.26 frames. 
], batch size: 62, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:14:07,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=317628.0, ans=0.1 2024-09-15 17:14:09,863 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.713e+02 2.047e+02 2.154e+02 2.265e+02 3.567e+02, threshold=4.307e+02, percent-clipped=0.0 2024-09-15 17:14:26,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=317656.3333333333, ans=0.125 2024-09-15 17:14:37,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=317684.6666666667, ans=0.2 2024-09-15 17:14:40,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=317684.6666666667, ans=0.125 2024-09-15 17:14:59,828 INFO [train.py:1198] (0/2) Epoch 18, batch 3500, loss[loss=0.2848, ctc_loss=0.1953, cr_loss=0.4475, over 20783.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1663, cr_loss=0.3867, over 4108129.70 frames. ], batch size: 65, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:15:14,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=317769.6666666667, ans=0.035 2024-09-15 17:15:14,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=317769.6666666667, ans=0.125 2024-09-15 17:15:58,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=317826.3333333333, ans=15.0 2024-09-15 17:16:16,081 INFO [train.py:1198] (0/2) Epoch 18, batch 3550, loss[loss=0.2565, ctc_loss=0.174, cr_loss=0.4121, over 20968.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1653, cr_loss=0.386, over 4118827.34 frames. 
], batch size: 64, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:16:16,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=317883.0, ans=0.125 2024-09-15 17:16:31,972 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2024-09-15 17:16:41,904 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.046e+02 2.196e+02 2.393e+02 3.616e+02, threshold=4.392e+02, percent-clipped=0.0 2024-09-15 17:17:03,801 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=22.5 2024-09-15 17:17:06,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317968.0, ans=0.1 2024-09-15 17:17:20,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=317996.3333333333, ans=0.5 2024-09-15 17:17:35,284 INFO [train.py:1198] (0/2) Epoch 18, batch 3600, loss[loss=0.1979, ctc_loss=0.1313, cr_loss=0.333, over 20209.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1649, cr_loss=0.3851, over 4108249.80 frames. 
], batch size: 45, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:17:49,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318053.0, ans=0.1 2024-09-15 17:18:07,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=318081.3333333333, ans=0.125 2024-09-15 17:18:21,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=318109.6666666667, ans=0.125 2024-09-15 17:18:36,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=318138.0, ans=0.125 2024-09-15 17:18:42,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=318138.0, ans=0.2 2024-09-15 17:18:50,683 INFO [train.py:1198] (0/2) Epoch 18, batch 3650, loss[loss=0.2459, ctc_loss=0.1651, cr_loss=0.404, over 20963.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1644, cr_loss=0.3851, over 4109380.21 frames. ], batch size: 58, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:19:04,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=318194.6666666667, ans=0.0 2024-09-15 17:19:06,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. 
limit=6.0 2024-09-15 17:19:09,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=318194.6666666667, ans=0.125 2024-09-15 17:19:16,277 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.068e+02 2.198e+02 2.370e+02 5.605e+02, threshold=4.396e+02, percent-clipped=2.0 2024-09-15 17:20:06,246 INFO [train.py:1198] (0/2) Epoch 18, batch 3700, loss[loss=0.2274, ctc_loss=0.155, cr_loss=0.3617, over 20958.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1652, cr_loss=0.386, over 4094715.64 frames. ], batch size: 58, lr: 4.64e-03, grad_scale: 32.0 2024-09-15 17:20:12,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=318308.0, ans=0.125 2024-09-15 17:20:48,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=318364.6666666667, ans=0.125 2024-09-15 17:21:21,440 INFO [train.py:1198] (0/2) Epoch 18, batch 3750, loss[loss=0.2509, ctc_loss=0.1716, cr_loss=0.3966, over 20881.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.165, cr_loss=0.3856, over 4110688.41 frames. ], batch size: 57, lr: 4.63e-03, grad_scale: 32.0 2024-09-15 17:21:47,597 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.069e+02 2.194e+02 2.394e+02 5.256e+02, threshold=4.387e+02, percent-clipped=1.0 2024-09-15 17:21:47,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=318478.0, ans=0.125 2024-09-15 17:22:06,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=318534.6666666667, ans=0.0 2024-09-15 17:22:37,400 INFO [train.py:1198] (0/2) Epoch 18, batch 3800, loss[loss=0.2046, ctc_loss=0.1358, cr_loss=0.3442, over 20949.00 frames. 
], tot_loss[loss=0.2415, ctc_loss=0.1645, cr_loss=0.3849, over 4110061.41 frames. ], batch size: 48, lr: 4.63e-03, grad_scale: 32.0 2024-09-15 17:23:01,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=318619.6666666667, ans=0.2 2024-09-15 17:23:21,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=318648.0, ans=0.125 2024-09-15 17:23:34,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=318676.3333333333, ans=0.125 2024-09-15 17:23:58,665 INFO [train.py:1198] (0/2) Epoch 18, batch 3850, loss[loss=0.2137, ctc_loss=0.1437, cr_loss=0.3501, over 20797.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1636, cr_loss=0.3838, over 4120646.11 frames. ], batch size: 56, lr: 4.63e-03, grad_scale: 64.0 2024-09-15 17:24:06,983 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.73 vs. 
limit=12.0 2024-09-15 17:24:24,628 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.050e+02 2.198e+02 2.435e+02 4.376e+02, threshold=4.397e+02, percent-clipped=0.0 2024-09-15 17:24:47,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=318818.0, ans=0.0 2024-09-15 17:24:52,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=318818.0, ans=0.2 2024-09-15 17:25:05,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=318846.3333333333, ans=0.1 2024-09-15 17:25:11,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=318846.3333333333, ans=0.125 2024-09-15 17:25:14,652 INFO [train.py:1198] (0/2) Epoch 18, batch 3900, loss[loss=0.2235, ctc_loss=0.1524, cr_loss=0.3555, over 20969.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1626, cr_loss=0.3815, over 4124688.88 frames. ], batch size: 55, lr: 4.63e-03, grad_scale: 64.0 2024-09-15 17:25:30,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=318903.0, ans=0.09899494936611666 2024-09-15 17:25:36,609 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.29 vs. 
limit=15.0 2024-09-15 17:25:43,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=318931.3333333333, ans=0.125 2024-09-15 17:26:08,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=318959.6666666667, ans=0.125 2024-09-15 17:26:27,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=318988.0, ans=0.1 2024-09-15 17:26:30,401 INFO [train.py:1198] (0/2) Epoch 18, batch 3950, loss[loss=0.2398, ctc_loss=0.1626, cr_loss=0.386, over 20982.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1631, cr_loss=0.3829, over 4120451.50 frames. ], batch size: 55, lr: 4.63e-03, grad_scale: 64.0 2024-09-15 17:26:38,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=319016.3333333333, ans=0.125 2024-09-15 17:26:56,410 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.046e+02 2.210e+02 2.454e+02 3.010e+02, threshold=4.421e+02, percent-clipped=0.0 2024-09-15 17:27:11,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.21 vs. limit=15.0 2024-09-15 17:27:46,765 INFO [train.py:1198] (0/2) Epoch 18, batch 4000, loss[loss=0.2181, ctc_loss=0.147, cr_loss=0.3555, over 21006.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1628, cr_loss=0.383, over 4119018.04 frames. ], batch size: 52, lr: 4.63e-03, grad_scale: 64.0 2024-09-15 17:29:01,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319271.3333333333, ans=0.1 2024-09-15 17:29:05,351 INFO [train.py:1198] (0/2) Epoch 18, batch 4050, loss[loss=0.2463, ctc_loss=0.1658, cr_loss=0.4024, over 20874.00 frames. 
], tot_loss[loss=0.2398, ctc_loss=0.1632, cr_loss=0.3831, over 4119164.09 frames. ], batch size: 57, lr: 4.63e-03, grad_scale: 64.0 2024-09-15 17:29:30,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=319328.0, ans=0.2 2024-09-15 17:29:33,536 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.092e+02 2.204e+02 2.412e+02 3.530e+02, threshold=4.407e+02, percent-clipped=0.0 2024-09-15 17:30:01,706 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2024-09-15 17:30:03,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=319384.6666666667, ans=0.125 2024-09-15 17:30:23,492 INFO [train.py:1198] (0/2) Epoch 18, batch 4100, loss[loss=0.2493, ctc_loss=0.1728, cr_loss=0.3825, over 20985.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1638, cr_loss=0.3832, over 4115642.78 frames. ], batch size: 64, lr: 4.63e-03, grad_scale: 64.0 2024-09-15 17:30:31,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=319441.3333333333, ans=0.1 2024-09-15 17:30:36,309 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=12.0 2024-09-15 17:30:47,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=319469.6666666667, ans=0.125 2024-09-15 17:30:50,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319469.6666666667, ans=0.1 2024-09-15 17:31:12,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.67 vs. 
limit=15.0 2024-09-15 17:31:39,230 INFO [train.py:1198] (0/2) Epoch 18, batch 4150, loss[loss=0.2463, ctc_loss=0.1675, cr_loss=0.3945, over 20971.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1641, cr_loss=0.3834, over 4104591.37 frames. ], batch size: 58, lr: 4.63e-03, grad_scale: 32.0 2024-09-15 17:31:55,209 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2024-09-15 17:31:57,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=319611.3333333333, ans=0.2 2024-09-15 17:32:06,863 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.065e+02 2.197e+02 2.386e+02 3.146e+02, threshold=4.395e+02, percent-clipped=0.0 2024-09-15 17:32:08,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=319639.6666666667, ans=0.125 2024-09-15 17:32:16,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319639.6666666667, ans=0.1 2024-09-15 17:32:23,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=319668.0, ans=0.0 2024-09-15 17:32:55,191 INFO [train.py:1198] (0/2) Epoch 18, batch 4200, loss[loss=0.2917, ctc_loss=0.2053, cr_loss=0.4321, over 20964.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1646, cr_loss=0.3847, over 4112059.66 frames. 
], batch size: 64, lr: 4.63e-03, grad_scale: 32.0 2024-09-15 17:32:55,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=319724.6666666667, ans=0.125 2024-09-15 17:32:58,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=319724.6666666667, ans=0.1 2024-09-15 17:33:28,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=319781.3333333333, ans=0.0 2024-09-15 17:33:56,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=319838.0, ans=0.2 2024-09-15 17:34:10,818 INFO [train.py:1198] (0/2) Epoch 18, batch 4250, loss[loss=0.2492, ctc_loss=0.1693, cr_loss=0.3996, over 20971.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1642, cr_loss=0.3836, over 4091499.09 frames. ], batch size: 63, lr: 4.62e-03, grad_scale: 32.0 2024-09-15 17:34:20,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=319866.3333333333, ans=0.125 2024-09-15 17:34:40,971 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.078e+02 2.251e+02 2.430e+02 5.083e+02, threshold=4.502e+02, percent-clipped=1.0 2024-09-15 17:35:33,157 INFO [train.py:1198] (0/2) Epoch 18, batch 4300, loss[loss=0.2788, ctc_loss=0.1923, cr_loss=0.4327, over 18151.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1632, cr_loss=0.3828, over 4101831.38 frames. 
], batch size: 108, lr: 4.62e-03, grad_scale: 32.0 2024-09-15 17:35:37,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=320008.0, ans=0.125 2024-09-15 17:35:40,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=320008.0, ans=0.025 2024-09-15 17:35:48,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=320036.3333333333, ans=0.2 2024-09-15 17:35:49,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=8.0 2024-09-15 17:36:14,976 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 17:36:17,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=320093.0, ans=0.0 2024-09-15 17:36:37,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=320121.3333333333, ans=0.125 2024-09-15 17:36:49,843 INFO [train.py:1198] (0/2) Epoch 18, batch 4350, loss[loss=0.2134, ctc_loss=0.1402, cr_loss=0.3657, over 20875.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1628, cr_loss=0.3822, over 4105518.45 frames. 
], batch size: 54, lr: 4.62e-03, grad_scale: 32.0 2024-09-15 17:37:17,371 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.708e+02 2.021e+02 2.201e+02 2.329e+02 4.676e+02, threshold=4.402e+02, percent-clipped=1.0 2024-09-15 17:37:26,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=320206.3333333333, ans=0.05 2024-09-15 17:37:32,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=320206.3333333333, ans=0.0 2024-09-15 17:38:05,840 INFO [train.py:1198] (0/2) Epoch 18, batch 4400, loss[loss=0.2134, ctc_loss=0.1443, cr_loss=0.3457, over 20970.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1624, cr_loss=0.3817, over 4110536.49 frames. ], batch size: 52, lr: 4.62e-03, grad_scale: 32.0 2024-09-15 17:38:23,087 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 17:38:26,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2024-09-15 17:38:52,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.99 vs. limit=15.0 2024-09-15 17:38:54,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=320376.3333333333, ans=0.2 2024-09-15 17:39:14,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=320404.6666666667, ans=0.5 2024-09-15 17:39:21,940 INFO [train.py:1198] (0/2) Epoch 18, batch 4450, loss[loss=0.2527, ctc_loss=0.1723, cr_loss=0.402, over 20998.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1632, cr_loss=0.3826, over 4104674.21 frames. 
], batch size: 61, lr: 4.62e-03, grad_scale: 32.0 2024-09-15 17:39:25,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=320433.0, ans=0.1 2024-09-15 17:39:49,091 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.090e+02 2.230e+02 2.433e+02 3.407e+02, threshold=4.459e+02, percent-clipped=0.0 2024-09-15 17:40:01,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=320489.6666666667, ans=0.0 2024-09-15 17:40:20,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=320518.0, ans=0.125 2024-09-15 17:40:42,485 INFO [train.py:1198] (0/2) Epoch 18, batch 4500, loss[loss=0.2394, ctc_loss=0.1617, cr_loss=0.3887, over 21014.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1632, cr_loss=0.3829, over 4100573.56 frames. ], batch size: 63, lr: 4.62e-03, grad_scale: 16.0 2024-09-15 17:40:54,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=320574.6666666667, ans=0.0 2024-09-15 17:41:05,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.37 vs. limit=10.0 2024-09-15 17:41:48,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=320688.0, ans=0.125 2024-09-15 17:42:00,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=320688.0, ans=0.125 2024-09-15 17:42:03,592 INFO [train.py:1198] (0/2) Epoch 18, batch 4550, loss[loss=0.2798, ctc_loss=0.1988, cr_loss=0.405, over 14270.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.164, cr_loss=0.384, over 4099592.53 frames. 
], batch size: 150, lr: 4.62e-03, grad_scale: 16.0 2024-09-15 17:42:04,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=320716.3333333333, ans=0.125 2024-09-15 17:42:32,597 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.098e+02 2.267e+02 2.519e+02 7.236e+02, threshold=4.534e+02, percent-clipped=1.0 2024-09-15 17:42:53,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0 2024-09-15 17:43:06,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=320829.6666666667, ans=0.125 2024-09-15 17:43:20,504 INFO [train.py:1198] (0/2) Epoch 18, batch 4600, loss[loss=0.2643, ctc_loss=0.1793, cr_loss=0.4251, over 21024.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1652, cr_loss=0.3852, over 4071341.74 frames. ], batch size: 63, lr: 4.62e-03, grad_scale: 16.0 2024-09-15 17:43:42,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320886.3333333333, ans=0.1 2024-09-15 17:43:48,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=320886.3333333333, ans=0.05 2024-09-15 17:44:25,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.19 vs. limit=15.0 2024-09-15 17:44:27,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=320971.3333333333, ans=0.025 2024-09-15 17:44:28,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.64 vs. 
limit=15.0 2024-09-15 17:44:36,246 INFO [train.py:1198] (0/2) Epoch 18, batch 4650, loss[loss=0.1874, ctc_loss=0.1224, cr_loss=0.325, over 20991.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1638, cr_loss=0.3835, over 4092740.38 frames. ], batch size: 48, lr: 4.62e-03, grad_scale: 16.0 2024-09-15 17:45:05,582 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.058e+02 2.194e+02 2.426e+02 3.072e+02, threshold=4.388e+02, percent-clipped=0.0 2024-09-15 17:45:30,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=321084.6666666667, ans=0.125 2024-09-15 17:45:48,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=321113.0, ans=0.0 2024-09-15 17:45:53,109 INFO [train.py:1198] (0/2) Epoch 18, batch 4700, loss[loss=0.2537, ctc_loss=0.175, cr_loss=0.3936, over 20646.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1632, cr_loss=0.382, over 4094376.76 frames. ], batch size: 66, lr: 4.62e-03, grad_scale: 16.0 2024-09-15 17:46:17,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=321169.6666666667, ans=0.0 2024-09-15 17:46:51,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=321226.3333333333, ans=0.1 2024-09-15 17:47:16,774 INFO [train.py:1198] (0/2) Epoch 18, batch 4750, loss[loss=0.196, ctc_loss=0.1314, cr_loss=0.3232, over 20990.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1633, cr_loss=0.3811, over 4092158.59 frames. 
], batch size: 50, lr: 4.61e-03, grad_scale: 16.0 2024-09-15 17:47:45,709 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.061e+02 2.212e+02 2.449e+02 5.359e+02, threshold=4.423e+02, percent-clipped=1.0 2024-09-15 17:48:05,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=321368.0, ans=0.0 2024-09-15 17:48:10,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=321368.0, ans=0.025 2024-09-15 17:48:10,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=321368.0, ans=0.0 2024-09-15 17:48:32,917 INFO [train.py:1198] (0/2) Epoch 18, batch 4800, loss[loss=0.2137, ctc_loss=0.1433, cr_loss=0.3517, over 20804.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1625, cr_loss=0.3802, over 4098967.07 frames. ], batch size: 53, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:49:25,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=321509.6666666667, ans=0.1 2024-09-15 17:49:48,165 INFO [train.py:1198] (0/2) Epoch 18, batch 4850, loss[loss=0.2309, ctc_loss=0.1569, cr_loss=0.3696, over 20791.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1639, cr_loss=0.3826, over 4101722.99 frames. ], batch size: 56, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:49:48,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0 2024-09-15 17:50:08,554 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.93 vs. 
limit=22.5 2024-09-15 17:50:09,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=321594.6666666667, ans=0.125 2024-09-15 17:50:16,370 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=12.0 2024-09-15 17:50:17,130 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.095e+02 2.202e+02 2.306e+02 3.759e+02, threshold=4.404e+02, percent-clipped=0.0 2024-09-15 17:50:27,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=321623.0, ans=0.125 2024-09-15 17:51:03,725 INFO [train.py:1198] (0/2) Epoch 18, batch 4900, loss[loss=0.3059, ctc_loss=0.2121, cr_loss=0.4688, over 18303.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1645, cr_loss=0.3841, over 4104483.97 frames. ], batch size: 108, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:51:09,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=321708.0, ans=0.1 2024-09-15 17:51:16,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=321708.0, ans=0.125 2024-09-15 17:51:35,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=321764.6666666667, ans=0.125 2024-09-15 17:51:36,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.03 vs. 
limit=12.0 2024-09-15 17:51:43,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=321764.6666666667, ans=0.125 2024-09-15 17:52:08,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0 2024-09-15 17:52:22,514 INFO [train.py:1198] (0/2) Epoch 18, batch 4950, loss[loss=0.2453, ctc_loss=0.1679, cr_loss=0.3872, over 21061.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1647, cr_loss=0.384, over 4094245.56 frames. ], batch size: 56, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:52:44,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=321878.0, ans=0.125 2024-09-15 17:52:50,496 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.718e+02 2.140e+02 2.303e+02 2.502e+02 3.680e+02, threshold=4.606e+02, percent-clipped=0.0 2024-09-15 17:53:01,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=321906.3333333333, ans=0.0 2024-09-15 17:53:14,880 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 17:53:22,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=321934.6666666667, ans=0.2 2024-09-15 17:53:31,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=321963.0, ans=0.125 2024-09-15 17:53:39,624 INFO [train.py:1198] (0/2) Epoch 18, batch 5000, loss[loss=0.2564, ctc_loss=0.1747, cr_loss=0.4084, over 21042.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1641, cr_loss=0.383, over 4080909.03 frames. 
], batch size: 62, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:53:52,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=321991.3333333333, ans=0.125 2024-09-15 17:53:58,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.47 vs. limit=5.0 2024-09-15 17:54:31,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=322076.3333333333, ans=0.125 2024-09-15 17:54:40,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=322104.6666666667, ans=0.04949747468305833 2024-09-15 17:54:53,787 INFO [train.py:1198] (0/2) Epoch 18, batch 5050, loss[loss=0.2425, ctc_loss=0.1677, cr_loss=0.3739, over 20833.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1648, cr_loss=0.3836, over 4080925.27 frames. ], batch size: 59, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:55:22,041 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.069e+02 2.175e+02 2.371e+02 4.721e+02, threshold=4.351e+02, percent-clipped=2.0 2024-09-15 17:55:43,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.64 vs. limit=15.0 2024-09-15 17:56:07,976 INFO [train.py:1198] (0/2) Epoch 18, batch 5100, loss[loss=0.2572, ctc_loss=0.1773, cr_loss=0.3998, over 20945.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1648, cr_loss=0.3841, over 4086128.40 frames. ], batch size: 60, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:56:24,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. 
limit=10.0 2024-09-15 17:56:48,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2024-09-15 17:56:57,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=322359.6666666667, ans=0.0 2024-09-15 17:57:15,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=322388.0, ans=0.0 2024-09-15 17:57:22,194 INFO [train.py:1198] (0/2) Epoch 18, batch 5150, loss[loss=0.2546, ctc_loss=0.1787, cr_loss=0.3796, over 18066.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1652, cr_loss=0.3852, over 4089691.68 frames. ], batch size: 108, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:57:22,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=322416.3333333333, ans=0.2 2024-09-15 17:57:24,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322416.3333333333, ans=0.1 2024-09-15 17:57:30,309 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.01 vs. 
limit=22.5 2024-09-15 17:57:30,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=322416.3333333333, ans=12.0 2024-09-15 17:57:41,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=322444.6666666667, ans=0.0 2024-09-15 17:57:50,332 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.046e+02 2.252e+02 2.431e+02 5.879e+02, threshold=4.504e+02, percent-clipped=1.0 2024-09-15 17:57:56,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-09-15 17:58:22,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=322529.6666666667, ans=0.125 2024-09-15 17:58:35,904 INFO [train.py:1198] (0/2) Epoch 18, batch 5200, loss[loss=0.2685, ctc_loss=0.1849, cr_loss=0.4178, over 21045.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1655, cr_loss=0.3856, over 4081512.16 frames. ], batch size: 63, lr: 4.61e-03, grad_scale: 32.0 2024-09-15 17:58:42,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=322558.0, ans=0.2 2024-09-15 17:58:49,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=322586.3333333333, ans=0.125 2024-09-15 17:59:07,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=322614.6666666667, ans=0.2 2024-09-15 17:59:49,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=322699.6666666667, ans=0.0 2024-09-15 17:59:50,808 INFO [train.py:1198] (0/2) Epoch 18, batch 5250, loss[loss=0.219, ctc_loss=0.1487, cr_loss=0.3514, over 20977.00 frames. 
], tot_loss[loss=0.2435, ctc_loss=0.1662, cr_loss=0.3863, over 4071465.50 frames. ], batch size: 51, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:00:00,549 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.71 vs. limit=10.0 2024-09-15 18:00:19,017 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.084e+02 2.230e+02 2.413e+02 7.928e+02, threshold=4.461e+02, percent-clipped=1.0 2024-09-15 18:00:26,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=322756.3333333333, ans=0.125 2024-09-15 18:00:32,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=322756.3333333333, ans=0.025 2024-09-15 18:00:34,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=322784.6666666667, ans=0.2 2024-09-15 18:00:53,755 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2024-09-15 18:01:04,757 INFO [train.py:1198] (0/2) Epoch 18, batch 5300, loss[loss=0.2961, ctc_loss=0.2136, cr_loss=0.4127, over 14767.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1664, cr_loss=0.3863, over 4060889.29 frames. ], batch size: 149, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:01:24,834 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. 
limit=6.0 2024-09-15 18:02:01,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=322926.3333333333, ans=0.125 2024-09-15 18:02:16,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322954.6666666667, ans=0.1 2024-09-15 18:02:19,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=322954.6666666667, ans=0.125 2024-09-15 18:02:22,298 INFO [train.py:1198] (0/2) Epoch 18, batch 5350, loss[loss=0.2935, ctc_loss=0.208, cr_loss=0.4275, over 18069.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1665, cr_loss=0.3863, over 4061791.30 frames. ], batch size: 108, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:02:53,330 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.130e+02 2.257e+02 2.438e+02 3.200e+02, threshold=4.515e+02, percent-clipped=0.0 2024-09-15 18:02:56,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=323039.6666666667, ans=0.1 2024-09-15 18:03:10,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=323068.0, ans=0.0 2024-09-15 18:03:23,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=323096.3333333333, ans=0.5 2024-09-15 18:03:39,154 INFO [train.py:1198] (0/2) Epoch 18, batch 5400, loss[loss=0.2147, ctc_loss=0.1444, cr_loss=0.3517, over 21056.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1667, cr_loss=0.3872, over 4068591.71 frames. 
], batch size: 53, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:03:43,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=323124.6666666667, ans=0.0 2024-09-15 18:03:54,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=323153.0, ans=0.125 2024-09-15 18:04:14,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=323181.3333333333, ans=0.035 2024-09-15 18:04:26,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=323209.6666666667, ans=0.125 2024-09-15 18:04:53,232 INFO [train.py:1198] (0/2) Epoch 18, batch 5450, loss[loss=0.2627, ctc_loss=0.1762, cr_loss=0.4325, over 20968.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1679, cr_loss=0.3887, over 4057373.27 frames. ], batch size: 64, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:05:02,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=323266.3333333333, ans=0.0 2024-09-15 18:05:10,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=323294.6666666667, ans=0.0 2024-09-15 18:05:21,702 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.113e+02 2.285e+02 2.447e+02 3.054e+02, threshold=4.570e+02, percent-clipped=0.0 2024-09-15 18:05:41,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=323351.3333333333, ans=0.125 2024-09-15 18:06:07,397 INFO [train.py:1198] (0/2) Epoch 18, batch 5500, loss[loss=0.2865, ctc_loss=0.2089, cr_loss=0.3877, over 14260.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1673, cr_loss=0.3878, over 4053923.55 frames. 
], batch size: 149, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:06:16,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=323408.0, ans=0.1 2024-09-15 18:06:31,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=323436.3333333333, ans=0.1 2024-09-15 18:06:34,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=323436.3333333333, ans=0.125 2024-09-15 18:07:21,623 INFO [train.py:1198] (0/2) Epoch 18, batch 5550, loss[loss=0.2259, ctc_loss=0.1534, cr_loss=0.3625, over 21075.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1655, cr_loss=0.3848, over 4063900.93 frames. ], batch size: 53, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:07:38,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=323578.0, ans=0.125 2024-09-15 18:07:50,109 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.151e+02 2.305e+02 2.497e+02 4.741e+02, threshold=4.611e+02, percent-clipped=1.0 2024-09-15 18:07:53,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=323606.3333333333, ans=0.0 2024-09-15 18:08:36,735 INFO [train.py:1198] (0/2) Epoch 18, batch 5600, loss[loss=0.2794, ctc_loss=0.1906, cr_loss=0.4438, over 20856.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1658, cr_loss=0.3856, over 4072964.66 frames. 
], batch size: 65, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:08:59,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=323719.6666666667, ans=0.0 2024-09-15 18:09:12,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=323748.0, ans=0.04949747468305833 2024-09-15 18:09:15,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=323748.0, ans=0.125 2024-09-15 18:09:46,182 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2024-09-15 18:09:47,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=323804.6666666667, ans=0.0 2024-09-15 18:09:51,286 INFO [train.py:1198] (0/2) Epoch 18, batch 5650, loss[loss=0.3181, ctc_loss=0.234, cr_loss=0.4206, over 14315.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1665, cr_loss=0.3863, over 4069103.66 frames. 
], batch size: 149, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:10:07,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=323861.3333333333, ans=0.125 2024-09-15 18:10:10,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=323861.3333333333, ans=0.125 2024-09-15 18:10:16,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=323861.3333333333, ans=0.125 2024-09-15 18:10:19,124 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.101e+02 2.246e+02 2.405e+02 3.438e+02, threshold=4.492e+02, percent-clipped=0.0 2024-09-15 18:10:24,523 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.10 vs. limit=15.0 2024-09-15 18:10:33,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=323889.6666666667, ans=0.125 2024-09-15 18:11:07,579 INFO [train.py:1198] (0/2) Epoch 18, batch 5700, loss[loss=0.2586, ctc_loss=0.1767, cr_loss=0.4091, over 20978.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1655, cr_loss=0.3855, over 4076241.48 frames. 
], batch size: 64, lr: 4.60e-03, grad_scale: 32.0 2024-09-15 18:11:28,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=324003.0, ans=0.025 2024-09-15 18:11:35,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=324003.0, ans=0.0 2024-09-15 18:11:47,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=324031.3333333333, ans=0.125 2024-09-15 18:11:53,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=324059.6666666667, ans=0.125 2024-09-15 18:12:17,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=324088.0, ans=0.125 2024-09-15 18:12:20,251 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.47 vs. limit=5.0 2024-09-15 18:12:25,117 INFO [train.py:1198] (0/2) Epoch 18, batch 5750, loss[loss=0.27, ctc_loss=0.1877, cr_loss=0.4119, over 18278.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1652, cr_loss=0.385, over 4082615.97 frames. ], batch size: 108, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:12:27,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2024-09-15 18:12:53,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.687e+02 2.077e+02 2.203e+02 2.459e+02 3.812e+02, threshold=4.407e+02, percent-clipped=0.0 2024-09-15 18:13:03,293 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.52 vs. 
limit=12.0 2024-09-15 18:13:39,859 INFO [train.py:1198] (0/2) Epoch 18, batch 5800, loss[loss=0.2235, ctc_loss=0.1481, cr_loss=0.3771, over 20781.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1649, cr_loss=0.3845, over 4079434.78 frames. ], batch size: 56, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:14:11,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=324314.6666666667, ans=0.0 2024-09-15 18:14:20,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=324314.6666666667, ans=0.125 2024-09-15 18:14:23,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=324343.0, ans=0.125 2024-09-15 18:14:24,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=324343.0, ans=0.0 2024-09-15 18:14:35,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=324343.0, ans=0.125 2024-09-15 18:14:38,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2024-09-15 18:14:39,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=324371.3333333333, ans=0.125 2024-09-15 18:14:42,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=324371.3333333333, ans=0.2 2024-09-15 18:14:45,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=324371.3333333333, ans=0.0 2024-09-15 18:14:53,898 INFO [train.py:1198] (0/2) Epoch 18, batch 5850, loss[loss=0.2399, ctc_loss=0.1608, cr_loss=0.3957, over 20978.00 frames. 
], tot_loss[loss=0.2422, ctc_loss=0.1651, cr_loss=0.3855, over 4077980.88 frames. ], batch size: 58, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:14:58,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=324399.6666666667, ans=0.05 2024-09-15 18:15:22,148 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.055e+02 2.169e+02 2.308e+02 4.486e+02, threshold=4.338e+02, percent-clipped=1.0 2024-09-15 18:15:22,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=324456.3333333333, ans=0.1 2024-09-15 18:15:31,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=324456.3333333333, ans=0.0 2024-09-15 18:15:43,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=324484.6666666667, ans=0.1 2024-09-15 18:16:08,530 INFO [train.py:1198] (0/2) Epoch 18, batch 5900, loss[loss=0.2805, ctc_loss=0.1955, cr_loss=0.4252, over 20624.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1656, cr_loss=0.3865, over 4090108.65 frames. 
], batch size: 66, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:16:17,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=324541.3333333333, ans=0.1 2024-09-15 18:16:22,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=324569.6666666667, ans=0.05 2024-09-15 18:16:47,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=324598.0, ans=0.04949747468305833 2024-09-15 18:16:55,235 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 18:16:59,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=324626.3333333333, ans=0.0 2024-09-15 18:17:11,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=324654.6666666667, ans=0.5 2024-09-15 18:17:11,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=324654.6666666667, ans=0.2 2024-09-15 18:17:23,099 INFO [train.py:1198] (0/2) Epoch 18, batch 5950, loss[loss=0.2471, ctc_loss=0.1711, cr_loss=0.3798, over 19381.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1653, cr_loss=0.3858, over 4082441.01 frames. 
], batch size: 90, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:17:34,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=324683.0, ans=0.2 2024-09-15 18:17:37,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=324711.3333333333, ans=0.125 2024-09-15 18:17:43,521 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-09-15 18:17:51,458 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.671e+02 2.069e+02 2.193e+02 2.423e+02 7.278e+02, threshold=4.387e+02, percent-clipped=1.0 2024-09-15 18:18:02,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=324739.6666666667, ans=0.125 2024-09-15 18:18:15,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=324768.0, ans=0.1 2024-09-15 18:18:22,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=324796.3333333333, ans=0.0 2024-09-15 18:18:25,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=324796.3333333333, ans=0.2 2024-09-15 18:18:29,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.80 vs. limit=6.0 2024-09-15 18:18:36,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=324824.6666666667, ans=0.0 2024-09-15 18:18:37,504 INFO [train.py:1198] (0/2) Epoch 18, batch 6000, loss[loss=0.2042, ctc_loss=0.136, cr_loss=0.3408, over 20951.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1651, cr_loss=0.3855, over 4090789.04 frames. 
], batch size: 50, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:18:37,506 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 18:19:01,521 INFO [train.py:1230] (0/2) Epoch 18, validation: loss=0.04538, ctc_loss=0.04538, cr_loss=1.038e-14, over 944034.00 frames. 2024-09-15 18:19:01,521 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 18:19:23,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=324853.0, ans=0.125 2024-09-15 18:20:18,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=324966.3333333333, ans=0.0 2024-09-15 18:20:19,148 INFO [train.py:1198] (0/2) Epoch 18, batch 6050, loss[loss=0.2869, ctc_loss=0.2023, cr_loss=0.4231, over 18263.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1649, cr_loss=0.3855, over 4081985.24 frames. ], batch size: 108, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:20:41,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=324994.6666666667, ans=0.07 2024-09-15 18:20:44,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=324994.6666666667, ans=0.0 2024-09-15 18:20:48,624 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.068e+02 2.218e+02 2.388e+02 4.560e+02, threshold=4.436e+02, percent-clipped=1.0 2024-09-15 18:21:34,055 INFO [train.py:1198] (0/2) Epoch 18, batch 6100, loss[loss=0.2573, ctc_loss=0.1771, cr_loss=0.4009, over 20975.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1652, cr_loss=0.3859, over 4068840.97 frames. ], batch size: 64, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:21:55,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. 
limit=15.0 2024-09-15 18:22:01,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=325136.3333333333, ans=0.125 2024-09-15 18:22:11,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=325164.6666666667, ans=0.125 2024-09-15 18:22:48,269 INFO [train.py:1198] (0/2) Epoch 18, batch 6150, loss[loss=0.2537, ctc_loss=0.1752, cr_loss=0.3929, over 20978.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1645, cr_loss=0.3839, over 4082109.57 frames. ], batch size: 58, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:23:02,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=325278.0, ans=0.125 2024-09-15 18:23:10,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=325278.0, ans=0.125 2024-09-15 18:23:12,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=325278.0, ans=0.1 2024-09-15 18:23:16,411 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.030e+02 2.241e+02 2.538e+02 4.663e+02, threshold=4.483e+02, percent-clipped=1.0 2024-09-15 18:23:54,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=325363.0, ans=0.125 2024-09-15 18:24:01,776 INFO [train.py:1198] (0/2) Epoch 18, batch 6200, loss[loss=0.2555, ctc_loss=0.1723, cr_loss=0.4159, over 21015.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1651, cr_loss=0.3836, over 4045125.46 frames. 
], batch size: 63, lr: 4.59e-03, grad_scale: 32.0 2024-09-15 18:24:25,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=325419.6666666667, ans=0.0 2024-09-15 18:24:46,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=325476.3333333333, ans=0.0 2024-09-15 18:25:14,423 INFO [train.py:1198] (0/2) Epoch 18, batch 6250, loss[loss=0.255, ctc_loss=0.1769, cr_loss=0.3903, over 19747.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.165, cr_loss=0.3829, over 4036258.98 frames. ], batch size: 90, lr: 4.58e-03, grad_scale: 32.0 2024-09-15 18:25:20,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=325533.0, ans=0.125 2024-09-15 18:25:25,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=325533.0, ans=0.2 2024-09-15 18:25:26,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0 2024-09-15 18:25:39,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=325561.3333333333, ans=0.0 2024-09-15 18:25:39,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.48 vs. 
limit=22.5 2024-09-15 18:25:42,111 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.110e+02 2.222e+02 2.392e+02 5.306e+02, threshold=4.444e+02, percent-clipped=1.0 2024-09-15 18:25:46,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=325589.6666666667, ans=0.1 2024-09-15 18:25:48,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=325589.6666666667, ans=0.2 2024-09-15 18:26:16,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.98 vs. limit=10.0 2024-09-15 18:26:23,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=325646.3333333333, ans=0.125 2024-09-15 18:26:26,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=325674.6666666667, ans=0.2 2024-09-15 18:26:27,893 INFO [train.py:1198] (0/2) Epoch 18, batch 6300, loss[loss=0.2681, ctc_loss=0.1835, cr_loss=0.4234, over 20047.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1655, cr_loss=0.3823, over 4011984.93 frames. ], batch size: 80, lr: 4.58e-03, grad_scale: 16.0 2024-09-15 18:26:34,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2024-09-15 18:26:35,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.91 vs. 
limit=22.5 2024-09-15 18:26:38,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=325674.6666666667, ans=0.0 2024-09-15 18:26:38,394 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 18:26:56,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=325731.3333333333, ans=0.2 2024-09-15 18:27:06,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=325731.3333333333, ans=0.04949747468305833 2024-09-15 18:27:36,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.94 vs. limit=15.0 2024-09-15 18:27:40,248 INFO [train.py:1198] (0/2) Epoch 18, batch 6350, loss[loss=0.2891, ctc_loss=0.208, cr_loss=0.4054, over 14292.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1701, cr_loss=0.3847, over 3841560.20 frames. 
], batch size: 149, lr: 4.58e-03, grad_scale: 16.0 2024-09-15 18:28:00,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=325844.6666666667, ans=0.0 2024-09-15 18:28:08,800 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.731e+02 2.268e+02 2.509e+02 2.765e+02 3.819e+02, threshold=5.018e+02, percent-clipped=0.0 2024-09-15 18:28:09,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=325873.0, ans=0.0 2024-09-15 18:28:13,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=325873.0, ans=0.125 2024-09-15 18:28:38,909 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-18.pt 2024-09-15 18:29:30,280 INFO [train.py:1198] (0/2) Epoch 19, batch 0, loss[loss=0.2706, ctc_loss=0.1877, cr_loss=0.4145, over 20736.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1877, cr_loss=0.4145, over 20736.00 frames. ], batch size: 71, lr: 4.46e-03, grad_scale: 32.0 2024-09-15 18:29:30,282 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 18:29:38,829 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5106, 5.0328, 4.4044, 4.8777], device='cuda:0') 2024-09-15 18:29:48,455 INFO [train.py:1230] (0/2) Epoch 19, validation: loss=0.04489, ctc_loss=0.04489, cr_loss=1.016e-14, over 944034.00 frames. 2024-09-15 18:29:48,456 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 18:29:51,026 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.00 vs. 
limit=15.0 2024-09-15 18:30:05,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=325960.8333333333, ans=0.125 2024-09-15 18:30:16,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=325960.8333333333, ans=0.125 2024-09-15 18:30:54,712 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2024-09-15 18:31:04,826 INFO [train.py:1198] (0/2) Epoch 19, batch 50, loss[loss=0.2421, ctc_loss=0.1639, cr_loss=0.3913, over 21026.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1643, cr_loss=0.3859, over 938866.88 frames. ], batch size: 63, lr: 4.46e-03, grad_scale: 32.0 2024-09-15 18:31:06,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=326074.1666666667, ans=0.125 2024-09-15 18:31:17,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=326074.1666666667, ans=0.125 2024-09-15 18:31:37,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=326130.8333333333, ans=0.1 2024-09-15 18:31:52,773 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.067e+02 2.315e+02 2.668e+02 3.599e+02, threshold=4.629e+02, percent-clipped=0.0 2024-09-15 18:31:56,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=326159.1666666667, ans=0.05 2024-09-15 18:31:57,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=326159.1666666667, ans=0.2 2024-09-15 18:32:24,509 INFO [train.py:1198] (0/2) Epoch 19, batch 100, 
loss[loss=0.2097, ctc_loss=0.139, cr_loss=0.3535, over 20963.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1613, cr_loss=0.3804, over 1640242.01 frames. ], batch size: 49, lr: 4.46e-03, grad_scale: 32.0 2024-09-15 18:32:28,377 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0 2024-09-15 18:32:34,370 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-09-15 18:32:48,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=326244.1666666667, ans=0.125 2024-09-15 18:33:16,320 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 18:33:23,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=326329.1666666667, ans=0.0 2024-09-15 18:33:35,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=326329.1666666667, ans=0.0 2024-09-15 18:33:40,073 INFO [train.py:1198] (0/2) Epoch 19, batch 150, loss[loss=0.2355, ctc_loss=0.1578, cr_loss=0.3884, over 20981.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1609, cr_loss=0.3797, over 2185312.17 frames. ], batch size: 58, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:33:47,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.36 vs. 
limit=22.5 2024-09-15 18:33:52,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=326357.5, ans=0.125 2024-09-15 18:34:22,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=326414.1666666667, ans=0.125 2024-09-15 18:34:23,788 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.717e+02 2.070e+02 2.223e+02 2.435e+02 4.334e+02, threshold=4.445e+02, percent-clipped=0.0 2024-09-15 18:34:34,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=15.0 2024-09-15 18:34:37,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=326442.5, ans=0.0 2024-09-15 18:34:40,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=326470.8333333333, ans=0.1 2024-09-15 18:34:43,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=326470.8333333333, ans=0.04949747468305833 2024-09-15 18:34:46,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=326470.8333333333, ans=0.1 2024-09-15 18:34:55,683 INFO [train.py:1198] (0/2) Epoch 19, batch 200, loss[loss=0.2232, ctc_loss=0.1472, cr_loss=0.3801, over 20800.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.161, cr_loss=0.3795, over 2599440.67 frames. 
], batch size: 53, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:34:56,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=326499.1666666667, ans=0.025 2024-09-15 18:34:57,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=326499.1666666667, ans=0.125 2024-09-15 18:35:11,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=326499.1666666667, ans=0.125 2024-09-15 18:35:18,286 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=15.0 2024-09-15 18:35:22,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=326527.5, ans=0.0 2024-09-15 18:35:58,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=326612.5, ans=0.5 2024-09-15 18:36:05,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=326612.5, ans=0.035 2024-09-15 18:36:14,686 INFO [train.py:1198] (0/2) Epoch 19, batch 250, loss[loss=0.2084, ctc_loss=0.1395, cr_loss=0.3445, over 20985.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.163, cr_loss=0.3825, over 2923650.05 frames. 
], batch size: 50, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:36:22,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=326640.8333333333, ans=0.125 2024-09-15 18:36:31,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=326669.1666666667, ans=0.125 2024-09-15 18:36:36,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=326669.1666666667, ans=0.125 2024-09-15 18:36:43,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=326697.5, ans=0.2 2024-09-15 18:36:59,098 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.070e+02 2.225e+02 2.386e+02 3.637e+02, threshold=4.451e+02, percent-clipped=0.0 2024-09-15 18:37:15,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=326754.1666666667, ans=0.1 2024-09-15 18:37:24,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=326754.1666666667, ans=0.125 2024-09-15 18:37:34,110 INFO [train.py:1198] (0/2) Epoch 19, batch 300, loss[loss=0.1921, ctc_loss=0.1298, cr_loss=0.3117, over 21018.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1636, cr_loss=0.3837, over 3191505.29 frames. ], batch size: 52, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:37:35,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=326782.5, ans=0.125 2024-09-15 18:37:54,753 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.76 vs. 
limit=15.0 2024-09-15 18:38:22,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=326867.5, ans=0.125 2024-09-15 18:38:34,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=326895.8333333333, ans=0.0 2024-09-15 18:38:49,171 INFO [train.py:1198] (0/2) Epoch 19, batch 350, loss[loss=0.2961, ctc_loss=0.2092, cr_loss=0.4345, over 14668.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1635, cr_loss=0.3842, over 3400977.54 frames. ], batch size: 149, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:38:52,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=326924.1666666667, ans=0.125 2024-09-15 18:39:32,843 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.064e+02 2.242e+02 2.483e+02 3.467e+02, threshold=4.484e+02, percent-clipped=0.0 2024-09-15 18:39:41,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2024-09-15 18:40:00,283 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 18:40:04,357 INFO [train.py:1198] (0/2) Epoch 19, batch 400, loss[loss=0.2498, ctc_loss=0.1727, cr_loss=0.3856, over 19282.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.164, cr_loss=0.3843, over 3542041.76 frames. 
], batch size: 90, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:40:07,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=327065.8333333333, ans=0.0 2024-09-15 18:40:24,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=327094.1666666667, ans=0.2 2024-09-15 18:40:44,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=327122.5, ans=0.0 2024-09-15 18:41:23,249 INFO [train.py:1198] (0/2) Epoch 19, batch 450, loss[loss=0.215, ctc_loss=0.143, cr_loss=0.3599, over 20020.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1643, cr_loss=0.3853, over 3675672.15 frames. ], batch size: 44, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:41:23,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=327207.5, ans=0.0 2024-09-15 18:41:32,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=327207.5, ans=0.0 2024-09-15 18:42:07,421 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.072e+02 2.186e+02 2.368e+02 4.597e+02, threshold=4.373e+02, percent-clipped=1.0 2024-09-15 18:42:09,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=327292.5, ans=0.0 2024-09-15 18:42:09,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=327292.5, ans=0.125 2024-09-15 18:42:12,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=327292.5, ans=0.0 2024-09-15 18:42:30,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327320.8333333333, ans=0.1 2024-09-15 18:42:36,821 INFO 
[scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=327320.8333333333, ans=0.025 2024-09-15 18:42:38,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=327349.1666666667, ans=0.125 2024-09-15 18:42:39,557 INFO [train.py:1198] (0/2) Epoch 19, batch 500, loss[loss=0.2318, ctc_loss=0.1556, cr_loss=0.3812, over 21071.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1636, cr_loss=0.3842, over 3778483.37 frames. ], batch size: 59, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:42:52,959 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-15 18:43:18,328 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0 2024-09-15 18:43:51,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=327462.5, ans=0.2 2024-09-15 18:44:00,223 INFO [train.py:1198] (0/2) Epoch 19, batch 550, loss[loss=0.2544, ctc_loss=0.1734, cr_loss=0.4046, over 20875.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1654, cr_loss=0.3869, over 3829724.64 frames. 
], batch size: 57, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:44:22,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=327519.1666666667, ans=0.125 2024-09-15 18:44:25,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327519.1666666667, ans=0.1 2024-09-15 18:44:29,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=327547.5, ans=0.125 2024-09-15 18:44:29,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=327547.5, ans=0.125 2024-09-15 18:44:44,583 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.086e+02 2.199e+02 2.384e+02 3.572e+02, threshold=4.399e+02, percent-clipped=0.0 2024-09-15 18:44:47,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=327575.8333333333, ans=0.125 2024-09-15 18:44:53,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0 2024-09-15 18:45:09,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=327604.1666666667, ans=0.0 2024-09-15 18:45:16,772 INFO [train.py:1198] (0/2) Epoch 19, batch 600, loss[loss=0.2966, ctc_loss=0.2125, cr_loss=0.4208, over 14383.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1655, cr_loss=0.3858, over 3865729.84 frames. 
], batch size: 150, lr: 4.45e-03, grad_scale: 32.0 2024-09-15 18:45:23,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=327632.5, ans=0.125 2024-09-15 18:45:36,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=327660.8333333333, ans=0.5 2024-09-15 18:46:19,773 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=15.0 2024-09-15 18:46:20,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327745.8333333333, ans=0.1 2024-09-15 18:46:22,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=327745.8333333333, ans=0.04949747468305833 2024-09-15 18:46:32,682 INFO [train.py:1198] (0/2) Epoch 19, batch 650, loss[loss=0.241, ctc_loss=0.1635, cr_loss=0.3875, over 21069.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.166, cr_loss=0.3867, over 3908758.42 frames. ], batch size: 56, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:46:47,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.00 vs. 
limit=15.0 2024-09-15 18:47:19,956 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.059e+02 2.175e+02 2.355e+02 3.259e+02, threshold=4.350e+02, percent-clipped=0.0 2024-09-15 18:47:32,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=327859.1666666667, ans=0.2 2024-09-15 18:47:32,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=327859.1666666667, ans=0.2 2024-09-15 18:47:50,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=327915.8333333333, ans=0.125 2024-09-15 18:47:52,151 INFO [train.py:1198] (0/2) Epoch 19, batch 700, loss[loss=0.2272, ctc_loss=0.1542, cr_loss=0.3651, over 20797.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1653, cr_loss=0.385, over 3949072.15 frames. ], batch size: 53, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:48:12,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0 2024-09-15 18:48:16,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=327944.1666666667, ans=0.125 2024-09-15 18:48:39,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=328000.8333333333, ans=0.0 2024-09-15 18:49:01,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=328029.1666666667, ans=0.125 2024-09-15 18:49:12,592 INFO [train.py:1198] (0/2) Epoch 19, batch 750, loss[loss=0.2098, ctc_loss=0.1427, cr_loss=0.336, over 20973.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1638, cr_loss=0.3839, over 3997106.96 frames. 
], batch size: 55, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:49:27,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=328085.8333333333, ans=0.125 2024-09-15 18:49:56,860 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.055e+02 2.153e+02 2.349e+02 2.781e+02, threshold=4.305e+02, percent-clipped=0.0 2024-09-15 18:50:28,572 INFO [train.py:1198] (0/2) Epoch 19, batch 800, loss[loss=0.2711, ctc_loss=0.1875, cr_loss=0.4183, over 20963.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1634, cr_loss=0.3833, over 4023599.99 frames. ], batch size: 64, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:50:55,411 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.65 vs. limit=10.0 2024-09-15 18:51:32,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=328312.5, ans=0.1 2024-09-15 18:51:44,705 INFO [train.py:1198] (0/2) Epoch 19, batch 850, loss[loss=0.1945, ctc_loss=0.1302, cr_loss=0.3215, over 21070.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1629, cr_loss=0.3828, over 4034171.69 frames. ], batch size: 53, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:52:21,286 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 18:52:31,521 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.077e+02 2.181e+02 2.349e+02 4.241e+02, threshold=4.361e+02, percent-clipped=0.0 2024-09-15 18:52:39,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=328425.8333333333, ans=0.125 2024-09-15 18:52:41,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.97 vs. 
limit=10.0 2024-09-15 18:52:44,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=328425.8333333333, ans=0.125 2024-09-15 18:52:51,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=328454.1666666667, ans=0.125 2024-09-15 18:53:03,094 INFO [train.py:1198] (0/2) Epoch 19, batch 900, loss[loss=0.2166, ctc_loss=0.1491, cr_loss=0.3374, over 20994.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1647, cr_loss=0.3848, over 4036159.11 frames. ], batch size: 55, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:53:27,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=328510.8333333333, ans=0.5 2024-09-15 18:53:48,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=328567.5, ans=0.0 2024-09-15 18:54:04,609 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2024-09-15 18:54:19,075 INFO [train.py:1198] (0/2) Epoch 19, batch 950, loss[loss=0.2568, ctc_loss=0.1765, cr_loss=0.4017, over 19881.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1653, cr_loss=0.3857, over 4043694.11 frames. 
], batch size: 80, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:54:42,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=328652.5, ans=0.0 2024-09-15 18:54:44,777 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-116000.pt 2024-09-15 18:55:08,293 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.082e+02 2.206e+02 2.383e+02 4.044e+02, threshold=4.412e+02, percent-clipped=0.0 2024-09-15 18:55:32,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328737.5, ans=0.1 2024-09-15 18:55:40,821 INFO [train.py:1198] (0/2) Epoch 19, batch 1000, loss[loss=0.26, ctc_loss=0.1755, cr_loss=0.4226, over 21078.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.164, cr_loss=0.3842, over 4065892.30 frames. ], batch size: 59, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:55:42,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=328765.8333333333, ans=0.2 2024-09-15 18:56:57,473 INFO [train.py:1198] (0/2) Epoch 19, batch 1050, loss[loss=0.248, ctc_loss=0.169, cr_loss=0.3949, over 21042.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1644, cr_loss=0.3857, over 4080636.42 frames. 
], batch size: 56, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:57:03,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328907.5, ans=0.1 2024-09-15 18:57:11,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=328935.8333333333, ans=0.0 2024-09-15 18:57:13,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=328935.8333333333, ans=0.0 2024-09-15 18:57:41,546 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.069e+02 2.181e+02 2.330e+02 6.080e+02, threshold=4.361e+02, percent-clipped=1.0 2024-09-15 18:57:55,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=328992.5, ans=0.0 2024-09-15 18:58:16,783 INFO [train.py:1198] (0/2) Epoch 19, batch 1100, loss[loss=0.2346, ctc_loss=0.1571, cr_loss=0.3873, over 21033.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1636, cr_loss=0.3846, over 4087318.60 frames. ], batch size: 62, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:58:32,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=15.0 2024-09-15 18:59:01,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=329134.1666666667, ans=0.125 2024-09-15 18:59:04,785 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.34 vs. 
limit=15.0 2024-09-15 18:59:12,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=329134.1666666667, ans=0.125 2024-09-15 18:59:33,467 INFO [train.py:1198] (0/2) Epoch 19, batch 1150, loss[loss=0.2474, ctc_loss=0.1677, cr_loss=0.3985, over 20846.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1636, cr_loss=0.3843, over 4088068.41 frames. ], batch size: 65, lr: 4.44e-03, grad_scale: 32.0 2024-09-15 18:59:46,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.26 vs. limit=15.0 2024-09-15 19:00:18,129 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.697e+02 2.059e+02 2.192e+02 2.368e+02 3.140e+02, threshold=4.385e+02, percent-clipped=0.0 2024-09-15 19:00:53,982 INFO [train.py:1198] (0/2) Epoch 19, batch 1200, loss[loss=0.2583, ctc_loss=0.1711, cr_loss=0.436, over 21069.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.164, cr_loss=0.3852, over 4078807.09 frames. 
], batch size: 59, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:01:02,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=329332.5, ans=0.125 2024-09-15 19:01:11,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=329360.8333333333, ans=0.2 2024-09-15 19:01:44,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=329417.5, ans=0.2 2024-09-15 19:02:01,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=329445.8333333333, ans=0.0 2024-09-15 19:02:06,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=329445.8333333333, ans=0.0 2024-09-15 19:02:10,737 INFO [train.py:1198] (0/2) Epoch 19, batch 1250, loss[loss=0.2638, ctc_loss=0.1798, cr_loss=0.4202, over 20354.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1636, cr_loss=0.3846, over 4088206.87 frames. ], batch size: 74, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:02:26,835 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.77 vs. limit=15.0 2024-09-15 19:02:54,870 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.051e+02 2.171e+02 2.356e+02 4.459e+02, threshold=4.341e+02, percent-clipped=1.0 2024-09-15 19:02:58,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=329559.1666666667, ans=0.0 2024-09-15 19:03:12,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329587.5, ans=0.1 2024-09-15 19:03:26,455 INFO [train.py:1198] (0/2) Epoch 19, batch 1300, loss[loss=0.2212, ctc_loss=0.15, cr_loss=0.3561, over 20926.00 frames. 
], tot_loss[loss=0.2422, ctc_loss=0.1648, cr_loss=0.3868, over 4092363.35 frames. ], batch size: 48, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:04:24,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329700.8333333333, ans=0.1 2024-09-15 19:04:35,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=329729.1666666667, ans=0.125 2024-09-15 19:04:45,381 INFO [train.py:1198] (0/2) Epoch 19, batch 1350, loss[loss=0.2769, ctc_loss=0.1917, cr_loss=0.4259, over 20668.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1641, cr_loss=0.3855, over 4106539.11 frames. ], batch size: 71, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:05:03,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=329785.8333333333, ans=0.0 2024-09-15 19:05:11,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=329785.8333333333, ans=0.2 2024-09-15 19:05:13,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=329785.8333333333, ans=0.125 2024-09-15 19:05:25,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=329814.1666666667, ans=0.2 2024-09-15 19:05:29,551 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.015e+02 2.192e+02 2.414e+02 3.289e+02, threshold=4.384e+02, percent-clipped=0.0 2024-09-15 19:05:34,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=12.0 2024-09-15 19:06:01,870 INFO [train.py:1198] (0/2) Epoch 19, batch 1400, loss[loss=0.2458, ctc_loss=0.1685, cr_loss=0.3863, over 20799.00 frames. 
], tot_loss[loss=0.2412, ctc_loss=0.1642, cr_loss=0.3848, over 4100221.58 frames. ], batch size: 53, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:06:16,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=329899.1666666667, ans=0.0 2024-09-15 19:06:22,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=329927.5, ans=0.2 2024-09-15 19:06:30,561 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.58 vs. limit=15.0 2024-09-15 19:06:34,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=329955.8333333333, ans=0.125 2024-09-15 19:06:45,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=329955.8333333333, ans=0.125 2024-09-15 19:07:20,983 INFO [train.py:1198] (0/2) Epoch 19, batch 1450, loss[loss=0.2604, ctc_loss=0.1808, cr_loss=0.3984, over 20294.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1646, cr_loss=0.3848, over 4079919.55 frames. ], batch size: 74, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:08:01,232 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.65 vs. limit=22.5 2024-09-15 19:08:04,730 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.034e+02 2.159e+02 2.290e+02 3.110e+02, threshold=4.319e+02, percent-clipped=0.0 2024-09-15 19:08:07,070 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.85 vs. 
limit=15.0 2024-09-15 19:08:11,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=330125.8333333333, ans=0.125 2024-09-15 19:08:14,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=330125.8333333333, ans=0.2 2024-09-15 19:08:36,548 INFO [train.py:1198] (0/2) Epoch 19, batch 1500, loss[loss=0.3028, ctc_loss=0.2239, cr_loss=0.3947, over 14467.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1636, cr_loss=0.3828, over 4087770.84 frames. ], batch size: 149, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:08:43,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=330182.5, ans=0.125 2024-09-15 19:08:50,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=330210.8333333333, ans=0.0 2024-09-15 19:08:52,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=12.0 2024-09-15 19:08:55,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=330210.8333333333, ans=0.025 2024-09-15 19:08:55,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330210.8333333333, ans=0.1 2024-09-15 19:09:55,204 INFO [train.py:1198] (0/2) Epoch 19, batch 1550, loss[loss=0.2627, ctc_loss=0.18, cr_loss=0.4132, over 20933.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.163, cr_loss=0.3817, over 4099601.62 frames. ], batch size: 64, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:10:03,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.58 vs. 
limit=22.5 2024-09-15 19:10:09,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330352.5, ans=0.1 2024-09-15 19:10:36,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=330380.8333333333, ans=0.0 2024-09-15 19:10:39,145 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.059e+02 2.184e+02 2.376e+02 3.606e+02, threshold=4.368e+02, percent-clipped=0.0 2024-09-15 19:10:54,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=330437.5, ans=0.0 2024-09-15 19:11:11,094 INFO [train.py:1198] (0/2) Epoch 19, batch 1600, loss[loss=0.2651, ctc_loss=0.1791, cr_loss=0.4301, over 20653.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.163, cr_loss=0.3824, over 4107412.74 frames. ], batch size: 66, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:11:11,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=330465.8333333333, ans=0.125 2024-09-15 19:12:03,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=12.0 2024-09-15 19:12:11,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=330550.8333333333, ans=0.0 2024-09-15 19:12:20,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=330579.1666666667, ans=0.0 2024-09-15 19:12:29,795 INFO [train.py:1198] (0/2) Epoch 19, batch 1650, loss[loss=0.2535, ctc_loss=0.1746, cr_loss=0.3945, over 20662.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1637, cr_loss=0.3833, over 4103072.30 frames. 
], batch size: 68, lr: 4.43e-03, grad_scale: 32.0 2024-09-15 19:12:33,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=330607.5, ans=0.125 2024-09-15 19:12:33,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=330607.5, ans=15.0 2024-09-15 19:13:14,196 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.089e+02 2.262e+02 2.450e+02 6.551e+02, threshold=4.523e+02, percent-clipped=1.0 2024-09-15 19:13:32,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=330720.8333333333, ans=0.035 2024-09-15 19:13:45,979 INFO [train.py:1198] (0/2) Epoch 19, batch 1700, loss[loss=0.2151, ctc_loss=0.1454, cr_loss=0.3487, over 20960.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1635, cr_loss=0.3828, over 4090264.71 frames. ], batch size: 50, lr: 4.42e-03, grad_scale: 32.0 2024-09-15 19:13:58,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=330749.1666666667, ans=0.0 2024-09-15 19:14:06,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.47 vs. 
limit=22.5 2024-09-15 19:14:32,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330834.1666666667, ans=0.1 2024-09-15 19:14:34,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330834.1666666667, ans=0.1 2024-09-15 19:14:35,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=330834.1666666667, ans=0.0 2024-09-15 19:15:00,668 INFO [train.py:1198] (0/2) Epoch 19, batch 1750, loss[loss=0.246, ctc_loss=0.1672, cr_loss=0.3936, over 20949.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1631, cr_loss=0.3822, over 4084661.94 frames. ], batch size: 60, lr: 4.42e-03, grad_scale: 32.0 2024-09-15 19:15:30,372 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.64 vs. limit=10.0 2024-09-15 19:15:47,978 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.727e+02 2.058e+02 2.227e+02 2.416e+02 3.846e+02, threshold=4.455e+02, percent-clipped=0.0 2024-09-15 19:16:12,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=331004.1666666667, ans=0.2 2024-09-15 19:16:14,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=331004.1666666667, ans=0.125 2024-09-15 19:16:19,290 INFO [train.py:1198] (0/2) Epoch 19, batch 1800, loss[loss=0.2312, ctc_loss=0.1563, cr_loss=0.3743, over 20963.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1634, cr_loss=0.3827, over 4080752.19 frames. 
], batch size: 58, lr: 4.42e-03, grad_scale: 32.0 2024-09-15 19:16:34,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=331060.8333333333, ans=0.0 2024-09-15 19:16:42,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=331060.8333333333, ans=0.125 2024-09-15 19:16:45,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=331060.8333333333, ans=0.2 2024-09-15 19:17:37,450 INFO [train.py:1198] (0/2) Epoch 19, batch 1850, loss[loss=0.2356, ctc_loss=0.1575, cr_loss=0.3907, over 21057.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1639, cr_loss=0.384, over 4079362.71 frames. ], batch size: 56, lr: 4.42e-03, grad_scale: 32.0 2024-09-15 19:18:03,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=331202.5, ans=0.125 2024-09-15 19:18:21,369 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.060e+02 2.239e+02 2.481e+02 5.048e+02, threshold=4.478e+02, percent-clipped=2.0 2024-09-15 19:18:23,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=331259.1666666667, ans=0.125 2024-09-15 19:18:53,320 INFO [train.py:1198] (0/2) Epoch 19, batch 1900, loss[loss=0.2404, ctc_loss=0.1609, cr_loss=0.397, over 20965.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1638, cr_loss=0.3841, over 4074567.28 frames. ], batch size: 55, lr: 4.42e-03, grad_scale: 64.0 2024-09-15 19:19:14,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.33 vs. 
limit=12.0 2024-09-15 19:20:00,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=331429.1666666667, ans=10.0 2024-09-15 19:20:08,977 INFO [train.py:1198] (0/2) Epoch 19, batch 1950, loss[loss=0.2137, ctc_loss=0.1419, cr_loss=0.359, over 20299.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1633, cr_loss=0.3838, over 4077505.30 frames. ], batch size: 45, lr: 4.42e-03, grad_scale: 64.0 2024-09-15 19:20:41,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=331514.1666666667, ans=22.5 2024-09-15 19:20:52,593 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.129e+02 2.298e+02 2.458e+02 3.297e+02, threshold=4.596e+02, percent-clipped=0.0 2024-09-15 19:21:27,164 INFO [train.py:1198] (0/2) Epoch 19, batch 2000, loss[loss=0.2263, ctc_loss=0.1527, cr_loss=0.3683, over 20959.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1621, cr_loss=0.3819, over 4089466.36 frames. ], batch size: 51, lr: 4.42e-03, grad_scale: 64.0 2024-09-15 19:21:39,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331599.1666666667, ans=0.1 2024-09-15 19:21:54,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=331627.5, ans=0.125 2024-09-15 19:22:43,063 INFO [train.py:1198] (0/2) Epoch 19, batch 2050, loss[loss=0.2783, ctc_loss=0.1928, cr_loss=0.4275, over 20836.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1626, cr_loss=0.3819, over 4086199.03 frames. 
], batch size: 65, lr: 4.42e-03, grad_scale: 64.0 2024-09-15 19:22:54,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=331740.8333333333, ans=0.125 2024-09-15 19:23:29,665 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.766e+02 2.010e+02 2.163e+02 2.371e+02 3.289e+02, threshold=4.327e+02, percent-clipped=0.0 2024-09-15 19:23:37,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=331825.8333333333, ans=0.125 2024-09-15 19:24:00,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331882.5, ans=0.1 2024-09-15 19:24:01,339 INFO [train.py:1198] (0/2) Epoch 19, batch 2100, loss[loss=0.2336, ctc_loss=0.1585, cr_loss=0.3756, over 20883.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1629, cr_loss=0.3828, over 4097225.76 frames. ], batch size: 54, lr: 4.42e-03, grad_scale: 64.0 2024-09-15 19:24:01,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331882.5, ans=0.1 2024-09-15 19:24:03,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=331882.5, ans=0.0 2024-09-15 19:24:17,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=331910.8333333333, ans=0.125 2024-09-15 19:24:18,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=331910.8333333333, ans=0.125 2024-09-15 19:24:28,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=331910.8333333333, ans=0.0 2024-09-15 19:24:36,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, 
batch_count=331939.1666666667, ans=0.0 2024-09-15 19:24:45,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2024-09-15 19:25:03,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=331995.8333333333, ans=0.0 2024-09-15 19:25:16,816 INFO [train.py:1198] (0/2) Epoch 19, batch 2150, loss[loss=0.2377, ctc_loss=0.161, cr_loss=0.3837, over 21014.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1623, cr_loss=0.3825, over 4101543.20 frames. ], batch size: 62, lr: 4.42e-03, grad_scale: 64.0 2024-09-15 19:25:38,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=332052.5, ans=0.035 2024-09-15 19:25:56,833 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0 2024-09-15 19:26:00,347 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.023e+02 2.181e+02 2.338e+02 3.449e+02, threshold=4.361e+02, percent-clipped=0.0 2024-09-15 19:26:12,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332109.1666666667, ans=0.1 2024-09-15 19:26:21,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=332137.5, ans=0.0 2024-09-15 19:26:32,070 INFO [train.py:1198] (0/2) Epoch 19, batch 2200, loss[loss=0.3017, ctc_loss=0.2201, cr_loss=0.4079, over 14453.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1631, cr_loss=0.3834, over 4086013.00 frames. 
], batch size: 149, lr: 4.42e-03, grad_scale: 64.0
2024-09-15 19:26:50,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=332194.1666666667, ans=0.0
2024-09-15 19:26:53,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=332194.1666666667, ans=0.0
2024-09-15 19:26:56,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332194.1666666667, ans=0.1
2024-09-15 19:27:32,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=332250.8333333333, ans=0.0
2024-09-15 19:27:36,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=12.0
2024-09-15 19:27:40,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=332279.1666666667, ans=0.125
2024-09-15 19:27:50,699 INFO [train.py:1198] (0/2) Epoch 19, batch 2250, loss[loss=0.2605, ctc_loss=0.1782, cr_loss=0.4119, over 19621.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1629, cr_loss=0.3819, over 4089579.31 frames. ], batch size: 90, lr: 4.41e-03, grad_scale: 64.0
2024-09-15 19:27:50,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=332307.5, ans=0.0
2024-09-15 19:28:10,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=332335.8333333333, ans=0.125
2024-09-15 19:28:24,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=332364.1666666667, ans=0.125
2024-09-15 19:28:35,085 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.053e+02 2.220e+02 2.394e+02 3.283e+02, threshold=4.440e+02, percent-clipped=0.0
2024-09-15 19:28:46,587 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.12 vs. limit=22.5
2024-09-15 19:29:09,776 INFO [train.py:1198] (0/2) Epoch 19, batch 2300, loss[loss=0.2813, ctc_loss=0.2052, cr_loss=0.3806, over 14682.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1629, cr_loss=0.3818, over 4084157.28 frames. ], batch size: 149, lr: 4.41e-03, grad_scale: 64.0
2024-09-15 19:29:15,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=15.0
2024-09-15 19:29:52,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=332505.8333333333, ans=0.125
2024-09-15 19:30:25,018 INFO [train.py:1198] (0/2) Epoch 19, batch 2350, loss[loss=0.1905, ctc_loss=0.1241, cr_loss=0.3321, over 21006.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1627, cr_loss=0.3818, over 4086181.05 frames. ], batch size: 52, lr: 4.41e-03, grad_scale: 64.0
2024-09-15 19:30:43,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=332619.1666666667, ans=0.5
2024-09-15 19:31:09,044 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.073e+02 2.181e+02 2.410e+02 3.114e+02, threshold=4.363e+02, percent-clipped=0.0
2024-09-15 19:31:23,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=332675.8333333333, ans=10.0
2024-09-15 19:31:23,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=332675.8333333333, ans=0.125
2024-09-15 19:31:40,771 INFO [train.py:1198] (0/2) Epoch 19, batch 2400, loss[loss=0.2094, ctc_loss=0.1407, cr_loss=0.3435, over 20988.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1625, cr_loss=0.3816, over 4089328.41 frames. ], batch size: 50, lr: 4.41e-03, grad_scale: 64.0
2024-09-15 19:31:50,188 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0
2024-09-15 19:32:46,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332845.8333333333, ans=0.1
2024-09-15 19:32:59,217 INFO [train.py:1198] (0/2) Epoch 19, batch 2450, loss[loss=0.2369, ctc_loss=0.1612, cr_loss=0.3785, over 21041.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1627, cr_loss=0.3821, over 4086907.92 frames. ], batch size: 62, lr: 4.41e-03, grad_scale: 32.0
2024-09-15 19:33:08,664 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 19:33:44,346 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.670e+02 2.073e+02 2.210e+02 2.379e+02 3.922e+02, threshold=4.420e+02, percent-clipped=0.0
2024-09-15 19:33:56,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=332959.1666666667, ans=0.1
2024-09-15 19:33:59,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=332987.5, ans=0.5
2024-09-15 19:34:14,673 INFO [train.py:1198] (0/2) Epoch 19, batch 2500, loss[loss=0.2138, ctc_loss=0.1438, cr_loss=0.3499, over 20994.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1635, cr_loss=0.3837, over 4085742.81 frames. ], batch size: 51, lr: 4.41e-03, grad_scale: 32.0
2024-09-15 19:34:44,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333044.1666666667, ans=0.1
2024-09-15 19:34:49,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=333072.5, ans=0.125
2024-09-15 19:35:02,616 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 19:35:07,568 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.01 vs. limit=10.0
2024-09-15 19:35:33,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.39 vs. limit=22.5
2024-09-15 19:35:34,247 INFO [train.py:1198] (0/2) Epoch 19, batch 2550, loss[loss=0.2466, ctc_loss=0.1721, cr_loss=0.3724, over 20665.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1641, cr_loss=0.3846, over 4094424.59 frames. ], batch size: 66, lr: 4.41e-03, grad_scale: 32.0
2024-09-15 19:35:37,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=333157.5, ans=0.125
2024-09-15 19:36:12,321 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 19:36:21,326 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.108e+02 2.254e+02 2.456e+02 3.449e+02, threshold=4.509e+02, percent-clipped=0.0
2024-09-15 19:36:25,270 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=12.0
2024-09-15 19:36:32,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=333242.5, ans=0.125
2024-09-15 19:36:35,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=333270.8333333333, ans=0.0
2024-09-15 19:36:50,591 INFO [train.py:1198] (0/2) Epoch 19, batch 2600, loss[loss=0.2522, ctc_loss=0.1737, cr_loss=0.3924, over 20395.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1632, cr_loss=0.3837, over 4091192.79 frames. ], batch size: 74, lr: 4.41e-03, grad_scale: 16.0
2024-09-15 19:37:01,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=333299.1666666667, ans=0.0
2024-09-15 19:37:04,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=333327.5, ans=0.2
2024-09-15 19:37:37,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=333384.1666666667, ans=0.0
2024-09-15 19:37:39,834 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.76 vs. limit=10.0
2024-09-15 19:38:08,875 INFO [train.py:1198] (0/2) Epoch 19, batch 2650, loss[loss=0.2325, ctc_loss=0.1577, cr_loss=0.3737, over 21059.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1628, cr_loss=0.3831, over 4097948.50 frames. ], batch size: 56, lr: 4.41e-03, grad_scale: 16.0
2024-09-15 19:38:35,387 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0
2024-09-15 19:38:55,732 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.099e+02 2.211e+02 2.321e+02 3.833e+02, threshold=4.421e+02, percent-clipped=0.0
2024-09-15 19:39:24,365 INFO [train.py:1198] (0/2) Epoch 19, batch 2700, loss[loss=0.2083, ctc_loss=0.1404, cr_loss=0.3396, over 20923.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1629, cr_loss=0.383, over 4078987.72 frames. ], batch size: 49, lr: 4.41e-03, grad_scale: 16.0
2024-09-15 19:39:38,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=333610.8333333333, ans=0.95
2024-09-15 19:39:38,555 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0
2024-09-15 19:39:43,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=12.0
2024-09-15 19:39:54,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=333639.1666666667, ans=0.0
2024-09-15 19:40:43,007 INFO [train.py:1198] (0/2) Epoch 19, batch 2750, loss[loss=0.3027, ctc_loss=0.2088, cr_loss=0.4695, over 18397.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1638, cr_loss=0.384, over 4082387.72 frames. ], batch size: 108, lr: 4.41e-03, grad_scale: 16.0
2024-09-15 19:41:06,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=333752.5, ans=0.125
2024-09-15 19:41:29,865 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.091e+02 2.189e+02 2.423e+02 3.049e+02, threshold=4.378e+02, percent-clipped=0.0
2024-09-15 19:41:36,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=333809.1666666667, ans=0.125
2024-09-15 19:41:58,694 INFO [train.py:1198] (0/2) Epoch 19, batch 2800, loss[loss=0.2476, ctc_loss=0.1679, cr_loss=0.3987, over 21039.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1633, cr_loss=0.3831, over 4084298.24 frames. ], batch size: 62, lr: 4.40e-03, grad_scale: 32.0
2024-09-15 19:42:06,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=333865.8333333333, ans=0.0
2024-09-15 19:42:26,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=333894.1666666667, ans=0.125
2024-09-15 19:43:08,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0
2024-09-15 19:43:14,218 INFO [train.py:1198] (0/2) Epoch 19, batch 2850, loss[loss=0.2393, ctc_loss=0.1632, cr_loss=0.3807, over 20978.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1644, cr_loss=0.3849, over 4089440.90 frames. ], batch size: 55, lr: 4.40e-03, grad_scale: 32.0
2024-09-15 19:43:19,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.65 vs. limit=10.0
2024-09-15 19:44:04,417 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.046e+02 2.163e+02 2.323e+02 4.362e+02, threshold=4.326e+02, percent-clipped=0.0
2024-09-15 19:44:23,239 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0
2024-09-15 19:44:25,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=334120.8333333333, ans=0.125
2024-09-15 19:44:32,938 INFO [train.py:1198] (0/2) Epoch 19, batch 2900, loss[loss=0.2069, ctc_loss=0.1398, cr_loss=0.3353, over 20967.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1635, cr_loss=0.3836, over 4095663.04 frames. ], batch size: 51, lr: 4.40e-03, grad_scale: 32.0
2024-09-15 19:44:52,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=334177.5, ans=0.125
2024-09-15 19:44:53,742 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.78 vs. limit=5.0
2024-09-15 19:45:18,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=334234.1666666667, ans=0.125
2024-09-15 19:45:25,361 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=15.0
2024-09-15 19:45:48,763 INFO [train.py:1198] (0/2) Epoch 19, batch 2950, loss[loss=0.272, ctc_loss=0.186, cr_loss=0.4301, over 19306.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1632, cr_loss=0.383, over 4098311.93 frames. ], batch size: 90, lr: 4.40e-03, grad_scale: 32.0
2024-09-15 19:45:50,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=334290.8333333333, ans=0.125
2024-09-15 19:45:58,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=334290.8333333333, ans=0.0
2024-09-15 19:46:07,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=334319.1666666667, ans=0.2
2024-09-15 19:46:26,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=15.0
2024-09-15 19:46:37,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334375.8333333333, ans=0.1
2024-09-15 19:46:38,688 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.074e+02 2.223e+02 2.386e+02 3.259e+02, threshold=4.445e+02, percent-clipped=0.0
2024-09-15 19:46:44,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0
2024-09-15 19:46:48,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=334375.8333333333, ans=0.125
2024-09-15 19:47:07,108 INFO [train.py:1198] (0/2) Epoch 19, batch 3000, loss[loss=0.1854, ctc_loss=0.1241, cr_loss=0.3066, over 20338.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1631, cr_loss=0.3823, over 4096876.20 frames. ], batch size: 45, lr: 4.40e-03, grad_scale: 32.0
2024-09-15 19:47:07,110 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-15 19:47:27,925 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7588, 5.3034, 5.5835, 4.9878], device='cuda:0')
2024-09-15 19:47:28,541 INFO [train.py:1230] (0/2) Epoch 19, validation: loss=0.04462, ctc_loss=0.04462, cr_loss=1.059e-14, over 944034.00 frames.
2024-09-15 19:47:28,541 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-15 19:47:59,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334489.1666666667, ans=0.1
2024-09-15 19:48:03,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=334489.1666666667, ans=0.025
2024-09-15 19:48:12,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=334517.5, ans=0.125
2024-09-15 19:48:26,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.80 vs. limit=6.0
2024-09-15 19:48:43,610 INFO [train.py:1198] (0/2) Epoch 19, batch 3050, loss[loss=0.2174, ctc_loss=0.1477, cr_loss=0.3481, over 20899.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1634, cr_loss=0.3831, over 4103855.52 frames. ], batch size: 54, lr: 4.40e-03, grad_scale: 32.0
2024-09-15 19:48:59,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=334602.5, ans=0.125
2024-09-15 19:49:33,505 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.080e+02 2.248e+02 2.510e+02 8.196e+02, threshold=4.496e+02, percent-clipped=1.0
2024-09-15 19:49:38,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334659.1666666667, ans=0.1
2024-09-15 19:49:41,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=334659.1666666667, ans=0.0
2024-09-15 19:49:54,067 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.73 vs. limit=15.0
2024-09-15 19:50:02,720 INFO [train.py:1198] (0/2) Epoch 19, batch 3100, loss[loss=0.2213, ctc_loss=0.1457, cr_loss=0.3781, over 20972.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1629, cr_loss=0.3829, over 4106417.83 frames. ], batch size: 51, lr: 4.40e-03, grad_scale: 32.0
2024-09-15 19:50:12,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=334715.8333333333, ans=0.125
2024-09-15 19:50:20,485 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.36 vs. limit=15.0
2024-09-15 19:50:21,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=334744.1666666667, ans=0.0
2024-09-15 19:50:30,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=334744.1666666667, ans=0.2
2024-09-15 19:51:19,285 INFO [train.py:1198] (0/2) Epoch 19, batch 3150, loss[loss=0.2193, ctc_loss=0.1517, cr_loss=0.338, over 21074.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1629, cr_loss=0.3832, over 4107774.29 frames. ], batch size: 53, lr: 4.40e-03, grad_scale: 32.0
2024-09-15 19:51:24,076 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 19:51:34,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=334885.8333333333, ans=0.125
2024-09-15 19:51:38,280 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0
2024-09-15 19:52:08,801 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.084e+02 2.200e+02 2.399e+02 3.213e+02, threshold=4.401e+02, percent-clipped=0.0
2024-09-15 19:52:09,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=334942.5, ans=10.0
2024-09-15 19:52:15,762 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0
2024-09-15 19:52:37,493 INFO [train.py:1198] (0/2) Epoch 19, batch 3200, loss[loss=0.2219, ctc_loss=0.1488, cr_loss=0.3654, over 20790.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1629, cr_loss=0.3839, over 4100174.83 frames. ], batch size: 53, lr: 4.40e-03, grad_scale: 32.0
2024-09-15 19:52:45,308 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 19:52:56,485 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.07 vs. limit=15.0
2024-09-15 19:52:57,685 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0
2024-09-15 19:53:07,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=335055.8333333333, ans=0.0
2024-09-15 19:53:24,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=335084.1666666667, ans=0.125
2024-09-15 19:53:47,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=335112.5, ans=0.0
2024-09-15 19:53:53,188 INFO [train.py:1198] (0/2) Epoch 19, batch 3250, loss[loss=0.2442, ctc_loss=0.1685, cr_loss=0.3782, over 20862.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1632, cr_loss=0.3835, over 4100664.41 frames. ], batch size: 65, lr: 4.40e-03, grad_scale: 32.0
2024-09-15 19:53:58,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0
2024-09-15 19:54:40,016 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.017e+02 2.171e+02 2.442e+02 9.883e+02, threshold=4.342e+02, percent-clipped=1.0
2024-09-15 19:54:43,653 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0
2024-09-15 19:54:50,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=335225.8333333333, ans=0.125
2024-09-15 19:55:02,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=335254.1666666667, ans=0.5
2024-09-15 19:55:11,522 INFO [train.py:1198] (0/2) Epoch 19, batch 3300, loss[loss=0.263, ctc_loss=0.1834, cr_loss=0.3979, over 14807.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1627, cr_loss=0.3826, over 4094710.60 frames. ], batch size: 149, lr: 4.40e-03, grad_scale: 32.0
2024-09-15 19:55:25,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=335310.8333333333, ans=0.125
2024-09-15 19:55:25,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=335310.8333333333, ans=0.2
2024-09-15 19:55:32,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=335310.8333333333, ans=0.125
2024-09-15 19:55:38,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=335310.8333333333, ans=0.125
2024-09-15 19:55:46,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=335339.1666666667, ans=0.2
2024-09-15 19:55:52,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=335339.1666666667, ans=0.2
2024-09-15 19:56:26,130 INFO [train.py:1198] (0/2) Epoch 19, batch 3350, loss[loss=0.2754, ctc_loss=0.1894, cr_loss=0.4303, over 20974.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1645, cr_loss=0.3853, over 4092720.12 frames. ], batch size: 67, lr: 4.39e-03, grad_scale: 32.0
2024-09-15 19:56:32,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=335424.1666666667, ans=0.125
2024-09-15 19:56:37,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335424.1666666667, ans=0.1
2024-09-15 19:56:38,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=335424.1666666667, ans=0.1
2024-09-15 19:56:55,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=335480.8333333333, ans=0.125
2024-09-15 19:57:12,307 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=15.0
2024-09-15 19:57:13,076 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.093e+02 2.200e+02 2.394e+02 8.592e+02, threshold=4.400e+02, percent-clipped=2.0
2024-09-15 19:57:33,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335537.5, ans=0.1
2024-09-15 19:57:34,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=335537.5, ans=0.035
2024-09-15 19:57:39,741 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0
2024-09-15 19:57:44,884 INFO [train.py:1198] (0/2) Epoch 19, batch 3400, loss[loss=0.288, ctc_loss=0.2043, cr_loss=0.4186, over 18255.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1642, cr_loss=0.3851, over 4099693.95 frames. ], batch size: 108, lr: 4.39e-03, grad_scale: 32.0
2024-09-15 19:57:52,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=335565.8333333333, ans=0.125
2024-09-15 19:58:03,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=335594.1666666667, ans=0.0
2024-09-15 19:58:21,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.37 vs. limit=22.5
2024-09-15 19:59:00,748 INFO [train.py:1198] (0/2) Epoch 19, batch 3450, loss[loss=0.2914, ctc_loss=0.2113, cr_loss=0.4003, over 14399.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1636, cr_loss=0.3837, over 4091395.56 frames. ], batch size: 149, lr: 4.39e-03, grad_scale: 32.0
2024-09-15 19:59:49,284 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.053e+02 2.152e+02 2.370e+02 2.979e+02, threshold=4.304e+02, percent-clipped=0.0
2024-09-15 20:00:16,500 INFO [train.py:1198] (0/2) Epoch 19, batch 3500, loss[loss=0.2702, ctc_loss=0.1861, cr_loss=0.4205, over 19553.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1636, cr_loss=0.3841, over 4089994.21 frames. ], batch size: 90, lr: 4.39e-03, grad_scale: 16.0
2024-09-15 20:00:45,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=335877.5, ans=0.2
2024-09-15 20:01:32,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=335962.5, ans=0.125
2024-09-15 20:01:35,115 INFO [train.py:1198] (0/2) Epoch 19, batch 3550, loss[loss=0.2191, ctc_loss=0.1486, cr_loss=0.3526, over 20387.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1648, cr_loss=0.3861, over 4081985.51 frames. ], batch size: 45, lr: 4.39e-03, grad_scale: 16.0
2024-09-15 20:01:53,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=336019.1666666667, ans=0.2
2024-09-15 20:02:03,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=336019.1666666667, ans=0.1
2024-09-15 20:02:23,855 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.054e+02 2.146e+02 2.355e+02 6.585e+02, threshold=4.293e+02, percent-clipped=1.0
2024-09-15 20:02:51,208 INFO [train.py:1198] (0/2) Epoch 19, batch 3600, loss[loss=0.2067, ctc_loss=0.1383, cr_loss=0.3419, over 20931.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1641, cr_loss=0.3851, over 4097389.33 frames. ], batch size: 49, lr: 4.39e-03, grad_scale: 32.0
2024-09-15 20:03:58,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=336245.8333333333, ans=0.0
2024-09-15 20:04:04,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=336245.8333333333, ans=0.2
2024-09-15 20:04:10,426 INFO [train.py:1198] (0/2) Epoch 19, batch 3650, loss[loss=0.1945, ctc_loss=0.13, cr_loss=0.3226, over 20966.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.165, cr_loss=0.3862, over 4086638.79 frames. ], batch size: 48, lr: 4.39e-03, grad_scale: 16.0
2024-09-15 20:04:10,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=336274.1666666667, ans=0.0
2024-09-15 20:04:33,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=336302.5, ans=0.035
2024-09-15 20:04:51,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=336330.8333333333, ans=10.0
2024-09-15 20:04:54,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=336359.1666666667, ans=0.0
2024-09-15 20:04:59,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=336359.1666666667, ans=0.125
2024-09-15 20:05:00,554 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.085e+02 2.200e+02 2.356e+02 3.013e+02, threshold=4.400e+02, percent-clipped=0.0
2024-09-15 20:05:07,190 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=12.0
2024-09-15 20:05:25,986 INFO [train.py:1198] (0/2) Epoch 19, batch 3700, loss[loss=0.2329, ctc_loss=0.156, cr_loss=0.3847, over 21045.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1648, cr_loss=0.385, over 4090874.87 frames. ], batch size: 56, lr: 4.39e-03, grad_scale: 16.0
2024-09-15 20:05:56,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=336472.5, ans=0.0
2024-09-15 20:06:05,652 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 20:06:18,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=336500.8333333333, ans=0.125
2024-09-15 20:06:26,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=336529.1666666667, ans=0.125
2024-09-15 20:06:44,482 INFO [train.py:1198] (0/2) Epoch 19, batch 3750, loss[loss=0.2456, ctc_loss=0.1658, cr_loss=0.3989, over 21053.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1637, cr_loss=0.3839, over 4104309.59 frames. ], batch size: 62, lr: 4.39e-03, grad_scale: 16.0
2024-09-15 20:07:34,500 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.024e+02 2.132e+02 2.275e+02 4.276e+02, threshold=4.264e+02, percent-clipped=0.0
2024-09-15 20:07:35,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0
2024-09-15 20:07:36,746 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.50 vs. limit=12.0
2024-09-15 20:08:00,523 INFO [train.py:1198] (0/2) Epoch 19, batch 3800, loss[loss=0.2221, ctc_loss=0.1506, cr_loss=0.3573, over 20993.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.163, cr_loss=0.3828, over 4101513.68 frames. ], batch size: 52, lr: 4.39e-03, grad_scale: 16.0
2024-09-15 20:08:15,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=336727.5, ans=0.125
2024-09-15 20:08:18,073 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0
2024-09-15 20:08:44,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=336784.1666666667, ans=0.0
2024-09-15 20:09:06,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=336812.5, ans=0.125
2024-09-15 20:09:17,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0
2024-09-15 20:09:19,770 INFO [train.py:1198] (0/2) Epoch 19, batch 3850, loss[loss=0.2274, ctc_loss=0.1543, cr_loss=0.3653, over 21047.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1632, cr_loss=0.3836, over 4097421.84 frames. ], batch size: 53, lr: 4.38e-03, grad_scale: 16.0
2024-09-15 20:09:35,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=336869.1666666667, ans=0.125
2024-09-15 20:09:44,534 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.79 vs. limit=10.0
2024-09-15 20:10:06,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=336925.8333333333, ans=0.1
2024-09-15 20:10:09,470 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.013e+02 2.230e+02 2.392e+02 3.805e+02, threshold=4.460e+02, percent-clipped=0.0
2024-09-15 20:10:20,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=336954.1666666667, ans=0.125
2024-09-15 20:10:35,109 INFO [train.py:1198] (0/2) Epoch 19, batch 3900, loss[loss=0.2334, ctc_loss=0.158, cr_loss=0.3769, over 20966.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1632, cr_loss=0.3838, over 4095682.02 frames. ], batch size: 58, lr: 4.38e-03, grad_scale: 16.0
2024-09-15 20:10:36,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=336982.5, ans=0.125
2024-09-15 20:11:00,554 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0
2024-09-15 20:11:11,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0
2024-09-15 20:11:14,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=337039.1666666667, ans=0.125
2024-09-15 20:11:30,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=337067.5, ans=0.0
2024-09-15 20:11:50,977 INFO [train.py:1198] (0/2) Epoch 19, batch 3950, loss[loss=0.2554, ctc_loss=0.1738, cr_loss=0.4081, over 20820.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1635, cr_loss=0.3841, over 4096819.69 frames. ], batch size: 59, lr: 4.38e-03, grad_scale: 16.0
2024-09-15 20:12:30,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337180.8333333333, ans=0.1
2024-09-15 20:12:44,660 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.059e+02 2.164e+02 2.382e+02 4.370e+02, threshold=4.329e+02, percent-clipped=0.0
2024-09-15 20:12:44,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=337209.1666666667, ans=0.015
2024-09-15 20:12:49,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=22.5
2024-09-15 20:12:50,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=337209.1666666667, ans=0.07
2024-09-15 20:13:07,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=337237.5, ans=0.1
2024-09-15 20:13:10,476 INFO [train.py:1198] (0/2) Epoch 19, batch 4000, loss[loss=0.208, ctc_loss=0.1381, cr_loss=0.3496, over 21066.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1618, cr_loss=0.3815, over 4107755.00 frames. ], batch size: 56, lr: 4.38e-03, grad_scale: 32.0
2024-09-15 20:14:25,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=337407.5, ans=0.125
2024-09-15 20:14:26,244 INFO [train.py:1198] (0/2) Epoch 19, batch 4050, loss[loss=0.2529, ctc_loss=0.1745, cr_loss=0.3919, over 21042.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1623, cr_loss=0.3819, over 4097576.56 frames. ], batch size: 62, lr: 4.38e-03, grad_scale: 32.0
2024-09-15 20:14:26,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337407.5, ans=0.1
2024-09-15 20:14:38,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.56 vs. limit=22.5
2024-09-15 20:15:14,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337492.5, ans=0.1
2024-09-15 20:15:18,800 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.754e+02 2.052e+02 2.195e+02 2.416e+02 4.050e+02, threshold=4.390e+02, percent-clipped=0.0
2024-09-15 20:15:43,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337549.1666666667, ans=0.1
2024-09-15 20:15:44,749 INFO [train.py:1198] (0/2) Epoch 19, batch 4100, loss[loss=0.2347, ctc_loss=0.1592, cr_loss=0.3774, over 21049.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1626, cr_loss=0.3823, over 4103633.45 frames. ], batch size: 62, lr: 4.38e-03, grad_scale: 32.0
2024-09-15 20:16:06,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=337577.5, ans=0.125
2024-09-15 20:16:49,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=337662.5, ans=0.0
2024-09-15 20:16:55,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337662.5, ans=0.1
2024-09-15 20:16:55,695 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.18 vs.
limit=22.5 2024-09-15 20:16:59,329 INFO [train.py:1198] (0/2) Epoch 19, batch 4150, loss[loss=0.2325, ctc_loss=0.1593, cr_loss=0.3661, over 20976.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1623, cr_loss=0.3827, over 4099924.11 frames. ], batch size: 51, lr: 4.38e-03, grad_scale: 32.0 2024-09-15 20:16:59,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=337690.8333333333, ans=0.0 2024-09-15 20:17:21,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2024-09-15 20:17:23,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=337719.1666666667, ans=0.04949747468305833 2024-09-15 20:17:26,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=337719.1666666667, ans=0.125 2024-09-15 20:17:43,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=337775.8333333333, ans=0.125 2024-09-15 20:17:49,311 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.125e+02 2.270e+02 2.446e+02 3.860e+02, threshold=4.540e+02, percent-clipped=0.0 2024-09-15 20:18:16,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=337832.5, ans=0.0 2024-09-15 20:18:17,831 INFO [train.py:1198] (0/2) Epoch 19, batch 4200, loss[loss=0.2526, ctc_loss=0.171, cr_loss=0.4076, over 20879.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1625, cr_loss=0.3824, over 4096562.81 frames. ], batch size: 57, lr: 4.38e-03, grad_scale: 32.0 2024-09-15 20:18:30,863 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.38 vs. 
limit=15.0 2024-09-15 20:18:31,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=337860.8333333333, ans=0.125 2024-09-15 20:18:51,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337889.1666666667, ans=0.1 2024-09-15 20:18:52,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337889.1666666667, ans=0.1 2024-09-15 20:19:10,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=337917.5, ans=0.125 2024-09-15 20:19:21,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=337945.8333333333, ans=0.0 2024-09-15 20:19:33,302 INFO [train.py:1198] (0/2) Epoch 19, batch 4250, loss[loss=0.3055, ctc_loss=0.2179, cr_loss=0.4377, over 14852.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1622, cr_loss=0.3827, over 4093607.82 frames. 
], batch size: 149, lr: 4.38e-03, grad_scale: 32.0 2024-09-15 20:19:36,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=337974.1666666667, ans=0.125 2024-09-15 20:20:25,996 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.078e+02 2.214e+02 2.366e+02 5.564e+02, threshold=4.427e+02, percent-clipped=2.0 2024-09-15 20:20:35,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338087.5, ans=0.1 2024-09-15 20:20:35,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=338087.5, ans=0.125 2024-09-15 20:20:40,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.94 vs. limit=10.0 2024-09-15 20:20:47,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=338087.5, ans=0.125 2024-09-15 20:20:50,880 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-09-15 20:20:51,774 INFO [train.py:1198] (0/2) Epoch 19, batch 4300, loss[loss=0.2423, ctc_loss=0.166, cr_loss=0.3816, over 20776.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1633, cr_loss=0.384, over 4081301.11 frames. 
], batch size: 53, lr: 4.38e-03, grad_scale: 32.0 2024-09-15 20:20:52,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=338115.8333333333, ans=0.125 2024-09-15 20:21:07,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338144.1666666667, ans=0.1 2024-09-15 20:21:17,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=338144.1666666667, ans=0.0 2024-09-15 20:21:25,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=338172.5, ans=0.2 2024-09-15 20:21:42,785 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2024-09-15 20:21:48,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=338200.8333333333, ans=0.125 2024-09-15 20:21:51,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=338229.1666666667, ans=0.125 2024-09-15 20:21:52,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=22.5 2024-09-15 20:22:07,761 INFO [train.py:1198] (0/2) Epoch 19, batch 4350, loss[loss=0.2535, ctc_loss=0.1741, cr_loss=0.3972, over 20960.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1625, cr_loss=0.3833, over 4092816.41 frames. ], batch size: 58, lr: 4.38e-03, grad_scale: 32.0 2024-09-15 20:22:19,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.57 vs. 
limit=15.0 2024-09-15 20:22:26,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=338285.8333333333, ans=0.125 2024-09-15 20:22:30,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=338285.8333333333, ans=0.125 2024-09-15 20:22:34,539 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0 2024-09-15 20:22:57,849 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.078e+02 2.191e+02 2.349e+02 5.066e+02, threshold=4.382e+02, percent-clipped=2.0 2024-09-15 20:22:59,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338342.5, ans=0.1 2024-09-15 20:23:15,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=338370.8333333333, ans=0.0 2024-09-15 20:23:23,859 INFO [train.py:1198] (0/2) Epoch 19, batch 4400, loss[loss=0.2557, ctc_loss=0.1776, cr_loss=0.3903, over 20970.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1625, cr_loss=0.3836, over 4104673.93 frames. ], batch size: 58, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:23:48,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=338427.5, ans=0.0 2024-09-15 20:23:52,019 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 20:23:57,082 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.00 vs. 
limit=15.0 2024-09-15 20:24:05,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=338455.8333333333, ans=0.035 2024-09-15 20:24:28,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338512.5, ans=0.1 2024-09-15 20:24:43,501 INFO [train.py:1198] (0/2) Epoch 19, batch 4450, loss[loss=0.2548, ctc_loss=0.1762, cr_loss=0.3928, over 20835.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1625, cr_loss=0.3834, over 4107578.06 frames. ], batch size: 65, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:25:33,370 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.098e+02 2.200e+02 2.365e+02 3.307e+02, threshold=4.399e+02, percent-clipped=0.0 2024-09-15 20:25:33,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=338625.8333333333, ans=0.2 2024-09-15 20:25:47,596 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-09-15 20:25:48,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=338654.1666666667, ans=0.125 2024-09-15 20:25:59,150 INFO [train.py:1198] (0/2) Epoch 19, batch 4500, loss[loss=0.2337, ctc_loss=0.1558, cr_loss=0.3894, over 20631.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1625, cr_loss=0.3834, over 4107280.24 frames. 
], batch size: 75, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:26:34,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338739.1666666667, ans=0.1 2024-09-15 20:26:39,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=338739.1666666667, ans=0.125 2024-09-15 20:26:45,263 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.85 vs. limit=15.0 2024-09-15 20:26:47,503 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 20:27:15,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=338824.1666666667, ans=0.0 2024-09-15 20:27:16,824 INFO [train.py:1198] (0/2) Epoch 19, batch 4550, loss[loss=0.2534, ctc_loss=0.175, cr_loss=0.392, over 20349.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1631, cr_loss=0.384, over 4102193.31 frames. ], batch size: 74, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:27:23,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=338824.1666666667, ans=0.125 2024-09-15 20:28:06,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=338909.1666666667, ans=0.125 2024-09-15 20:28:07,411 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.072e+02 2.177e+02 2.383e+02 3.306e+02, threshold=4.353e+02, percent-clipped=0.0 2024-09-15 20:28:20,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.24 vs. 
limit=12.0 2024-09-15 20:28:21,973 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0 2024-09-15 20:28:33,218 INFO [train.py:1198] (0/2) Epoch 19, batch 4600, loss[loss=0.2297, ctc_loss=0.1548, cr_loss=0.3747, over 21003.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1625, cr_loss=0.3838, over 4110801.29 frames. ], batch size: 52, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:28:48,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=338994.1666666667, ans=0.125 2024-09-15 20:29:20,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=339050.8333333333, ans=0.125 2024-09-15 20:29:37,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=339079.1666666667, ans=0.2 2024-09-15 20:29:52,164 INFO [train.py:1198] (0/2) Epoch 19, batch 4650, loss[loss=0.2742, ctc_loss=0.1935, cr_loss=0.4036, over 14954.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.163, cr_loss=0.3843, over 4098886.59 frames. ], batch size: 149, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:30:33,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=339164.1666666667, ans=0.125 2024-09-15 20:30:41,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.70 vs. 
limit=15.0 2024-09-15 20:30:41,953 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.070e+02 2.191e+02 2.427e+02 4.412e+02, threshold=4.382e+02, percent-clipped=1.0 2024-09-15 20:30:57,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=339220.8333333333, ans=0.1 2024-09-15 20:31:07,518 INFO [train.py:1198] (0/2) Epoch 19, batch 4700, loss[loss=0.2496, ctc_loss=0.1667, cr_loss=0.4146, over 20976.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1619, cr_loss=0.383, over 4089222.55 frames. ], batch size: 58, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:31:09,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=339249.1666666667, ans=0.025 2024-09-15 20:31:12,665 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 20:31:16,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=339249.1666666667, ans=0.1 2024-09-15 20:31:22,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=339277.5, ans=15.0 2024-09-15 20:31:31,292 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.92 vs. 
limit=10.0 2024-09-15 20:31:32,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=339277.5, ans=0.125 2024-09-15 20:31:47,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=339305.8333333333, ans=0.2 2024-09-15 20:31:48,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2024-09-15 20:32:25,791 INFO [train.py:1198] (0/2) Epoch 19, batch 4750, loss[loss=0.2142, ctc_loss=0.1447, cr_loss=0.3474, over 21021.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1627, cr_loss=0.3848, over 4091108.27 frames. ], batch size: 61, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:32:38,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.56 vs. limit=15.0 2024-09-15 20:32:39,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=339419.1666666667, ans=0.125 2024-09-15 20:32:47,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=339419.1666666667, ans=0.125 2024-09-15 20:33:16,050 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.051e+02 2.213e+02 2.424e+02 3.685e+02, threshold=4.425e+02, percent-clipped=0.0 2024-09-15 20:33:21,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=339475.8333333333, ans=0.125 2024-09-15 20:33:41,843 INFO [train.py:1198] (0/2) Epoch 19, batch 4800, loss[loss=0.2184, ctc_loss=0.1492, cr_loss=0.3461, over 21046.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1624, cr_loss=0.3836, over 4091656.51 frames. 
], batch size: 53, lr: 4.37e-03, grad_scale: 32.0 2024-09-15 20:34:18,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=339589.1666666667, ans=0.0 2024-09-15 20:34:30,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=339617.5, ans=0.1 2024-09-15 20:34:36,678 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.79 vs. limit=12.0 2024-09-15 20:34:46,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=339645.8333333333, ans=0.2 2024-09-15 20:34:57,324 INFO [train.py:1198] (0/2) Epoch 19, batch 4850, loss[loss=0.1988, ctc_loss=0.132, cr_loss=0.3343, over 20992.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1621, cr_loss=0.3828, over 4087392.02 frames. ], batch size: 50, lr: 4.37e-03, grad_scale: 16.0 2024-09-15 20:35:26,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=339702.5, ans=0.0 2024-09-15 20:35:38,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=339730.8333333333, ans=0.2 2024-09-15 20:35:39,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=339730.8333333333, ans=0.125 2024-09-15 20:35:51,695 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.055e+02 2.168e+02 2.309e+02 3.263e+02, threshold=4.337e+02, percent-clipped=0.0 2024-09-15 20:36:15,796 INFO [train.py:1198] (0/2) Epoch 19, batch 4900, loss[loss=0.2542, ctc_loss=0.1738, cr_loss=0.4018, over 20956.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1619, cr_loss=0.3827, over 4096958.40 frames. 
], batch size: 58, lr: 4.37e-03, grad_scale: 16.0 2024-09-15 20:36:42,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.27 vs. limit=15.0 2024-09-15 20:36:43,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=339844.1666666667, ans=0.125 2024-09-15 20:37:14,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=339929.1666666667, ans=0.1 2024-09-15 20:37:20,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=339929.1666666667, ans=0.2 2024-09-15 20:37:21,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=339929.1666666667, ans=0.2 2024-09-15 20:37:23,439 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 20:37:30,217 INFO [train.py:1198] (0/2) Epoch 19, batch 4950, loss[loss=0.2428, ctc_loss=0.161, cr_loss=0.409, over 20903.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1625, cr_loss=0.3839, over 4093823.01 frames. 
], batch size: 54, lr: 4.36e-03, grad_scale: 16.0 2024-09-15 20:37:51,370 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-120000.pt 2024-09-15 20:38:00,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=340014.1666666667, ans=0.125 2024-09-15 20:38:16,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=340042.5, ans=0.07 2024-09-15 20:38:22,260 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.082e+02 2.229e+02 2.461e+02 3.381e+02, threshold=4.459e+02, percent-clipped=0.0 2024-09-15 20:38:42,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0 2024-09-15 20:38:48,748 INFO [train.py:1198] (0/2) Epoch 19, batch 5000, loss[loss=0.2619, ctc_loss=0.1827, cr_loss=0.3959, over 18440.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1633, cr_loss=0.3848, over 4085493.76 frames. ], batch size: 108, lr: 4.36e-03, grad_scale: 16.0 2024-09-15 20:39:11,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2024-09-15 20:39:12,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=340127.5, ans=10.0 2024-09-15 20:39:16,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2024-09-15 20:39:19,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.57 vs. 
limit=22.5 2024-09-15 20:39:50,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=340212.5, ans=0.0 2024-09-15 20:40:00,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=340212.5, ans=0.04949747468305833 2024-09-15 20:40:03,427 INFO [train.py:1198] (0/2) Epoch 19, batch 5050, loss[loss=0.271, ctc_loss=0.1914, cr_loss=0.398, over 20105.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1642, cr_loss=0.3854, over 4083955.39 frames. ], batch size: 80, lr: 4.36e-03, grad_scale: 16.0 2024-09-15 20:40:03,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=340240.8333333333, ans=0.125 2024-09-15 20:40:28,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340269.1666666667, ans=0.1 2024-09-15 20:40:51,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-09-15 20:40:53,656 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.148e+02 2.300e+02 2.499e+02 8.274e+02, threshold=4.601e+02, percent-clipped=1.0 2024-09-15 20:40:55,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=340325.8333333333, ans=0.125 2024-09-15 20:40:57,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340325.8333333333, ans=0.1 2024-09-15 20:41:10,533 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.58 vs. 
limit=10.0 2024-09-15 20:41:17,163 INFO [train.py:1198] (0/2) Epoch 19, batch 5100, loss[loss=0.2227, ctc_loss=0.1487, cr_loss=0.3704, over 20923.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1641, cr_loss=0.3849, over 4084941.07 frames. ], batch size: 60, lr: 4.36e-03, grad_scale: 16.0 2024-09-15 20:41:21,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.48 vs. limit=10.0 2024-09-15 20:41:32,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=340410.8333333333, ans=0.125 2024-09-15 20:42:23,144 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2024-09-15 20:42:31,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=15.0 2024-09-15 20:42:31,499 INFO [train.py:1198] (0/2) Epoch 19, batch 5150, loss[loss=0.247, ctc_loss=0.1695, cr_loss=0.3876, over 20679.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.164, cr_loss=0.385, over 4092827.08 frames. 
], batch size: 66, lr: 4.36e-03, grad_scale: 16.0 2024-09-15 20:42:46,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=340552.5, ans=0.0 2024-09-15 20:43:10,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=340580.8333333333, ans=0.1 2024-09-15 20:43:13,653 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 20:43:21,916 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.071e+02 2.213e+02 2.468e+02 3.137e+02, threshold=4.426e+02, percent-clipped=0.0 2024-09-15 20:43:45,856 INFO [train.py:1198] (0/2) Epoch 19, batch 5200, loss[loss=0.2789, ctc_loss=0.2019, cr_loss=0.3851, over 13996.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1644, cr_loss=0.3856, over 4096299.25 frames. ], batch size: 149, lr: 4.36e-03, grad_scale: 32.0 2024-09-15 20:43:49,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0 2024-09-15 20:44:26,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=340722.5, ans=0.125 2024-09-15 20:44:27,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=340722.5, ans=0.125 2024-09-15 20:44:55,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340779.1666666667, ans=0.1 2024-09-15 20:45:01,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=340807.5, ans=0.0 2024-09-15 20:45:02,574 INFO [train.py:1198] (0/2) Epoch 19, batch 5250, loss[loss=0.2834, ctc_loss=0.1982, cr_loss=0.4261, over 19958.00 frames. 
], tot_loss[loss=0.24, ctc_loss=0.1634, cr_loss=0.3833, over 4083669.12 frames. ], batch size: 80, lr: 4.36e-03, grad_scale: 32.0 2024-09-15 20:45:16,537 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=22.5 2024-09-15 20:45:28,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=340835.8333333333, ans=0.125 2024-09-15 20:45:29,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=340835.8333333333, ans=0.125 2024-09-15 20:45:52,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=340892.5, ans=0.025 2024-09-15 20:45:53,432 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.075e+02 2.257e+02 2.468e+02 3.104e+02, threshold=4.513e+02, percent-clipped=0.0 2024-09-15 20:46:07,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0 2024-09-15 20:46:17,493 INFO [train.py:1198] (0/2) Epoch 19, batch 5300, loss[loss=0.265, ctc_loss=0.1825, cr_loss=0.4123, over 20611.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.163, cr_loss=0.3832, over 4101742.52 frames. ], batch size: 75, lr: 4.36e-03, grad_scale: 32.0 2024-09-15 20:46:50,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=341005.8333333333, ans=0.125 2024-09-15 20:46:57,327 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.06 vs. 
limit=15.0 2024-09-15 20:47:32,528 INFO [train.py:1198] (0/2) Epoch 19, batch 5350, loss[loss=0.2387, ctc_loss=0.1598, cr_loss=0.3944, over 20937.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1617, cr_loss=0.3817, over 4111672.06 frames. ], batch size: 60, lr: 4.36e-03, grad_scale: 32.0 2024-09-15 20:47:32,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=341090.8333333333, ans=0.0 2024-09-15 20:47:38,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=341090.8333333333, ans=0.0 2024-09-15 20:47:58,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=341119.1666666667, ans=0.2 2024-09-15 20:48:24,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=341175.8333333333, ans=0.125 2024-09-15 20:48:25,966 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.063e+02 2.144e+02 2.322e+02 3.198e+02, threshold=4.287e+02, percent-clipped=0.0 2024-09-15 20:48:30,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=341175.8333333333, ans=0.125 2024-09-15 20:48:49,522 INFO [train.py:1198] (0/2) Epoch 19, batch 5400, loss[loss=0.2498, ctc_loss=0.1703, cr_loss=0.3971, over 19581.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1631, cr_loss=0.3843, over 4106167.55 frames. 
], batch size: 90, lr: 4.36e-03, grad_scale: 32.0 2024-09-15 20:49:00,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=341232.5, ans=0.0 2024-09-15 20:49:17,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=341289.1666666667, ans=0.0 2024-09-15 20:49:19,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2024-09-15 20:49:26,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=341289.1666666667, ans=0.125 2024-09-15 20:49:44,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=341317.5, ans=0.125 2024-09-15 20:50:03,419 INFO [train.py:1198] (0/2) Epoch 19, batch 5450, loss[loss=0.2103, ctc_loss=0.1393, cr_loss=0.355, over 20995.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1637, cr_loss=0.3847, over 4091706.35 frames. ], batch size: 48, lr: 4.36e-03, grad_scale: 32.0 2024-09-15 20:50:17,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=341402.5, ans=0.2 2024-09-15 20:50:39,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341430.8333333333, ans=0.1 2024-09-15 20:50:51,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=22.5 2024-09-15 20:50:53,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.60 vs. 
limit=22.5 2024-09-15 20:50:53,814 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.025e+02 2.193e+02 2.354e+02 3.816e+02, threshold=4.387e+02, percent-clipped=0.0 2024-09-15 20:51:17,807 INFO [train.py:1198] (0/2) Epoch 19, batch 5500, loss[loss=0.2814, ctc_loss=0.2007, cr_loss=0.4033, over 14173.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1645, cr_loss=0.3854, over 4091893.73 frames. ], batch size: 151, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:51:37,694 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 20:51:59,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=341572.5, ans=0.05 2024-09-15 20:52:01,628 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.62 vs. limit=10.0 2024-09-15 20:52:32,624 INFO [train.py:1198] (0/2) Epoch 19, batch 5550, loss[loss=0.2347, ctc_loss=0.1585, cr_loss=0.3812, over 20853.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1639, cr_loss=0.3848, over 4096762.89 frames. ], batch size: 57, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:52:37,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=341657.5, ans=0.125 2024-09-15 20:52:52,874 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.21 vs. 
limit=12.0 2024-09-15 20:52:59,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=341685.8333333333, ans=0.0 2024-09-15 20:53:15,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341714.1666666667, ans=0.1 2024-09-15 20:53:23,651 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.077e+02 2.214e+02 2.415e+02 5.498e+02, threshold=4.427e+02, percent-clipped=0.0 2024-09-15 20:53:48,021 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2024-09-15 20:53:50,193 INFO [train.py:1198] (0/2) Epoch 19, batch 5600, loss[loss=0.2218, ctc_loss=0.1504, cr_loss=0.357, over 20777.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1638, cr_loss=0.3844, over 4081249.99 frames. ], batch size: 56, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:54:11,664 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 20:55:04,697 INFO [train.py:1198] (0/2) Epoch 19, batch 5650, loss[loss=0.2569, ctc_loss=0.1793, cr_loss=0.388, over 20832.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1636, cr_loss=0.3844, over 4084647.10 frames. ], batch size: 65, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:55:42,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=341997.5, ans=0.1 2024-09-15 20:55:55,195 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.088e+02 2.206e+02 2.423e+02 3.823e+02, threshold=4.412e+02, percent-clipped=1.0 2024-09-15 20:56:18,844 INFO [train.py:1198] (0/2) Epoch 19, batch 5700, loss[loss=0.2309, ctc_loss=0.155, cr_loss=0.3799, over 20791.00 frames. 
], tot_loss[loss=0.2414, ctc_loss=0.1644, cr_loss=0.3851, over 4066450.30 frames. ], batch size: 56, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:56:29,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=342082.5, ans=0.0 2024-09-15 20:57:00,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342139.1666666667, ans=0.1 2024-09-15 20:57:19,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=342195.8333333333, ans=10.0 2024-09-15 20:57:27,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=342195.8333333333, ans=0.1 2024-09-15 20:57:35,922 INFO [train.py:1198] (0/2) Epoch 19, batch 5750, loss[loss=0.2356, ctc_loss=0.1606, cr_loss=0.3751, over 20902.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.164, cr_loss=0.3841, over 4066827.24 frames. ], batch size: 57, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:57:37,780 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 20:57:40,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=342224.1666666667, ans=0.125 2024-09-15 20:57:51,241 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.29 vs. 
limit=22.5 2024-09-15 20:58:23,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=342309.1666666667, ans=0.2 2024-09-15 20:58:26,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.039e+02 2.192e+02 2.375e+02 4.646e+02, threshold=4.384e+02, percent-clipped=1.0 2024-09-15 20:58:50,199 INFO [train.py:1198] (0/2) Epoch 19, batch 5800, loss[loss=0.2242, ctc_loss=0.151, cr_loss=0.366, over 21075.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.164, cr_loss=0.3846, over 4077318.35 frames. ], batch size: 56, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 20:59:20,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2024-09-15 20:59:41,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342450.8333333333, ans=0.1 2024-09-15 20:59:50,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=342479.1666666667, ans=0.2 2024-09-15 21:00:04,429 INFO [train.py:1198] (0/2) Epoch 19, batch 5850, loss[loss=0.288, ctc_loss=0.2069, cr_loss=0.4057, over 14650.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.163, cr_loss=0.3832, over 4082278.76 frames. 
], batch size: 150, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 21:00:04,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=342507.5, ans=10.0 2024-09-15 21:00:24,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=342535.8333333333, ans=0.125 2024-09-15 21:00:54,970 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.054e+02 2.231e+02 2.392e+02 5.027e+02, threshold=4.462e+02, percent-clipped=1.0 2024-09-15 21:01:08,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=342620.8333333333, ans=0.125 2024-09-15 21:01:13,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=15.0 2024-09-15 21:01:18,923 INFO [train.py:1198] (0/2) Epoch 19, batch 5900, loss[loss=0.2641, ctc_loss=0.181, cr_loss=0.4154, over 20755.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1631, cr_loss=0.3837, over 4070644.03 frames. ], batch size: 71, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 21:01:48,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=342705.8333333333, ans=0.125 2024-09-15 21:01:59,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=342705.8333333333, ans=0.0 2024-09-15 21:02:25,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=342762.5, ans=0.2 2024-09-15 21:02:34,916 INFO [train.py:1198] (0/2) Epoch 19, batch 5950, loss[loss=0.2529, ctc_loss=0.1723, cr_loss=0.4028, over 20869.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1632, cr_loss=0.3847, over 4083427.33 frames. 
], batch size: 54, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 21:02:36,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=342790.8333333333, ans=0.0 2024-09-15 21:02:49,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=342819.1666666667, ans=0.0 2024-09-15 21:02:54,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=342819.1666666667, ans=0.125 2024-09-15 21:03:25,546 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.091e+02 2.197e+02 2.312e+02 3.561e+02, threshold=4.393e+02, percent-clipped=0.0 2024-09-15 21:03:49,377 INFO [train.py:1198] (0/2) Epoch 19, batch 6000, loss[loss=0.2062, ctc_loss=0.1374, cr_loss=0.3437, over 20996.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1625, cr_loss=0.3837, over 4093071.71 frames. ], batch size: 50, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 21:03:49,378 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 21:04:13,863 INFO [train.py:1230] (0/2) Epoch 19, validation: loss=0.04428, ctc_loss=0.04428, cr_loss=1.047e-14, over 944034.00 frames. 2024-09-15 21:04:13,864 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 21:04:30,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=342960.8333333333, ans=0.0 2024-09-15 21:04:48,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=342989.1666666667, ans=0.125 2024-09-15 21:05:30,113 INFO [train.py:1198] (0/2) Epoch 19, batch 6050, loss[loss=0.2758, ctc_loss=0.1885, cr_loss=0.4362, over 20338.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1628, cr_loss=0.3843, over 4090507.51 frames. 
], batch size: 74, lr: 4.35e-03, grad_scale: 32.0 2024-09-15 21:05:36,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=343074.1666666667, ans=0.125 2024-09-15 21:06:02,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=343130.8333333333, ans=0.125 2024-09-15 21:06:12,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=343130.8333333333, ans=0.125 2024-09-15 21:06:21,588 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.107e+02 2.319e+02 2.569e+02 3.285e+02, threshold=4.638e+02, percent-clipped=0.0 2024-09-15 21:06:45,488 INFO [train.py:1198] (0/2) Epoch 19, batch 6100, loss[loss=0.2369, ctc_loss=0.1627, cr_loss=0.3708, over 20930.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1632, cr_loss=0.385, over 4091053.42 frames. ], batch size: 60, lr: 4.34e-03, grad_scale: 32.0 2024-09-15 21:07:00,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=343244.1666666667, ans=0.125 2024-09-15 21:07:35,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=343300.8333333333, ans=0.0 2024-09-15 21:07:50,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343329.1666666667, ans=0.1 2024-09-15 21:07:59,225 INFO [train.py:1198] (0/2) Epoch 19, batch 6150, loss[loss=0.2231, ctc_loss=0.1471, cr_loss=0.38, over 20889.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1639, cr_loss=0.385, over 4071837.62 frames. 
], batch size: 54, lr: 4.34e-03, grad_scale: 32.0 2024-09-15 21:08:08,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=343357.5, ans=0.0 2024-09-15 21:08:23,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=343385.8333333333, ans=0.125 2024-09-15 21:08:26,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=343385.8333333333, ans=0.0 2024-09-15 21:08:49,334 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.125e+02 2.258e+02 2.510e+02 3.321e+02, threshold=4.517e+02, percent-clipped=0.0 2024-09-15 21:08:52,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=343442.5, ans=0.125 2024-09-15 21:09:07,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=343470.8333333333, ans=0.125 2024-09-15 21:09:12,856 INFO [train.py:1198] (0/2) Epoch 19, batch 6200, loss[loss=0.2507, ctc_loss=0.1679, cr_loss=0.4142, over 21031.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1634, cr_loss=0.3841, over 4070473.72 frames. ], batch size: 63, lr: 4.34e-03, grad_scale: 32.0 2024-09-15 21:09:37,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.14 vs. 
limit=15.0 2024-09-15 21:09:41,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=343555.8333333333, ans=6.0 2024-09-15 21:10:07,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=343584.1666666667, ans=0.125 2024-09-15 21:10:08,496 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.76 vs. limit=10.0 2024-09-15 21:10:26,981 INFO [train.py:1198] (0/2) Epoch 19, batch 6250, loss[loss=0.2314, ctc_loss=0.1569, cr_loss=0.3726, over 21075.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1638, cr_loss=0.3836, over 4032963.82 frames. ], batch size: 59, lr: 4.34e-03, grad_scale: 32.0 2024-09-15 21:10:43,113 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-15 21:10:45,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=343669.1666666667, ans=0.125 2024-09-15 21:10:45,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=343669.1666666667, ans=0.025 2024-09-15 21:10:56,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=343697.5, ans=0.0 2024-09-15 21:11:07,170 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.64 vs. 
limit=15.0 2024-09-15 21:11:08,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=343697.5, ans=0.2 2024-09-15 21:11:17,911 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.073e+02 2.286e+02 2.465e+02 3.594e+02, threshold=4.573e+02, percent-clipped=0.0 2024-09-15 21:11:28,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343754.1666666667, ans=0.1 2024-09-15 21:11:31,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=343754.1666666667, ans=0.125 2024-09-15 21:11:41,763 INFO [train.py:1198] (0/2) Epoch 19, batch 6300, loss[loss=0.2609, ctc_loss=0.1842, cr_loss=0.3837, over 14257.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1655, cr_loss=0.3838, over 3975424.29 frames. ], batch size: 149, lr: 4.34e-03, grad_scale: 32.0 2024-09-15 21:11:43,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343782.5, ans=0.1 2024-09-15 21:11:58,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=343810.8333333333, ans=0.0 2024-09-15 21:12:05,956 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.10 vs. 
limit=10.0 2024-09-15 21:12:15,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=343839.1666666667, ans=0.125 2024-09-15 21:12:33,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=343867.5, ans=0.02 2024-09-15 21:12:50,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343895.8333333333, ans=0.1 2024-09-15 21:12:54,183 INFO [train.py:1198] (0/2) Epoch 19, batch 6350, loss[loss=0.3012, ctc_loss=0.2146, cr_loss=0.433, over 14226.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1703, cr_loss=0.387, over 3826766.01 frames. ], batch size: 149, lr: 4.34e-03, grad_scale: 32.0 2024-09-15 21:12:56,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343924.1666666667, ans=0.1 2024-09-15 21:12:58,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=343924.1666666667, ans=0.125 2024-09-15 21:13:01,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=343924.1666666667, ans=0.2 2024-09-15 21:13:11,011 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.73 vs. 
limit=12.0 2024-09-15 21:13:20,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343952.5, ans=0.1 2024-09-15 21:13:43,333 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.320e+02 2.521e+02 2.765e+02 3.309e+02, threshold=5.042e+02, percent-clipped=0.0 2024-09-15 21:13:53,171 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-19.pt 2024-09-15 21:14:41,660 INFO [train.py:1198] (0/2) Epoch 20, batch 0, loss[loss=0.2298, ctc_loss=0.1523, cr_loss=0.3874, over 20894.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1523, cr_loss=0.3874, over 20894.00 frames. ], batch size: 54, lr: 4.23e-03, grad_scale: 32.0 2024-09-15 21:14:41,661 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-15 21:14:52,530 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.2807, 3.9101, 3.9757, 3.8911], device='cuda:0') 2024-09-15 21:15:01,248 INFO [train.py:1230] (0/2) Epoch 20, validation: loss=0.04431, ctc_loss=0.04431, cr_loss=1.055e-14, over 944034.00 frames. 2024-09-15 21:15:01,249 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-15 21:15:51,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=344125.3333333333, ans=0.1 2024-09-15 21:16:08,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=344153.6666666667, ans=0.0 2024-09-15 21:16:09,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=344153.6666666667, ans=0.0 2024-09-15 21:16:17,107 INFO [train.py:1198] (0/2) Epoch 20, batch 50, loss[loss=0.2883, ctc_loss=0.2047, cr_loss=0.4181, over 14080.00 frames. 
], tot_loss[loss=0.2404, ctc_loss=0.1634, cr_loss=0.3849, over 926809.02 frames. ], batch size: 150, lr: 4.23e-03, grad_scale: 32.0 2024-09-15 21:16:29,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=344182.0, ans=0.125 2024-09-15 21:16:52,526 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0 2024-09-15 21:16:59,359 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 21:17:00,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=344267.0, ans=0.0 2024-09-15 21:17:22,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=344295.3333333333, ans=0.2 2024-09-15 21:17:26,292 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.057e+02 2.202e+02 2.357e+02 3.385e+02, threshold=4.404e+02, percent-clipped=0.0 2024-09-15 21:17:35,251 INFO [train.py:1198] (0/2) Epoch 20, batch 100, loss[loss=0.2272, ctc_loss=0.1531, cr_loss=0.3704, over 21021.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1622, cr_loss=0.3814, over 1631073.45 frames. 
], batch size: 63, lr: 4.23e-03, grad_scale: 16.0 2024-09-15 21:17:40,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=344323.6666666667, ans=0.125 2024-09-15 21:17:56,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=344352.0, ans=0.0 2024-09-15 21:18:08,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=344380.3333333333, ans=0.125 2024-09-15 21:18:14,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=344380.3333333333, ans=0.035 2024-09-15 21:18:31,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=344408.6666666667, ans=0.125 2024-09-15 21:18:36,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=344437.0, ans=0.5 2024-09-15 21:18:39,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=344437.0, ans=0.09899494936611666 2024-09-15 21:18:52,302 INFO [train.py:1198] (0/2) Epoch 20, batch 150, loss[loss=0.2603, ctc_loss=0.1798, cr_loss=0.4025, over 20642.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1626, cr_loss=0.3826, over 2183713.29 frames. ], batch size: 66, lr: 4.22e-03, grad_scale: 16.0 2024-09-15 21:18:52,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=344465.3333333333, ans=0.125 2024-09-15 21:19:06,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.03 vs. 
limit=22.5 2024-09-15 21:19:30,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=344522.0, ans=0.1 2024-09-15 21:19:58,379 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.034e+02 2.155e+02 2.380e+02 4.137e+02, threshold=4.310e+02, percent-clipped=0.0 2024-09-15 21:20:04,779 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 21:20:07,502 INFO [train.py:1198] (0/2) Epoch 20, batch 200, loss[loss=0.2307, ctc_loss=0.1577, cr_loss=0.3651, over 21068.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1635, cr_loss=0.3845, over 2612845.41 frames. ], batch size: 59, lr: 4.22e-03, grad_scale: 16.0 2024-09-15 21:20:23,705 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.22 vs. limit=12.0 2024-09-15 21:20:46,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=344663.6666666667, ans=0.125 2024-09-15 21:21:03,534 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2024-09-15 21:21:22,496 INFO [train.py:1198] (0/2) Epoch 20, batch 250, loss[loss=0.2602, ctc_loss=0.1772, cr_loss=0.4153, over 21036.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1642, cr_loss=0.3858, over 2934680.65 frames. 
], batch size: 61, lr: 4.22e-03, grad_scale: 16.0 2024-09-15 21:21:33,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=344748.6666666667, ans=0.125 2024-09-15 21:21:54,632 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 21:22:00,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=344805.3333333333, ans=0.025 2024-09-15 21:22:09,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=344833.6666666667, ans=0.1 2024-09-15 21:22:12,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=344833.6666666667, ans=0.0 2024-09-15 21:22:18,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=344833.6666666667, ans=0.2 2024-09-15 21:22:28,982 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.037e+02 2.145e+02 2.263e+02 8.354e+02, threshold=4.291e+02, percent-clipped=1.0 2024-09-15 21:22:37,888 INFO [train.py:1198] (0/2) Epoch 20, batch 300, loss[loss=0.2392, ctc_loss=0.1614, cr_loss=0.3891, over 21064.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1639, cr_loss=0.3855, over 3189049.58 frames. ], batch size: 53, lr: 4.22e-03, grad_scale: 16.0 2024-09-15 21:22:47,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=344890.3333333333, ans=0.125 2024-09-15 21:23:16,543 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.50 vs. 
limit=15.0 2024-09-15 21:23:59,638 INFO [train.py:1198] (0/2) Epoch 20, batch 350, loss[loss=0.2449, ctc_loss=0.1671, cr_loss=0.3892, over 20958.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1637, cr_loss=0.3849, over 3397800.82 frames. ], batch size: 58, lr: 4.22e-03, grad_scale: 16.0 2024-09-15 21:24:23,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=345060.3333333333, ans=0.0 2024-09-15 21:24:42,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-15 21:24:48,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=345117.0, ans=0.2 2024-09-15 21:25:00,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=345145.3333333333, ans=0.125 2024-09-15 21:25:06,299 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.027e+02 2.187e+02 2.410e+02 4.213e+02, threshold=4.375e+02, percent-clipped=0.0 2024-09-15 21:25:15,396 INFO [train.py:1198] (0/2) Epoch 20, batch 400, loss[loss=0.2336, ctc_loss=0.1567, cr_loss=0.3842, over 20952.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1629, cr_loss=0.3843, over 3563262.41 frames. ], batch size: 51, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:25:53,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=345230.3333333333, ans=0.0 2024-09-15 21:26:22,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=345287.0, ans=0.2 2024-09-15 21:26:30,963 INFO [train.py:1198] (0/2) Epoch 20, batch 450, loss[loss=0.2082, ctc_loss=0.1362, cr_loss=0.36, over 20879.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1614, cr_loss=0.3826, over 3686321.78 frames. 
], batch size: 54, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:26:47,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=345343.6666666667, ans=0.025 2024-09-15 21:26:52,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=345343.6666666667, ans=0.125 2024-09-15 21:26:59,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=345372.0, ans=0.1 2024-09-15 21:27:06,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=345372.0, ans=0.0 2024-09-15 21:27:16,891 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=15.0 2024-09-15 21:27:37,708 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.091e+02 2.227e+02 2.463e+02 3.132e+02, threshold=4.453e+02, percent-clipped=0.0 2024-09-15 21:27:46,637 INFO [train.py:1198] (0/2) Epoch 20, batch 500, loss[loss=0.2419, ctc_loss=0.1653, cr_loss=0.3833, over 20693.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1621, cr_loss=0.3837, over 3784425.29 frames. ], batch size: 68, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:27:50,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=345457.0, ans=0.2 2024-09-15 21:29:05,859 INFO [train.py:1198] (0/2) Epoch 20, batch 550, loss[loss=0.2505, ctc_loss=0.17, cr_loss=0.4022, over 20968.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1619, cr_loss=0.3832, over 3842439.71 frames. 
], batch size: 64, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:29:42,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=345655.3333333333, ans=0.05 2024-09-15 21:30:15,480 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.033e+02 2.183e+02 2.361e+02 3.780e+02, threshold=4.366e+02, percent-clipped=0.0 2024-09-15 21:30:24,674 INFO [train.py:1198] (0/2) Epoch 20, batch 600, loss[loss=0.2366, ctc_loss=0.1606, cr_loss=0.38, over 21019.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1618, cr_loss=0.3832, over 3907977.84 frames. ], batch size: 63, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:30:26,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=345740.3333333333, ans=0.0 2024-09-15 21:30:55,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=345797.0, ans=0.125 2024-09-15 21:31:01,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=345797.0, ans=0.125 2024-09-15 21:31:04,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=345797.0, ans=0.0 2024-09-15 21:31:24,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=345853.6666666667, ans=0.125 2024-09-15 21:31:34,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2024-09-15 21:31:40,100 INFO [train.py:1198] (0/2) Epoch 20, batch 650, loss[loss=0.2352, ctc_loss=0.1571, cr_loss=0.3902, over 21043.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1615, cr_loss=0.3833, over 3967448.32 frames. 
], batch size: 62, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:31:40,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=345882.0, ans=0.015 2024-09-15 21:31:54,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=345910.3333333333, ans=0.0 2024-09-15 21:32:14,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-09-15 21:32:46,720 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.018e+02 2.206e+02 2.436e+02 5.404e+02, threshold=4.412e+02, percent-clipped=1.0 2024-09-15 21:32:55,725 INFO [train.py:1198] (0/2) Epoch 20, batch 700, loss[loss=0.2672, ctc_loss=0.184, cr_loss=0.4161, over 20332.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1616, cr_loss=0.3837, over 4003422.96 frames. ], batch size: 74, lr: 4.22e-03, grad_scale: 32.0 2024-09-15 21:32:57,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=346023.6666666667, ans=0.0 2024-09-15 21:33:30,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=346080.3333333333, ans=0.0 2024-09-15 21:33:47,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=346108.6666666667, ans=0.125 2024-09-15 21:33:58,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. 
limit=15.0 2024-09-15 21:34:00,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=346137.0, ans=0.0 2024-09-15 21:34:10,925 INFO [train.py:1198] (0/2) Epoch 20, batch 750, loss[loss=0.2414, ctc_loss=0.1631, cr_loss=0.3914, over 20842.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1621, cr_loss=0.3842, over 4017350.96 frames. ], batch size: 65, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:35:05,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=346250.3333333333, ans=0.2 2024-09-15 21:35:20,841 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.018e+02 2.162e+02 2.350e+02 3.556e+02, threshold=4.324e+02, percent-clipped=0.0 2024-09-15 21:35:33,149 INFO [train.py:1198] (0/2) Epoch 20, batch 800, loss[loss=0.2324, ctc_loss=0.1567, cr_loss=0.3787, over 20834.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1616, cr_loss=0.3832, over 4025372.14 frames. ], batch size: 59, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:35:48,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=346335.3333333333, ans=0.2 2024-09-15 21:36:03,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=346363.6666666667, ans=0.125 2024-09-15 21:36:05,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=346363.6666666667, ans=0.125 2024-09-15 21:36:48,791 INFO [train.py:1198] (0/2) Epoch 20, batch 850, loss[loss=0.2655, ctc_loss=0.1806, cr_loss=0.4241, over 20326.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1622, cr_loss=0.3836, over 4023937.68 frames. 
], batch size: 74, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:37:09,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=346477.0, ans=0.2 2024-09-15 21:37:33,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=346533.6666666667, ans=0.1 2024-09-15 21:37:41,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=346533.6666666667, ans=0.125 2024-09-15 21:37:45,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=346533.6666666667, ans=0.0 2024-09-15 21:37:46,149 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.17 vs. limit=15.0 2024-09-15 21:37:55,798 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.103e+02 2.201e+02 2.408e+02 3.477e+02, threshold=4.403e+02, percent-clipped=0.0 2024-09-15 21:38:04,874 INFO [train.py:1198] (0/2) Epoch 20, batch 900, loss[loss=0.2504, ctc_loss=0.1717, cr_loss=0.3934, over 20968.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1621, cr_loss=0.3832, over 4045049.83 frames. 
], batch size: 58, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:38:05,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=346590.3333333333, ans=0.125 2024-09-15 21:38:43,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=346647.0, ans=0.0 2024-09-15 21:38:56,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=346675.3333333333, ans=0.0 2024-09-15 21:39:04,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=346703.6666666667, ans=0.0 2024-09-15 21:39:08,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=346703.6666666667, ans=0.0 2024-09-15 21:39:20,348 INFO [train.py:1198] (0/2) Epoch 20, batch 950, loss[loss=0.2226, ctc_loss=0.1479, cr_loss=0.3736, over 21064.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1621, cr_loss=0.3837, over 4065860.29 frames. ], batch size: 56, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:40:29,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=346845.3333333333, ans=0.125 2024-09-15 21:40:30,276 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.105e+02 2.209e+02 2.368e+02 5.712e+02, threshold=4.418e+02, percent-clipped=1.0 2024-09-15 21:40:36,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=346845.3333333333, ans=0.125 2024-09-15 21:40:39,246 INFO [train.py:1198] (0/2) Epoch 20, batch 1000, loss[loss=0.2113, ctc_loss=0.1416, cr_loss=0.3487, over 20974.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1614, cr_loss=0.3819, over 4076829.91 frames. 
], batch size: 52, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:40:58,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=346902.0, ans=0.125 2024-09-15 21:41:29,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=346958.6666666667, ans=0.125 2024-09-15 21:41:57,909 INFO [train.py:1198] (0/2) Epoch 20, batch 1050, loss[loss=0.2888, ctc_loss=0.2095, cr_loss=0.3965, over 14304.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1617, cr_loss=0.382, over 4075026.50 frames. ], batch size: 151, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:42:35,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=347072.0, ans=0.025 2024-09-15 21:42:41,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=347100.3333333333, ans=0.125 2024-09-15 21:43:03,768 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.041e+02 2.158e+02 2.292e+02 3.094e+02, threshold=4.316e+02, percent-clipped=0.0 2024-09-15 21:43:13,068 INFO [train.py:1198] (0/2) Epoch 20, batch 1100, loss[loss=0.2464, ctc_loss=0.167, cr_loss=0.3966, over 21034.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1614, cr_loss=0.381, over 4076828.50 frames. ], batch size: 63, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:43:46,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=347213.6666666667, ans=0.07 2024-09-15 21:44:14,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=347270.3333333333, ans=0.1 2024-09-15 21:44:28,347 INFO [train.py:1198] (0/2) Epoch 20, batch 1150, loss[loss=0.2383, ctc_loss=0.16, cr_loss=0.3912, over 20833.00 frames. 
], tot_loss[loss=0.2374, ctc_loss=0.1613, cr_loss=0.381, over 4080727.29 frames. ], batch size: 59, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:44:37,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.16 vs. limit=15.0 2024-09-15 21:44:39,641 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.66 vs. limit=10.0 2024-09-15 21:45:23,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.58 vs. limit=10.0 2024-09-15 21:45:35,015 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.079e+02 2.206e+02 2.346e+02 3.281e+02, threshold=4.413e+02, percent-clipped=0.0 2024-09-15 21:45:44,058 INFO [train.py:1198] (0/2) Epoch 20, batch 1200, loss[loss=0.2554, ctc_loss=0.1745, cr_loss=0.4049, over 20672.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1619, cr_loss=0.3818, over 4079200.62 frames. ], batch size: 71, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:46:16,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=347497.0, ans=0.125 2024-09-15 21:46:28,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=347497.0, ans=0.125 2024-09-15 21:46:32,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=347525.3333333333, ans=0.0 2024-09-15 21:47:05,231 INFO [train.py:1198] (0/2) Epoch 20, batch 1250, loss[loss=0.2411, ctc_loss=0.1632, cr_loss=0.3895, over 20937.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1614, cr_loss=0.3817, over 4097200.84 frames. 
], batch size: 60, lr: 4.21e-03, grad_scale: 32.0 2024-09-15 21:48:11,106 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.092e+02 2.225e+02 2.357e+02 3.190e+02, threshold=4.450e+02, percent-clipped=0.0 2024-09-15 21:48:20,103 INFO [train.py:1198] (0/2) Epoch 20, batch 1300, loss[loss=0.2143, ctc_loss=0.1441, cr_loss=0.351, over 20963.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1621, cr_loss=0.3826, over 4099671.84 frames. ], batch size: 58, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:48:23,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=347723.6666666667, ans=0.0 2024-09-15 21:49:02,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=347780.3333333333, ans=0.125 2024-09-15 21:49:10,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=347808.6666666667, ans=0.07 2024-09-15 21:49:20,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=347837.0, ans=0.07 2024-09-15 21:49:21,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=347837.0, ans=0.0 2024-09-15 21:49:35,572 INFO [train.py:1198] (0/2) Epoch 20, batch 1350, loss[loss=0.2229, ctc_loss=0.1493, cr_loss=0.368, over 20971.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1612, cr_loss=0.3813, over 4095165.68 frames. 
], batch size: 64, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:49:49,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=347893.6666666667, ans=0.125 2024-09-15 21:49:55,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=347893.6666666667, ans=0.0 2024-09-15 21:50:03,247 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.20 vs. limit=10.0 2024-09-15 21:50:41,952 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.096e+02 2.229e+02 2.420e+02 3.122e+02, threshold=4.458e+02, percent-clipped=0.0 2024-09-15 21:50:51,215 INFO [train.py:1198] (0/2) Epoch 20, batch 1400, loss[loss=0.1866, ctc_loss=0.1228, cr_loss=0.3187, over 20940.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.161, cr_loss=0.381, over 4109177.63 frames. ], batch size: 49, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:50:51,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=348007.0, ans=0.0 2024-09-15 21:51:15,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=348035.3333333333, ans=0.0 2024-09-15 21:51:32,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=348063.6666666667, ans=0.125 2024-09-15 21:51:33,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=348063.6666666667, ans=0.125 2024-09-15 21:51:56,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=348120.3333333333, ans=0.125 2024-09-15 21:51:57,689 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=348120.3333333333, ans=0.1 2024-09-15 21:52:06,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.26 vs. limit=22.5 2024-09-15 21:52:08,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=348148.6666666667, ans=0.2 2024-09-15 21:52:09,933 INFO [train.py:1198] (0/2) Epoch 20, batch 1450, loss[loss=0.1814, ctc_loss=0.119, cr_loss=0.3123, over 19899.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1609, cr_loss=0.3808, over 4107002.32 frames. ], batch size: 44, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:52:13,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=348148.6666666667, ans=0.0 2024-09-15 21:52:17,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=348148.6666666667, ans=0.1 2024-09-15 21:52:28,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=348177.0, ans=0.0 2024-09-15 21:53:19,471 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.133e+02 2.247e+02 2.402e+02 3.201e+02, threshold=4.494e+02, percent-clipped=0.0 2024-09-15 21:53:26,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=348262.0, ans=0.125 2024-09-15 21:53:28,792 INFO [train.py:1198] (0/2) Epoch 20, batch 1500, loss[loss=0.2364, ctc_loss=0.1572, cr_loss=0.3957, over 20981.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1619, cr_loss=0.3826, over 4107639.27 frames. 
], batch size: 63, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:53:54,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=348318.6666666667, ans=0.0 2024-09-15 21:53:54,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=348318.6666666667, ans=0.07 2024-09-15 21:54:00,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=348347.0, ans=0.1 2024-09-15 21:54:12,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=348375.3333333333, ans=0.2 2024-09-15 21:54:12,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=348375.3333333333, ans=0.1 2024-09-15 21:54:14,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=348375.3333333333, ans=0.125 2024-09-15 21:54:21,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=348375.3333333333, ans=0.125 2024-09-15 21:54:44,308 INFO [train.py:1198] (0/2) Epoch 20, batch 1550, loss[loss=0.2342, ctc_loss=0.1587, cr_loss=0.3775, over 21014.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1617, cr_loss=0.3822, over 4120752.07 frames. ], batch size: 63, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:54:51,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. 
limit=15.0 2024-09-15 21:55:31,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=348517.0, ans=0.0 2024-09-15 21:55:50,625 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.721e+02 2.021e+02 2.194e+02 2.346e+02 4.155e+02, threshold=4.388e+02, percent-clipped=0.0 2024-09-15 21:55:59,873 INFO [train.py:1198] (0/2) Epoch 20, batch 1600, loss[loss=0.2588, ctc_loss=0.1763, cr_loss=0.4125, over 20625.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1623, cr_loss=0.3827, over 4113869.24 frames. ], batch size: 66, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:56:24,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=348602.0, ans=10.0 2024-09-15 21:56:35,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=348630.3333333333, ans=0.125 2024-09-15 21:56:35,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=348630.3333333333, ans=0.1 2024-09-15 21:57:16,282 INFO [train.py:1198] (0/2) Epoch 20, batch 1650, loss[loss=0.1967, ctc_loss=0.1312, cr_loss=0.3277, over 21045.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1612, cr_loss=0.3815, over 4115327.28 frames. 
], batch size: 53, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:57:22,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=348715.3333333333, ans=0.125 2024-09-15 21:57:30,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=348743.6666666667, ans=0.125 2024-09-15 21:57:37,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=348743.6666666667, ans=0.125 2024-09-15 21:58:17,659 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.65 vs. limit=22.5 2024-09-15 21:58:25,333 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.098e+02 2.222e+02 2.396e+02 4.762e+02, threshold=4.443e+02, percent-clipped=1.0 2024-09-15 21:58:34,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=348828.6666666667, ans=0.125 2024-09-15 21:58:37,112 INFO [train.py:1198] (0/2) Epoch 20, batch 1700, loss[loss=0.2778, ctc_loss=0.1948, cr_loss=0.4153, over 20645.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1613, cr_loss=0.3812, over 4111016.99 frames. ], batch size: 66, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:58:48,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=348857.0, ans=0.025 2024-09-15 21:59:06,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2024-09-15 21:59:52,713 INFO [train.py:1198] (0/2) Epoch 20, batch 1750, loss[loss=0.2463, ctc_loss=0.167, cr_loss=0.3965, over 21009.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1614, cr_loss=0.3813, over 4104757.51 frames. 
], batch size: 61, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 21:59:57,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=348998.6666666667, ans=0.2 2024-09-15 22:00:05,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0 2024-09-15 22:00:06,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=349027.0, ans=0.0 2024-09-15 22:00:26,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=349055.3333333333, ans=0.04949747468305833 2024-09-15 22:00:31,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=349055.3333333333, ans=0.1 2024-09-15 22:00:58,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=349112.0, ans=0.04949747468305833 2024-09-15 22:00:59,455 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.078e+02 2.203e+02 2.438e+02 4.193e+02, threshold=4.407e+02, percent-clipped=0.0 2024-09-15 22:01:04,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=349112.0, ans=0.125 2024-09-15 22:01:04,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349112.0, ans=0.1 2024-09-15 22:01:08,576 INFO [train.py:1198] (0/2) Epoch 20, batch 1800, loss[loss=0.2539, ctc_loss=0.171, cr_loss=0.4142, over 21005.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1613, cr_loss=0.3814, over 4112551.08 frames. 
], batch size: 63, lr: 4.20e-03, grad_scale: 32.0 2024-09-15 22:01:21,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=349140.3333333333, ans=0.04949747468305833 2024-09-15 22:02:23,945 INFO [train.py:1198] (0/2) Epoch 20, batch 1850, loss[loss=0.2159, ctc_loss=0.1434, cr_loss=0.3625, over 20920.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1624, cr_loss=0.3827, over 4097495.19 frames. ], batch size: 60, lr: 4.20e-03, grad_scale: 16.0 2024-09-15 22:02:24,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=349282.0, ans=0.0 2024-09-15 22:02:37,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=349310.3333333333, ans=0.125 2024-09-15 22:02:42,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=349310.3333333333, ans=0.0 2024-09-15 22:03:06,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=349338.6666666667, ans=0.0 2024-09-15 22:03:08,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=349367.0, ans=0.025 2024-09-15 22:03:32,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=349395.3333333333, ans=0.1 2024-09-15 22:03:33,040 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.44 vs. 
limit=22.5 2024-09-15 22:03:35,449 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.062e+02 2.190e+02 2.352e+02 6.993e+02, threshold=4.381e+02, percent-clipped=1.0 2024-09-15 22:03:37,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349395.3333333333, ans=0.1 2024-09-15 22:03:43,254 INFO [train.py:1198] (0/2) Epoch 20, batch 1900, loss[loss=0.2164, ctc_loss=0.1447, cr_loss=0.3583, over 20946.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1624, cr_loss=0.383, over 4098391.44 frames. ], batch size: 49, lr: 4.19e-03, grad_scale: 16.0 2024-09-15 22:04:06,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=349452.0, ans=0.125 2024-09-15 22:04:10,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=349452.0, ans=0.125 2024-09-15 22:04:54,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=349537.0, ans=0.125 2024-09-15 22:05:01,571 INFO [train.py:1198] (0/2) Epoch 20, batch 1950, loss[loss=0.2709, ctc_loss=0.1855, cr_loss=0.4272, over 20979.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1639, cr_loss=0.3847, over 4083080.92 frames. 
], batch size: 64, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:05:09,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=349565.3333333333, ans=0.125
2024-09-15 22:05:25,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=349593.6666666667, ans=0.2
2024-09-15 22:05:34,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=349622.0, ans=0.025
2024-09-15 22:06:08,934 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.079e+02 2.264e+02 2.474e+02 3.500e+02, threshold=4.527e+02, percent-clipped=0.0
2024-09-15 22:06:16,550 INFO [train.py:1198] (0/2) Epoch 20, batch 2000, loss[loss=0.248, ctc_loss=0.1713, cr_loss=0.3836, over 19383.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1635, cr_loss=0.3839, over 4079763.43 frames. ], batch size: 90, lr: 4.19e-03, grad_scale: 32.0
2024-09-15 22:06:24,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=349707.0, ans=0.5
2024-09-15 22:06:29,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=349707.0, ans=0.05
2024-09-15 22:06:45,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.14 vs. limit=10.0
2024-09-15 22:06:47,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=349763.6666666667, ans=0.0
2024-09-15 22:07:12,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349792.0, ans=0.1
2024-09-15 22:07:25,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=349820.3333333333, ans=0.025
2024-09-15 22:07:32,341 INFO [train.py:1198] (0/2) Epoch 20, batch 2050, loss[loss=0.2166, ctc_loss=0.1454, cr_loss=0.3558, over 20961.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1623, cr_loss=0.382, over 4084577.90 frames. ], batch size: 51, lr: 4.19e-03, grad_scale: 32.0
2024-09-15 22:07:41,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=349848.6666666667, ans=0.1
2024-09-15 22:07:44,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=349848.6666666667, ans=0.0
2024-09-15 22:07:51,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=349877.0, ans=0.04949747468305833
2024-09-15 22:08:12,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=349905.3333333333, ans=0.125
2024-09-15 22:08:42,271 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.026e+02 2.161e+02 2.319e+02 3.736e+02, threshold=4.321e+02, percent-clipped=0.0
2024-09-15 22:08:48,328 INFO [train.py:1198] (0/2) Epoch 20, batch 2100, loss[loss=0.1943, ctc_loss=0.1276, cr_loss=0.3334, over 20943.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1622, cr_loss=0.382, over 4089938.23 frames. ], batch size: 50, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:08:54,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=349990.3333333333, ans=0.125
2024-09-15 22:10:11,258 INFO [train.py:1198] (0/2) Epoch 20, batch 2150, loss[loss=0.2586, ctc_loss=0.1746, cr_loss=0.4199, over 20925.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1612, cr_loss=0.3806, over 4092861.13 frames. ], batch size: 60, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:10:16,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=350132.0, ans=0.1
2024-09-15 22:10:22,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=350132.0, ans=0.1
2024-09-15 22:11:21,144 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.081e+02 2.203e+02 2.427e+02 4.025e+02, threshold=4.407e+02, percent-clipped=0.0
2024-09-15 22:11:27,262 INFO [train.py:1198] (0/2) Epoch 20, batch 2200, loss[loss=0.249, ctc_loss=0.1691, cr_loss=0.3992, over 21037.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1609, cr_loss=0.3802, over 4092702.02 frames. ], batch size: 62, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:12:20,737 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:12:43,022 INFO [train.py:1198] (0/2) Epoch 20, batch 2250, loss[loss=0.2384, ctc_loss=0.1613, cr_loss=0.3857, over 20989.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1599, cr_loss=0.3787, over 4105210.87 frames. ], batch size: 63, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:13:17,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=350472.0, ans=0.04949747468305833
2024-09-15 22:13:22,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=350472.0, ans=0.0
2024-09-15 22:13:22,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=350472.0, ans=0.125
2024-09-15 22:13:23,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=350472.0, ans=0.0
2024-09-15 22:13:35,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=350500.3333333333, ans=0.025
2024-09-15 22:13:50,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=350528.6666666667, ans=0.125
2024-09-15 22:13:51,849 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.704e+02 2.114e+02 2.267e+02 2.461e+02 3.098e+02, threshold=4.533e+02, percent-clipped=0.0
2024-09-15 22:13:52,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=350528.6666666667, ans=0.0
2024-09-15 22:13:57,795 INFO [train.py:1198] (0/2) Epoch 20, batch 2300, loss[loss=0.2746, ctc_loss=0.1874, cr_loss=0.4361, over 19472.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1615, cr_loss=0.3807, over 4090489.55 frames. ], batch size: 90, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:14:32,973 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=22.5
2024-09-15 22:14:34,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=350613.6666666667, ans=0.2
2024-09-15 22:15:16,621 INFO [train.py:1198] (0/2) Epoch 20, batch 2350, loss[loss=0.2266, ctc_loss=0.1504, cr_loss=0.3813, over 21030.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1624, cr_loss=0.3824, over 4091268.50 frames. ], batch size: 62, lr: 4.19e-03, grad_scale: 16.0
2024-09-15 22:15:32,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=15.0
2024-09-15 22:15:36,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=350727.0, ans=0.95
2024-09-15 22:15:47,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=350727.0, ans=0.025
2024-09-15 22:16:29,246 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.106e+02 2.231e+02 2.400e+02 4.033e+02, threshold=4.462e+02, percent-clipped=0.0
2024-09-15 22:16:35,218 INFO [train.py:1198] (0/2) Epoch 20, batch 2400, loss[loss=0.2493, ctc_loss=0.1691, cr_loss=0.401, over 20825.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1625, cr_loss=0.3825, over 4090142.02 frames. ], batch size: 65, lr: 4.19e-03, grad_scale: 32.0
2024-09-15 22:16:47,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=350840.3333333333, ans=0.0
2024-09-15 22:16:53,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.61 vs. limit=22.5
2024-09-15 22:16:54,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0
2024-09-15 22:17:02,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=350868.6666666667, ans=0.025
2024-09-15 22:17:08,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=350897.0, ans=0.1
2024-09-15 22:17:50,676 INFO [train.py:1198] (0/2) Epoch 20, batch 2450, loss[loss=0.2371, ctc_loss=0.1631, cr_loss=0.3702, over 20700.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1629, cr_loss=0.3831, over 4094656.78 frames. ], batch size: 66, lr: 4.19e-03, grad_scale: 32.0
2024-09-15 22:18:01,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=350982.0, ans=0.125
2024-09-15 22:18:30,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=351038.6666666667, ans=0.125
2024-09-15 22:18:59,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=351095.3333333333, ans=0.125
2024-09-15 22:19:00,285 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.068e+02 2.187e+02 2.375e+02 2.912e+02, threshold=4.373e+02, percent-clipped=0.0
2024-09-15 22:19:00,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=351095.3333333333, ans=0.2
2024-09-15 22:19:06,319 INFO [train.py:1198] (0/2) Epoch 20, batch 2500, loss[loss=0.2529, ctc_loss=0.1698, cr_loss=0.4157, over 20737.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1621, cr_loss=0.3815, over 4088531.94 frames. ], batch size: 71, lr: 4.18e-03, grad_scale: 32.0
2024-09-15 22:19:06,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=351123.6666666667, ans=0.125
2024-09-15 22:19:20,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=351152.0, ans=0.125
2024-09-15 22:19:41,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=351180.3333333333, ans=0.1
2024-09-15 22:19:49,670 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.12 vs. limit=15.0
2024-09-15 22:19:53,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=351208.6666666667, ans=0.1
2024-09-15 22:20:21,725 INFO [train.py:1198] (0/2) Epoch 20, batch 2550, loss[loss=0.2943, ctc_loss=0.2053, cr_loss=0.4449, over 18260.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1619, cr_loss=0.3809, over 4092047.68 frames. ], batch size: 108, lr: 4.18e-03, grad_scale: 32.0
2024-09-15 22:20:23,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=351265.3333333333, ans=0.0
2024-09-15 22:20:38,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=351293.6666666667, ans=0.0
2024-09-15 22:20:39,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0
2024-09-15 22:20:40,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=351293.6666666667, ans=0.125
2024-09-15 22:20:59,867 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-124000.pt
2024-09-15 22:21:05,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=351322.0, ans=0.09899494936611666
2024-09-15 22:21:10,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=351322.0, ans=0.025
2024-09-15 22:21:37,141 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.078e+02 2.237e+02 2.413e+02 3.960e+02, threshold=4.474e+02, percent-clipped=0.0
2024-09-15 22:21:43,088 INFO [train.py:1198] (0/2) Epoch 20, batch 2600, loss[loss=0.228, ctc_loss=0.1507, cr_loss=0.3863, over 20867.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1616, cr_loss=0.3809, over 4098207.90 frames. ], batch size: 57, lr: 4.18e-03, grad_scale: 32.0
2024-09-15 22:21:55,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=351407.0, ans=0.025
2024-09-15 22:21:58,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=351435.3333333333, ans=0.125
2024-09-15 22:22:04,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=351435.3333333333, ans=0.125
2024-09-15 22:22:05,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=351435.3333333333, ans=0.07
2024-09-15 22:22:25,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=351463.6666666667, ans=0.04949747468305833
2024-09-15 22:22:37,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=351492.0, ans=0.125
2024-09-15 22:22:40,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=351492.0, ans=0.2
2024-09-15 22:22:48,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.33 vs. limit=10.0
2024-09-15 22:22:58,612 INFO [train.py:1198] (0/2) Epoch 20, batch 2650, loss[loss=0.2144, ctc_loss=0.1456, cr_loss=0.344, over 20956.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1622, cr_loss=0.3817, over 4086277.90 frames. ], batch size: 51, lr: 4.18e-03, grad_scale: 32.0
2024-09-15 22:23:32,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=351605.3333333333, ans=0.1
2024-09-15 22:24:08,414 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.040e+02 2.198e+02 2.355e+02 3.323e+02, threshold=4.396e+02, percent-clipped=0.0
2024-09-15 22:24:14,474 INFO [train.py:1198] (0/2) Epoch 20, batch 2700, loss[loss=0.2255, ctc_loss=0.1529, cr_loss=0.3631, over 21049.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.163, cr_loss=0.3826, over 4079660.68 frames. ], batch size: 53, lr: 4.18e-03, grad_scale: 32.0
2024-09-15 22:24:37,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=351718.6666666667, ans=0.125
2024-09-15 22:25:07,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=351775.3333333333, ans=0.125
2024-09-15 22:25:30,153 INFO [train.py:1198] (0/2) Epoch 20, batch 2750, loss[loss=0.2386, ctc_loss=0.1602, cr_loss=0.3924, over 20975.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1633, cr_loss=0.3831, over 4081444.11 frames. ], batch size: 55, lr: 4.18e-03, grad_scale: 16.0
2024-09-15 22:25:30,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=351832.0, ans=0.125
2024-09-15 22:26:05,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=351888.6666666667, ans=0.125
2024-09-15 22:26:40,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=351945.3333333333, ans=0.07
2024-09-15 22:26:47,012 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.065e+02 2.172e+02 2.321e+02 4.117e+02, threshold=4.344e+02, percent-clipped=0.0
2024-09-15 22:26:51,627 INFO [train.py:1198] (0/2) Epoch 20, batch 2800, loss[loss=0.1999, ctc_loss=0.1313, cr_loss=0.3429, over 20934.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1632, cr_loss=0.3834, over 4080981.14 frames. ], batch size: 49, lr: 4.18e-03, grad_scale: 32.0
2024-09-15 22:26:56,575 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:27:05,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=352002.0, ans=0.0
2024-09-15 22:27:06,278 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=22.5
2024-09-15 22:27:07,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=352002.0, ans=0.0
2024-09-15 22:27:09,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352002.0, ans=0.1
2024-09-15 22:27:43,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=352058.6666666667, ans=0.0
2024-09-15 22:27:54,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=352087.0, ans=0.125
2024-09-15 22:28:07,537 INFO [train.py:1198] (0/2) Epoch 20, batch 2850, loss[loss=0.257, ctc_loss=0.1768, cr_loss=0.401, over 20983.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1625, cr_loss=0.3821, over 4079451.31 frames. ], batch size: 55, lr: 4.18e-03, grad_scale: 16.0
2024-09-15 22:28:09,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=352115.3333333333, ans=0.025
2024-09-15 22:28:16,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=352115.3333333333, ans=0.0
2024-09-15 22:28:24,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=352143.6666666667, ans=0.0
2024-09-15 22:28:27,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=352143.6666666667, ans=0.125
2024-09-15 22:28:30,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=352143.6666666667, ans=0.0
2024-09-15 22:29:15,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=352228.6666666667, ans=0.0
2024-09-15 22:29:20,128 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.073e+02 2.225e+02 2.423e+02 3.316e+02, threshold=4.451e+02, percent-clipped=0.0
2024-09-15 22:29:23,181 INFO [train.py:1198] (0/2) Epoch 20, batch 2900, loss[loss=0.2485, ctc_loss=0.1694, cr_loss=0.3957, over 20936.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.162, cr_loss=0.3817, over 4083787.66 frames. ], batch size: 60, lr: 4.18e-03, grad_scale: 16.0
2024-09-15 22:29:26,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=352257.0, ans=0.125
2024-09-15 22:29:38,653 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-15 22:30:24,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.25 vs. limit=10.0
2024-09-15 22:30:30,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=352370.3333333333, ans=0.2
2024-09-15 22:30:33,807 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=22.5
2024-09-15 22:30:37,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=352398.6666666667, ans=0.125
2024-09-15 22:30:38,940 INFO [train.py:1198] (0/2) Epoch 20, batch 2950, loss[loss=0.234, ctc_loss=0.1589, cr_loss=0.3752, over 21034.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1613, cr_loss=0.3816, over 4095532.98 frames. ], batch size: 63, lr: 4.18e-03, grad_scale: 16.0
2024-09-15 22:31:08,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=352455.3333333333, ans=0.0
2024-09-15 22:31:21,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=352455.3333333333, ans=0.0
2024-09-15 22:31:51,724 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.098e+02 2.284e+02 2.558e+02 3.973e+02, threshold=4.569e+02, percent-clipped=0.0
2024-09-15 22:31:57,788 INFO [train.py:1198] (0/2) Epoch 20, batch 3000, loss[loss=0.2293, ctc_loss=0.1536, cr_loss=0.3783, over 20956.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.162, cr_loss=0.3823, over 4092608.24 frames. ], batch size: 55, lr: 4.18e-03, grad_scale: 16.0
2024-09-15 22:31:57,789 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-15 22:32:21,356 INFO [train.py:1230] (0/2) Epoch 20, validation: loss=0.04399, ctc_loss=0.04399, cr_loss=1.053e-14, over 944034.00 frames.
2024-09-15 22:32:21,356 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-15 22:32:55,498 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0
2024-09-15 22:33:07,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2024-09-15 22:33:38,313 INFO [train.py:1198] (0/2) Epoch 20, batch 3050, loss[loss=0.2212, ctc_loss=0.1504, cr_loss=0.3539, over 20791.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1618, cr_loss=0.3824, over 4105417.11 frames. ], batch size: 56, lr: 4.18e-03, grad_scale: 16.0
2024-09-15 22:33:51,111 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=22.5
2024-09-15 22:34:13,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=352738.6666666667, ans=0.05
2024-09-15 22:34:40,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352795.3333333333, ans=0.1
2024-09-15 22:34:50,271 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.714e+02 2.014e+02 2.170e+02 2.369e+02 3.465e+02, threshold=4.339e+02, percent-clipped=0.0
2024-09-15 22:34:53,397 INFO [train.py:1198] (0/2) Epoch 20, batch 3100, loss[loss=0.2611, ctc_loss=0.1794, cr_loss=0.4085, over 20971.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1621, cr_loss=0.3824, over 4093415.72 frames. ], batch size: 64, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:34:53,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=352823.6666666667, ans=0.0
2024-09-15 22:35:05,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=352823.6666666667, ans=0.0
2024-09-15 22:35:11,064 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.16 vs. limit=15.0
2024-09-15 22:35:30,933 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0
2024-09-15 22:35:36,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=352880.3333333333, ans=0.125
2024-09-15 22:35:43,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=352908.6666666667, ans=0.125
2024-09-15 22:35:48,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=352908.6666666667, ans=0.0
2024-09-15 22:35:52,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=352937.0, ans=0.0
2024-09-15 22:35:54,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=352937.0, ans=0.125
2024-09-15 22:35:59,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=352937.0, ans=0.0
2024-09-15 22:36:03,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=352937.0, ans=0.125
2024-09-15 22:36:09,306 INFO [train.py:1198] (0/2) Epoch 20, batch 3150, loss[loss=0.2342, ctc_loss=0.1582, cr_loss=0.3801, over 20999.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1624, cr_loss=0.3829, over 4095056.85 frames. ], batch size: 64, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:36:09,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=352965.3333333333, ans=0.125
2024-09-15 22:36:11,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=352965.3333333333, ans=0.04949747468305833
2024-09-15 22:36:42,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=353022.0, ans=0.125
2024-09-15 22:37:04,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=353050.3333333333, ans=0.125
2024-09-15 22:37:07,715 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=22.5
2024-09-15 22:37:14,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=353078.6666666667, ans=0.0
2024-09-15 22:37:21,989 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.112e+02 2.247e+02 2.439e+02 2.946e+02, threshold=4.495e+02, percent-clipped=0.0
2024-09-15 22:37:24,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0
2024-09-15 22:37:25,138 INFO [train.py:1198] (0/2) Epoch 20, batch 3200, loss[loss=0.1943, ctc_loss=0.1299, cr_loss=0.3216, over 20970.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1615, cr_loss=0.3815, over 4092143.22 frames. ], batch size: 50, lr: 4.17e-03, grad_scale: 32.0
2024-09-15 22:37:36,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=353107.0, ans=0.125
2024-09-15 22:37:39,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0
2024-09-15 22:38:10,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353163.6666666667, ans=0.1
2024-09-15 22:38:13,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=353163.6666666667, ans=0.125
2024-09-15 22:38:28,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=353192.0, ans=0.125
2024-09-15 22:38:36,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=353220.3333333333, ans=0.125
2024-09-15 22:38:37,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=353220.3333333333, ans=0.125
2024-09-15 22:38:46,841 INFO [train.py:1198] (0/2) Epoch 20, batch 3250, loss[loss=0.2585, ctc_loss=0.1785, cr_loss=0.4004, over 20955.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1616, cr_loss=0.381, over 4096909.24 frames. ], batch size: 67, lr: 4.17e-03, grad_scale: 32.0
2024-09-15 22:39:07,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=353277.0, ans=0.0
2024-09-15 22:39:14,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=353277.0, ans=0.125
2024-09-15 22:40:00,581 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.071e+02 2.269e+02 2.505e+02 3.417e+02, threshold=4.537e+02, percent-clipped=0.0
2024-09-15 22:40:02,201 INFO [train.py:1198] (0/2) Epoch 20, batch 3300, loss[loss=0.2124, ctc_loss=0.1395, cr_loss=0.3644, over 20885.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1616, cr_loss=0.3808, over 4101848.20 frames. ], batch size: 54, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:41:13,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=353503.6666666667, ans=0.2
2024-09-15 22:41:17,133 INFO [train.py:1198] (0/2) Epoch 20, batch 3350, loss[loss=0.2479, ctc_loss=0.1727, cr_loss=0.376, over 20998.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1606, cr_loss=0.38, over 4115062.53 frames. ], batch size: 61, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:41:20,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=353532.0, ans=0.0
2024-09-15 22:41:23,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=353532.0, ans=0.2
2024-09-15 22:41:26,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=353532.0, ans=0.0
2024-09-15 22:41:52,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=353588.6666666667, ans=0.125
2024-09-15 22:41:52,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=353588.6666666667, ans=0.0
2024-09-15 22:41:55,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=353588.6666666667, ans=0.125
2024-09-15 22:42:26,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=353645.3333333333, ans=0.125
2024-09-15 22:42:31,054 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.077e+02 2.182e+02 2.343e+02 3.801e+02, threshold=4.364e+02, percent-clipped=0.0
2024-09-15 22:42:32,469 INFO [train.py:1198] (0/2) Epoch 20, batch 3400, loss[loss=0.2664, ctc_loss=0.1833, cr_loss=0.4158, over 20326.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1593, cr_loss=0.3777, over 4117134.81 frames. ], batch size: 74, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:42:34,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=353673.6666666667, ans=0.125
2024-09-15 22:42:50,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=353702.0, ans=0.2
2024-09-15 22:43:31,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353758.6666666667, ans=0.1
2024-09-15 22:43:53,668 INFO [train.py:1198] (0/2) Epoch 20, batch 3450, loss[loss=0.2177, ctc_loss=0.1479, cr_loss=0.349, over 20829.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1598, cr_loss=0.378, over 4115794.11 frames. ], batch size: 59, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:44:12,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=353843.6666666667, ans=0.0
2024-09-15 22:44:17,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=353843.6666666667, ans=0.125
2024-09-15 22:44:23,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=353872.0, ans=0.125
2024-09-15 22:44:37,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.44 vs. limit=15.0
2024-09-15 22:44:51,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=353900.3333333333, ans=0.125
2024-09-15 22:45:04,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=15.0
2024-09-15 22:45:08,380 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.075e+02 2.215e+02 2.345e+02 3.245e+02, threshold=4.430e+02, percent-clipped=0.0
2024-09-15 22:45:09,920 INFO [train.py:1198] (0/2) Epoch 20, batch 3500, loss[loss=0.252, ctc_loss=0.1699, cr_loss=0.4107, over 20776.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1596, cr_loss=0.3782, over 4123351.92 frames. ], batch size: 56, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:45:19,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=353957.0, ans=0.0
2024-09-15 22:45:37,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=353985.3333333333, ans=0.1
2024-09-15 22:46:25,917 INFO [train.py:1198] (0/2) Epoch 20, batch 3550, loss[loss=0.1882, ctc_loss=0.1246, cr_loss=0.318, over 20966.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1597, cr_loss=0.3783, over 4114575.76 frames. ], batch size: 48, lr: 4.17e-03, grad_scale: 16.0
2024-09-15 22:46:47,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=354127.0, ans=0.2
2024-09-15 22:47:02,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=354155.3333333333, ans=0.0
2024-09-15 22:47:35,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=354212.0, ans=0.125
2024-09-15 22:47:39,228 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.109e+02 2.237e+02 2.459e+02 9.433e+02, threshold=4.474e+02, percent-clipped=1.0
2024-09-15 22:47:40,760 INFO [train.py:1198] (0/2) Epoch 20, batch 3600, loss[loss=0.269, ctc_loss=0.1848, cr_loss=0.4208, over 20825.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1621, cr_loss=0.3826, over 4093020.63 frames. ], batch size: 59, lr: 4.17e-03, grad_scale: 32.0
2024-09-15 22:47:53,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354240.3333333333, ans=0.1
2024-09-15 22:47:53,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=354240.3333333333, ans=0.2
2024-09-15 22:48:15,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354297.0, ans=0.1
2024-09-15 22:48:56,318 INFO [train.py:1198] (0/2) Epoch 20, batch 3650, loss[loss=0.2714, ctc_loss=0.1881, cr_loss=0.4163, over 18360.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1609, cr_loss=0.3807, over 4104093.95 frames. ], batch size: 108, lr: 4.17e-03, grad_scale: 32.0
2024-09-15 22:49:02,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=354382.0, ans=0.125
2024-09-15 22:49:18,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=354410.3333333333, ans=0.125
2024-09-15 22:49:23,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=354410.3333333333, ans=0.0
2024-09-15 22:49:26,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=354410.3333333333, ans=0.125
2024-09-15 22:49:42,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=354438.6666666667, ans=0.125
2024-09-15 22:50:16,619 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.039e+02 2.213e+02 2.325e+02 4.277e+02, threshold=4.426e+02, percent-clipped=0.0
2024-09-15 22:50:16,639 INFO
[train.py:1198] (0/2) Epoch 20, batch 3700, loss[loss=0.2509, ctc_loss=0.1701, cr_loss=0.404, over 20685.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1611, cr_loss=0.3812, over 4101040.16 frames. ], batch size: 68, lr: 4.16e-03, grad_scale: 16.0 2024-09-15 22:51:14,325 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.61 vs. limit=10.0 2024-09-15 22:51:31,640 INFO [train.py:1198] (0/2) Epoch 20, batch 3750, loss[loss=0.2103, ctc_loss=0.1403, cr_loss=0.3503, over 20970.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1623, cr_loss=0.3829, over 4089379.22 frames. ], batch size: 51, lr: 4.16e-03, grad_scale: 16.0 2024-09-15 22:51:35,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=354665.3333333333, ans=0.125 2024-09-15 22:51:41,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=354665.3333333333, ans=0.125 2024-09-15 22:52:08,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=22.5 2024-09-15 22:52:21,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=354750.3333333333, ans=0.125 2024-09-15 22:52:47,401 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.063e+02 2.189e+02 2.332e+02 3.793e+02, threshold=4.378e+02, percent-clipped=0.0 2024-09-15 22:52:47,425 INFO [train.py:1198] (0/2) Epoch 20, batch 3800, loss[loss=0.2039, ctc_loss=0.1355, cr_loss=0.3421, over 20966.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1617, cr_loss=0.3822, over 4095999.83 frames. 
], batch size: 50, lr: 4.16e-03, grad_scale: 16.0 2024-09-15 22:52:58,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=354807.0, ans=0.125 2024-09-15 22:53:29,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=354863.6666666667, ans=0.125 2024-09-15 22:53:35,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=354892.0, ans=0.125 2024-09-15 22:53:53,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=354920.3333333333, ans=0.125 2024-09-15 22:54:02,842 INFO [train.py:1198] (0/2) Epoch 20, batch 3850, loss[loss=0.2009, ctc_loss=0.1376, cr_loss=0.3166, over 21076.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1611, cr_loss=0.381, over 4093023.99 frames. ], batch size: 53, lr: 4.16e-03, grad_scale: 16.0 2024-09-15 22:54:22,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=354977.0, ans=0.05 2024-09-15 22:54:46,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2024-09-15 22:54:56,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.00 vs. 
limit=12.0 2024-09-15 22:55:15,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=355062.0, ans=0.125 2024-09-15 22:55:24,098 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.035e+02 2.181e+02 2.287e+02 3.384e+02, threshold=4.363e+02, percent-clipped=0.0 2024-09-15 22:55:24,118 INFO [train.py:1198] (0/2) Epoch 20, batch 3900, loss[loss=0.2499, ctc_loss=0.1682, cr_loss=0.4086, over 20966.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1613, cr_loss=0.3817, over 4093533.59 frames. ], batch size: 64, lr: 4.16e-03, grad_scale: 16.0 2024-09-15 22:55:47,496 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0 2024-09-15 22:55:59,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=355147.0, ans=0.0 2024-09-15 22:56:00,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=355147.0, ans=0.125 2024-09-15 22:56:11,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=355175.3333333333, ans=0.125 2024-09-15 22:56:39,650 INFO [train.py:1198] (0/2) Epoch 20, batch 3950, loss[loss=0.2378, ctc_loss=0.1603, cr_loss=0.3872, over 21081.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1614, cr_loss=0.3825, over 4098792.41 frames. ], batch size: 59, lr: 4.16e-03, grad_scale: 16.0 2024-09-15 22:57:12,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.40 vs. 
limit=22.5 2024-09-15 22:57:13,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=355288.6666666667, ans=0.125 2024-09-15 22:57:47,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=355345.3333333333, ans=0.0 2024-09-15 22:57:55,548 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.069e+02 2.199e+02 2.358e+02 4.159e+02, threshold=4.398e+02, percent-clipped=0.0 2024-09-15 22:57:55,568 INFO [train.py:1198] (0/2) Epoch 20, batch 4000, loss[loss=0.2638, ctc_loss=0.1785, cr_loss=0.4267, over 20991.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1609, cr_loss=0.3817, over 4099029.06 frames. ], batch size: 55, lr: 4.16e-03, grad_scale: 32.0 2024-09-15 22:58:11,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=355402.0, ans=0.125 2024-09-15 22:58:24,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.91 vs. limit=8.0 2024-09-15 22:58:48,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=355458.6666666667, ans=0.0 2024-09-15 22:59:11,167 INFO [train.py:1198] (0/2) Epoch 20, batch 4050, loss[loss=0.2112, ctc_loss=0.1389, cr_loss=0.3613, over 20973.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1615, cr_loss=0.3828, over 4097032.97 frames. 
], batch size: 51, lr: 4.16e-03, grad_scale: 32.0 2024-09-15 22:59:33,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=355543.6666666667, ans=0.025 2024-09-15 22:59:50,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=355572.0, ans=0.125 2024-09-15 23:00:13,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=355628.6666666667, ans=0.2 2024-09-15 23:00:27,025 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.111e+02 2.238e+02 2.395e+02 4.534e+02, threshold=4.475e+02, percent-clipped=1.0 2024-09-15 23:00:27,045 INFO [train.py:1198] (0/2) Epoch 20, batch 4100, loss[loss=0.2338, ctc_loss=0.1578, cr_loss=0.3802, over 20672.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.162, cr_loss=0.3836, over 4100869.34 frames. ], batch size: 68, lr: 4.16e-03, grad_scale: 32.0 2024-09-15 23:01:24,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=355742.0, ans=0.125 2024-09-15 23:01:48,725 INFO [train.py:1198] (0/2) Epoch 20, batch 4150, loss[loss=0.2497, ctc_loss=0.1739, cr_loss=0.3792, over 20254.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1632, cr_loss=0.3852, over 4091363.58 frames. ], batch size: 74, lr: 4.16e-03, grad_scale: 32.0 2024-09-15 23:02:04,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=355827.0, ans=0.2 2024-09-15 23:02:31,242 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.09 vs. 
limit=12.0 2024-09-15 23:02:56,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=355912.0, ans=0.025 2024-09-15 23:03:04,819 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.101e+02 2.241e+02 2.524e+02 3.949e+02, threshold=4.482e+02, percent-clipped=0.0 2024-09-15 23:03:04,840 INFO [train.py:1198] (0/2) Epoch 20, batch 4200, loss[loss=0.2556, ctc_loss=0.1733, cr_loss=0.4116, over 20842.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1638, cr_loss=0.3866, over 4088815.57 frames. ], batch size: 65, lr: 4.16e-03, grad_scale: 32.0 2024-09-15 23:03:13,000 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2024-09-15 23:03:51,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.23 vs. limit=15.0 2024-09-15 23:04:04,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=356053.6666666667, ans=0.2 2024-09-15 23:04:07,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=356053.6666666667, ans=0.125 2024-09-15 23:04:20,667 INFO [train.py:1198] (0/2) Epoch 20, batch 4250, loss[loss=0.2438, ctc_loss=0.1675, cr_loss=0.3814, over 21085.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1634, cr_loss=0.3866, over 4100566.58 frames. 
], batch size: 59, lr: 4.16e-03, grad_scale: 32.0 2024-09-15 23:04:31,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=356082.0, ans=0.0 2024-09-15 23:04:59,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=356138.6666666667, ans=0.125 2024-09-15 23:05:25,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=356195.3333333333, ans=0.0 2024-09-15 23:05:27,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356195.3333333333, ans=0.1 2024-09-15 23:05:31,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=356195.3333333333, ans=0.0 2024-09-15 23:05:35,838 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.060e+02 2.153e+02 2.296e+02 3.155e+02, threshold=4.306e+02, percent-clipped=0.0 2024-09-15 23:05:35,857 INFO [train.py:1198] (0/2) Epoch 20, batch 4300, loss[loss=0.2228, ctc_loss=0.1502, cr_loss=0.3626, over 21012.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1624, cr_loss=0.3851, over 4099173.42 frames. ], batch size: 63, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:06:56,885 INFO [train.py:1198] (0/2) Epoch 20, batch 4350, loss[loss=0.2361, ctc_loss=0.1614, cr_loss=0.3733, over 20862.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1619, cr_loss=0.3839, over 4098586.60 frames. ], batch size: 65, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:07:07,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=356365.3333333333, ans=0.125 2024-09-15 23:07:31,701 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.85 vs. 
limit=6.0 2024-09-15 23:07:44,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=356450.3333333333, ans=0.2 2024-09-15 23:08:13,771 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.052e+02 2.190e+02 2.351e+02 3.242e+02, threshold=4.381e+02, percent-clipped=0.0 2024-09-15 23:08:13,791 INFO [train.py:1198] (0/2) Epoch 20, batch 4400, loss[loss=0.2266, ctc_loss=0.1474, cr_loss=0.3964, over 20957.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1612, cr_loss=0.383, over 4101784.01 frames. ], batch size: 49, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:08:22,259 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.25 vs. limit=10.0 2024-09-15 23:08:48,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-09-15 23:08:49,888 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2024-09-15 23:09:18,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356620.3333333333, ans=0.1 2024-09-15 23:09:29,508 INFO [train.py:1198] (0/2) Epoch 20, batch 4450, loss[loss=0.2293, ctc_loss=0.1554, cr_loss=0.3697, over 20985.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1605, cr_loss=0.3825, over 4115622.31 frames. 
], batch size: 48, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:09:39,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=356648.6666666667, ans=0.025 2024-09-15 23:09:45,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=12.0 2024-09-15 23:09:52,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356677.0, ans=0.1 2024-09-15 23:09:57,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2024-09-15 23:10:10,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=356705.3333333333, ans=0.125 2024-09-15 23:10:18,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=356733.6666666667, ans=0.2 2024-09-15 23:10:39,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=356762.0, ans=0.125 2024-09-15 23:10:45,522 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.102e+02 2.235e+02 2.380e+02 3.140e+02, threshold=4.469e+02, percent-clipped=0.0 2024-09-15 23:10:45,541 INFO [train.py:1198] (0/2) Epoch 20, batch 4500, loss[loss=0.2172, ctc_loss=0.1428, cr_loss=0.3721, over 20974.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1602, cr_loss=0.3817, over 4114420.48 frames. 
], batch size: 48, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:10:51,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356790.3333333333, ans=0.1 2024-09-15 23:10:58,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356790.3333333333, ans=0.1 2024-09-15 23:11:07,671 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-09-15 23:11:30,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=356875.3333333333, ans=0.125 2024-09-15 23:11:46,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=356903.6666666667, ans=0.2 2024-09-15 23:12:01,862 INFO [train.py:1198] (0/2) Epoch 20, batch 4550, loss[loss=0.265, ctc_loss=0.1809, cr_loss=0.4205, over 21016.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1611, cr_loss=0.3828, over 4105135.08 frames. 
], batch size: 61, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:12:32,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=356960.3333333333, ans=0.2 2024-09-15 23:12:52,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=357017.0, ans=0.125 2024-09-15 23:12:56,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=357017.0, ans=0.0 2024-09-15 23:13:01,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357017.0, ans=0.1 2024-09-15 23:13:07,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=357045.3333333333, ans=0.0 2024-09-15 23:13:23,518 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.075e+02 2.185e+02 2.431e+02 3.897e+02, threshold=4.370e+02, percent-clipped=0.0 2024-09-15 23:13:23,547 INFO [train.py:1198] (0/2) Epoch 20, batch 4600, loss[loss=0.1998, ctc_loss=0.1347, cr_loss=0.3256, over 20942.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1617, cr_loss=0.3833, over 4105468.27 frames. 
], batch size: 49, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:13:23,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357073.6666666667, ans=0.1 2024-09-15 23:13:28,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=357073.6666666667, ans=0.2 2024-09-15 23:13:34,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357073.6666666667, ans=0.1 2024-09-15 23:13:36,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=357073.6666666667, ans=0.125 2024-09-15 23:13:45,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=357102.0, ans=0.125 2024-09-15 23:13:45,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=22.5 2024-09-15 23:13:50,118 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. 
limit=6.0 2024-09-15 23:13:55,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=357130.3333333333, ans=0.125 2024-09-15 23:14:06,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=357130.3333333333, ans=0.0 2024-09-15 23:14:10,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357158.6666666667, ans=0.1 2024-09-15 23:14:17,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=357158.6666666667, ans=0.0 2024-09-15 23:14:21,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=357158.6666666667, ans=0.0 2024-09-15 23:14:28,000 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.86 vs. limit=15.0 2024-09-15 23:14:39,537 INFO [train.py:1198] (0/2) Epoch 20, batch 4650, loss[loss=0.2398, ctc_loss=0.1626, cr_loss=0.3857, over 20899.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1616, cr_loss=0.3837, over 4111562.13 frames. ], batch size: 57, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:15:01,363 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-15 23:15:02,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=357243.6666666667, ans=0.125 2024-09-15 23:15:12,324 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.19 vs. 
limit=15.0 2024-09-15 23:15:20,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=357272.0, ans=0.125 2024-09-15 23:15:54,932 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.725e+02 2.105e+02 2.240e+02 2.465e+02 5.218e+02, threshold=4.479e+02, percent-clipped=1.0 2024-09-15 23:15:54,951 INFO [train.py:1198] (0/2) Epoch 20, batch 4700, loss[loss=0.2488, ctc_loss=0.1722, cr_loss=0.3829, over 20777.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.161, cr_loss=0.383, over 4107421.06 frames. ], batch size: 53, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:16:35,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=357413.6666666667, ans=0.125 2024-09-15 23:17:10,028 INFO [train.py:1198] (0/2) Epoch 20, batch 4750, loss[loss=0.1989, ctc_loss=0.1353, cr_loss=0.3181, over 20447.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.162, cr_loss=0.384, over 4094082.50 frames. ], batch size: 45, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:17:22,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=357498.6666666667, ans=0.09899494936611666 2024-09-15 23:17:22,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=357498.6666666667, ans=0.125 2024-09-15 23:17:52,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=357555.3333333333, ans=0.1 2024-09-15 23:18:28,460 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.067e+02 2.210e+02 2.422e+02 3.253e+02, threshold=4.420e+02, percent-clipped=0.0 2024-09-15 23:18:28,480 INFO [train.py:1198] (0/2) Epoch 20, batch 4800, loss[loss=0.2783, ctc_loss=0.1947, cr_loss=0.4179, over 18200.00 frames. 
], tot_loss[loss=0.2388, ctc_loss=0.162, cr_loss=0.3841, over 4093957.41 frames. ], batch size: 108, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:18:33,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357640.3333333333, ans=0.1 2024-09-15 23:18:48,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=357668.6666666667, ans=0.1 2024-09-15 23:18:48,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357668.6666666667, ans=0.1 2024-09-15 23:18:54,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=357668.6666666667, ans=0.125 2024-09-15 23:19:27,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=357725.3333333333, ans=0.0 2024-09-15 23:19:42,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=357753.6666666667, ans=0.125 2024-09-15 23:19:46,404 INFO [train.py:1198] (0/2) Epoch 20, batch 4850, loss[loss=0.2723, ctc_loss=0.1841, cr_loss=0.4411, over 20653.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1622, cr_loss=0.384, over 4092596.43 frames. 
], batch size: 66, lr: 4.15e-03, grad_scale: 32.0 2024-09-15 23:19:46,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=357782.0, ans=0.2 2024-09-15 23:19:55,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=357782.0, ans=0.04949747468305833 2024-09-15 23:20:09,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=357810.3333333333, ans=0.125 2024-09-15 23:20:16,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=357838.6666666667, ans=0.125 2024-09-15 23:20:42,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=22.5 2024-09-15 23:21:01,325 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.102e+02 2.286e+02 2.422e+02 6.933e+02, threshold=4.572e+02, percent-clipped=1.0 2024-09-15 23:21:01,344 INFO [train.py:1198] (0/2) Epoch 20, batch 4900, loss[loss=0.2099, ctc_loss=0.1393, cr_loss=0.3527, over 20969.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1618, cr_loss=0.383, over 4079788.21 frames. ], batch size: 48, lr: 4.14e-03, grad_scale: 32.0 2024-09-15 23:21:32,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=357980.3333333333, ans=0.2 2024-09-15 23:22:15,469 INFO [train.py:1198] (0/2) Epoch 20, batch 4950, loss[loss=0.2183, ctc_loss=0.1478, cr_loss=0.3526, over 21057.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1616, cr_loss=0.3819, over 4066510.54 frames. 
], batch size: 59, lr: 4.14e-03, grad_scale: 32.0 2024-09-15 23:22:59,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=358150.3333333333, ans=0.0 2024-09-15 23:23:30,580 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.099e+02 2.250e+02 2.616e+02 5.235e+02, threshold=4.499e+02, percent-clipped=1.0 2024-09-15 23:23:30,599 INFO [train.py:1198] (0/2) Epoch 20, batch 5000, loss[loss=0.209, ctc_loss=0.1388, cr_loss=0.3509, over 20788.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1609, cr_loss=0.3813, over 4079848.31 frames. ], batch size: 53, lr: 4.14e-03, grad_scale: 32.0 2024-09-15 23:23:32,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=358207.0, ans=0.125 2024-09-15 23:23:45,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=358235.3333333333, ans=0.5 2024-09-15 23:23:51,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0 2024-09-15 23:24:03,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=358263.6666666667, ans=0.1 2024-09-15 23:24:27,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358292.0, ans=0.1 2024-09-15 23:24:45,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.60 vs. limit=15.0 2024-09-15 23:24:45,732 INFO [train.py:1198] (0/2) Epoch 20, batch 5050, loss[loss=0.2343, ctc_loss=0.1593, cr_loss=0.3752, over 21007.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1607, cr_loss=0.381, over 4086167.24 frames. 
], batch size: 61, lr: 4.14e-03, grad_scale: 16.0
2024-09-15 23:24:51,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358348.6666666667, ans=0.1
2024-09-15 23:24:53,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=358348.6666666667, ans=0.0
2024-09-15 23:24:53,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=358348.6666666667, ans=0.125
2024-09-15 23:25:05,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=358377.0, ans=0.125
2024-09-15 23:26:03,409 INFO [train.py:1198] (0/2) Epoch 20, batch 5100, loss[loss=0.2542, ctc_loss=0.1756, cr_loss=0.3929, over 20778.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1613, cr_loss=0.3817, over 4076932.87 frames. ], batch size: 71, lr: 4.14e-03, grad_scale: 16.0
2024-09-15 23:26:04,757 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.062e+02 2.229e+02 2.442e+02 3.604e+02, threshold=4.458e+02, percent-clipped=0.0
2024-09-15 23:26:46,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=15.0
2024-09-15 23:26:50,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=358575.3333333333, ans=0.0
2024-09-15 23:27:06,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=358603.6666666667, ans=0.05
2024-09-15 23:27:18,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=358603.6666666667, ans=0.0
2024-09-15 23:27:20,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=358632.0, ans=0.125
2024-09-15 23:27:21,179 INFO [train.py:1198] (0/2) Epoch 20, batch 5150, loss[loss=0.2408, ctc_loss=0.1644, cr_loss=0.3824, over 20851.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1605, cr_loss=0.3808, over 4091943.36 frames. ], batch size: 65, lr: 4.14e-03, grad_scale: 16.0
2024-09-15 23:28:14,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=358717.0, ans=0.0
2024-09-15 23:28:27,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=358745.3333333333, ans=0.125
2024-09-15 23:28:34,778 INFO [train.py:1198] (0/2) Epoch 20, batch 5200, loss[loss=0.2331, ctc_loss=0.1584, cr_loss=0.3734, over 20975.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1611, cr_loss=0.3816, over 4098957.64 frames. ], batch size: 55, lr: 4.14e-03, grad_scale: 32.0
2024-09-15 23:28:36,257 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.708e+02 2.056e+02 2.149e+02 2.322e+02 4.240e+02, threshold=4.298e+02, percent-clipped=0.0
2024-09-15 23:28:46,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=358773.6666666667, ans=0.05
2024-09-15 23:28:52,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358802.0, ans=0.1
2024-09-15 23:28:57,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=358802.0, ans=0.0
2024-09-15 23:29:11,030 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=22.5
2024-09-15 23:29:21,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.57 vs. limit=22.5
2024-09-15 23:29:24,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=358858.6666666667, ans=0.125
2024-09-15 23:29:25,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=358858.6666666667, ans=0.125
2024-09-15 23:29:48,816 INFO [train.py:1198] (0/2) Epoch 20, batch 5250, loss[loss=0.2201, ctc_loss=0.1468, cr_loss=0.3668, over 20967.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1615, cr_loss=0.3817, over 4092248.05 frames. ], batch size: 50, lr: 4.14e-03, grad_scale: 32.0
2024-09-15 23:30:46,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=359028.6666666667, ans=0.0
2024-09-15 23:31:02,605 INFO [train.py:1198] (0/2) Epoch 20, batch 5300, loss[loss=0.2309, ctc_loss=0.1558, cr_loss=0.3752, over 21024.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1617, cr_loss=0.3814, over 4076456.39 frames. ], batch size: 63, lr: 4.14e-03, grad_scale: 32.0
2024-09-15 23:31:04,069 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.060e+02 2.201e+02 2.376e+02 3.191e+02, threshold=4.402e+02, percent-clipped=0.0
2024-09-15 23:31:11,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=359057.0, ans=0.0
2024-09-15 23:31:32,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=359113.6666666667, ans=0.125
2024-09-15 23:31:34,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=359113.6666666667, ans=0.125
2024-09-15 23:31:43,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=12.0
2024-09-15 23:31:44,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=359113.6666666667, ans=0.125
2024-09-15 23:31:50,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=359142.0, ans=0.125
2024-09-15 23:31:55,896 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.04 vs. limit=12.0
2024-09-15 23:32:12,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=359170.3333333333, ans=0.125
2024-09-15 23:32:16,991 INFO [train.py:1198] (0/2) Epoch 20, batch 5350, loss[loss=0.2186, ctc_loss=0.1455, cr_loss=0.3655, over 20997.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1619, cr_loss=0.382, over 4078886.91 frames. ], batch size: 55, lr: 4.14e-03, grad_scale: 32.0
2024-09-15 23:32:28,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=359198.6666666667, ans=0.0
2024-09-15 23:32:40,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=359227.0, ans=0.125
2024-09-15 23:32:51,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=359255.3333333333, ans=0.0
2024-09-15 23:32:52,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=359255.3333333333, ans=0.1
2024-09-15 23:32:58,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=359255.3333333333, ans=0.025
2024-09-15 23:33:32,031 INFO [train.py:1198] (0/2) Epoch 20, batch 5400, loss[loss=0.2666, ctc_loss=0.1822, cr_loss=0.4219, over 19325.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.161, cr_loss=0.3806, over 4071288.69 frames. ], batch size: 90, lr: 4.14e-03, grad_scale: 32.0
2024-09-15 23:33:33,549 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.056e+02 2.189e+02 2.476e+02 3.815e+02, threshold=4.379e+02, percent-clipped=0.0
2024-09-15 23:34:12,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=359397.0, ans=0.025
2024-09-15 23:34:29,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=359453.6666666667, ans=0.125
2024-09-15 23:34:45,927 INFO [train.py:1198] (0/2) Epoch 20, batch 5450, loss[loss=0.2013, ctc_loss=0.1352, cr_loss=0.3305, over 20977.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1606, cr_loss=0.3795, over 4064044.73 frames. ], batch size: 48, lr: 4.14e-03, grad_scale: 32.0
2024-09-15 23:35:17,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=359538.6666666667, ans=0.125
2024-09-15 23:35:53,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=359595.3333333333, ans=0.125
2024-09-15 23:36:06,311 INFO [train.py:1198] (0/2) Epoch 20, batch 5500, loss[loss=0.2441, ctc_loss=0.1651, cr_loss=0.3953, over 20970.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1611, cr_loss=0.3805, over 4065411.33 frames. ], batch size: 58, lr: 4.13e-03, grad_scale: 16.0
2024-09-15 23:36:09,382 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.072e+02 2.194e+02 2.346e+02 3.622e+02, threshold=4.387e+02, percent-clipped=0.0
2024-09-15 23:36:14,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=359623.6666666667, ans=0.2
2024-09-15 23:36:17,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=359623.6666666667, ans=0.0
2024-09-15 23:36:26,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=359652.0, ans=0.125
2024-09-15 23:36:30,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=359652.0, ans=0.0
2024-09-15 23:36:44,497 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0
2024-09-15 23:36:56,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359708.6666666667, ans=0.1
2024-09-15 23:37:06,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=359737.0, ans=0.2
2024-09-15 23:37:13,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=359737.0, ans=0.125
2024-09-15 23:37:17,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359737.0, ans=0.1
2024-09-15 23:37:21,068 INFO [train.py:1198] (0/2) Epoch 20, batch 5550, loss[loss=0.2326, ctc_loss=0.1561, cr_loss=0.3825, over 20775.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1611, cr_loss=0.3812, over 4074866.61 frames. ], batch size: 56, lr: 4.13e-03, grad_scale: 16.0
2024-09-15 23:38:35,409 INFO [train.py:1198] (0/2) Epoch 20, batch 5600, loss[loss=0.2078, ctc_loss=0.14, cr_loss=0.3392, over 20960.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1604, cr_loss=0.3806, over 4082547.23 frames. ], batch size: 48, lr: 4.13e-03, grad_scale: 32.0
2024-09-15 23:38:38,492 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.100e+02 2.212e+02 2.372e+02 4.536e+02, threshold=4.424e+02, percent-clipped=1.0
2024-09-15 23:38:38,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=359907.0, ans=0.0
2024-09-15 23:38:40,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=359907.0, ans=0.0
2024-09-15 23:38:41,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=359907.0, ans=0.025
2024-09-15 23:39:03,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=359935.3333333333, ans=0.125
2024-09-15 23:39:16,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=359963.6666666667, ans=0.025
2024-09-15 23:39:33,207 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 23:39:50,369 INFO [train.py:1198] (0/2) Epoch 20, batch 5650, loss[loss=0.2725, ctc_loss=0.1853, cr_loss=0.4358, over 21027.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1601, cr_loss=0.3805, over 4087694.75 frames. ], batch size: 62, lr: 4.13e-03, grad_scale: 32.0
2024-09-15 23:39:55,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=360048.6666666667, ans=0.125
2024-09-15 23:40:47,257 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.98 vs. limit=15.0
2024-09-15 23:40:51,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.32 vs. limit=10.0
2024-09-15 23:40:54,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=360162.0, ans=0.025
2024-09-15 23:41:04,317 INFO [train.py:1198] (0/2) Epoch 20, batch 5700, loss[loss=0.2622, ctc_loss=0.1813, cr_loss=0.4044, over 20700.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.16, cr_loss=0.3808, over 4101983.69 frames. ], batch size: 66, lr: 4.13e-03, grad_scale: 32.0
2024-09-15 23:41:07,167 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.066e+02 2.213e+02 2.381e+02 6.070e+02, threshold=4.426e+02, percent-clipped=1.0
2024-09-15 23:41:21,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=22.5
2024-09-15 23:41:50,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=360275.3333333333, ans=0.125
2024-09-15 23:42:18,299 INFO [train.py:1198] (0/2) Epoch 20, batch 5750, loss[loss=0.2116, ctc_loss=0.1403, cr_loss=0.3562, over 20940.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1607, cr_loss=0.3815, over 4093977.08 frames. ], batch size: 49, lr: 4.13e-03, grad_scale: 32.0
2024-09-15 23:42:27,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=360332.0, ans=0.0
2024-09-15 23:42:39,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=360360.3333333333, ans=0.125
2024-09-15 23:42:39,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=360360.3333333333, ans=0.025
2024-09-15 23:43:03,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=360417.0, ans=0.2
2024-09-15 23:43:09,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.07 vs. limit=15.0
2024-09-15 23:43:32,357 INFO [train.py:1198] (0/2) Epoch 20, batch 5800, loss[loss=0.2411, ctc_loss=0.1634, cr_loss=0.3886, over 20966.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1611, cr_loss=0.3823, over 4089117.14 frames. ], batch size: 58, lr: 4.13e-03, grad_scale: 32.0
2024-09-15 23:43:35,326 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.103e+02 2.226e+02 2.380e+02 4.136e+02, threshold=4.451e+02, percent-clipped=0.0
2024-09-15 23:43:49,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=360502.0, ans=0.0
2024-09-15 23:43:49,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=360502.0, ans=0.125
2024-09-15 23:44:01,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=360530.3333333333, ans=0.125
2024-09-15 23:44:17,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=360558.6666666667, ans=0.04949747468305833
2024-09-15 23:44:25,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=360558.6666666667, ans=0.0
2024-09-15 23:44:49,841 INFO [train.py:1198] (0/2) Epoch 20, batch 5850, loss[loss=0.2293, ctc_loss=0.1562, cr_loss=0.3656, over 20114.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1617, cr_loss=0.3832, over 4090355.42 frames. ], batch size: 80, lr: 4.13e-03, grad_scale: 32.0
2024-09-15 23:45:03,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=360643.6666666667, ans=0.035
2024-09-15 23:45:15,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.67 vs. limit=15.0
2024-09-15 23:45:30,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=360672.0, ans=0.125
2024-09-15 23:45:41,539 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.87 vs. limit=10.0
2024-09-15 23:45:51,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.02 vs. limit=10.0
2024-09-15 23:45:53,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0
2024-09-15 23:46:07,111 INFO [train.py:1198] (0/2) Epoch 20, batch 5900, loss[loss=0.2293, ctc_loss=0.1544, cr_loss=0.3743, over 20962.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1617, cr_loss=0.3832, over 4092284.53 frames. ], batch size: 58, lr: 4.13e-03, grad_scale: 32.0
2024-09-15 23:46:10,069 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.842e+02 2.107e+02 2.232e+02 2.415e+02 3.409e+02, threshold=4.465e+02, percent-clipped=0.0
2024-09-15 23:46:50,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=360842.0, ans=0.0
2024-09-15 23:46:52,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360842.0, ans=0.1
2024-09-15 23:46:53,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=360842.0, ans=0.0
2024-09-15 23:46:56,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=360842.0, ans=0.125
2024-09-15 23:47:12,001 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=15.0
2024-09-15 23:47:21,590 INFO [train.py:1198] (0/2) Epoch 20, batch 5950, loss[loss=0.2281, ctc_loss=0.1541, cr_loss=0.37, over 20975.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1608, cr_loss=0.3813, over 4096711.90 frames. ], batch size: 58, lr: 4.13e-03, grad_scale: 32.0
2024-09-15 23:47:25,841 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0
2024-09-15 23:47:27,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=360898.6666666667, ans=0.125
2024-09-15 23:47:41,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=360927.0, ans=0.05
2024-09-15 23:47:45,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=360927.0, ans=0.0
2024-09-15 23:47:48,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=360927.0, ans=0.0
2024-09-15 23:47:48,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0
2024-09-15 23:48:08,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0
2024-09-15 23:48:12,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=360983.6666666667, ans=0.125
2024-09-15 23:48:35,934 INFO [train.py:1198] (0/2) Epoch 20, batch 6000, loss[loss=0.25, ctc_loss=0.1714, cr_loss=0.393, over 20944.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1606, cr_loss=0.3804, over 4105308.72 frames. ], batch size: 60, lr: 4.13e-03, grad_scale: 32.0
2024-09-15 23:48:35,935 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-15 23:48:59,843 INFO [train.py:1230] (0/2) Epoch 20, validation: loss=0.044, ctc_loss=0.044, cr_loss=1.069e-14, over 944034.00 frames.
2024-09-15 23:48:59,844 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-15 23:49:02,723 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.716e+02 2.070e+02 2.162e+02 2.360e+02 3.870e+02, threshold=4.325e+02, percent-clipped=0.0
2024-09-15 23:49:07,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=361040.3333333333, ans=0.125
2024-09-15 23:49:29,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.50 vs. limit=15.0
2024-09-15 23:49:53,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=361125.3333333333, ans=0.2
2024-09-15 23:50:13,709 INFO [train.py:1198] (0/2) Epoch 20, batch 6050, loss[loss=0.204, ctc_loss=0.136, cr_loss=0.3399, over 20940.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1615, cr_loss=0.382, over 4102949.50 frames. ], batch size: 48, lr: 4.13e-03, grad_scale: 32.0
2024-09-15 23:50:53,546 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.04 vs. limit=6.0
2024-09-15 23:51:28,464 INFO [train.py:1198] (0/2) Epoch 20, batch 6100, loss[loss=0.2402, ctc_loss=0.1612, cr_loss=0.3947, over 21050.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1616, cr_loss=0.3816, over 4088264.44 frames. ], batch size: 62, lr: 4.13e-03, grad_scale: 16.0
2024-09-15 23:51:32,769 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.092e+02 2.238e+02 2.378e+02 3.005e+02, threshold=4.475e+02, percent-clipped=0.0
2024-09-15 23:51:40,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=361323.6666666667, ans=10.0
2024-09-15 23:51:52,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=361352.0, ans=0.07
2024-09-15 23:51:57,669 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.08 vs. limit=15.0
2024-09-15 23:52:05,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=361380.3333333333, ans=0.0
2024-09-15 23:52:10,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=361380.3333333333, ans=0.2
2024-09-15 23:52:43,177 INFO [train.py:1198] (0/2) Epoch 20, batch 6150, loss[loss=0.254, ctc_loss=0.173, cr_loss=0.4046, over 20756.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1613, cr_loss=0.3813, over 4079789.42 frames. ], batch size: 53, lr: 4.12e-03, grad_scale: 16.0
2024-09-15 23:52:46,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=361465.3333333333, ans=0.0
2024-09-15 23:53:50,800 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-15 23:53:59,415 INFO [train.py:1198] (0/2) Epoch 20, batch 6200, loss[loss=0.2532, ctc_loss=0.1691, cr_loss=0.4204, over 21062.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1623, cr_loss=0.3821, over 4041814.47 frames. ], batch size: 56, lr: 4.12e-03, grad_scale: 16.0
2024-09-15 23:54:01,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=361607.0, ans=0.125
2024-09-15 23:54:03,815 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.088e+02 2.244e+02 2.482e+02 3.656e+02, threshold=4.489e+02, percent-clipped=0.0
2024-09-15 23:54:41,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=361663.6666666667, ans=0.2
2024-09-15 23:54:55,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=361692.0, ans=0.0
2024-09-15 23:55:07,439 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0
2024-09-15 23:55:14,122 INFO [train.py:1198] (0/2) Epoch 20, batch 6250, loss[loss=0.2639, ctc_loss=0.1816, cr_loss=0.4113, over 21003.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1626, cr_loss=0.3823, over 4016630.70 frames. ], batch size: 63, lr: 4.12e-03, grad_scale: 16.0
2024-09-15 23:55:38,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=361777.0, ans=0.0
2024-09-15 23:55:57,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0
2024-09-15 23:56:04,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=361833.6666666667, ans=0.125
2024-09-15 23:56:16,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=361862.0, ans=0.125
2024-09-15 23:56:21,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=361862.0, ans=0.125
2024-09-15 23:56:26,858 INFO [train.py:1198] (0/2) Epoch 20, batch 6300, loss[loss=0.2881, ctc_loss=0.1978, cr_loss=0.4515, over 18086.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1647, cr_loss=0.3835, over 3951204.50 frames. ], batch size: 108, lr: 4.12e-03, grad_scale: 16.0
2024-09-15 23:56:31,244 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.191e+02 2.344e+02 2.616e+02 4.767e+02, threshold=4.689e+02, percent-clipped=1.0
2024-09-15 23:56:40,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=361918.6666666667, ans=0.125
2024-09-15 23:56:45,766 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=12.0
2024-09-15 23:56:59,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361947.0, ans=0.1
2024-09-15 23:57:37,900 INFO [train.py:1198] (0/2) Epoch 20, batch 6350, loss[loss=0.2636, ctc_loss=0.1868, cr_loss=0.3839, over 14012.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1704, cr_loss=0.3882, over 3809784.14 frames. ], batch size: 149, lr: 4.12e-03, grad_scale: 8.0
2024-09-15 23:57:38,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=362032.0, ans=0.125
2024-09-15 23:57:38,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=362032.0, ans=0.1
2024-09-15 23:58:17,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=362088.6666666667, ans=0.125
2024-09-15 23:58:28,888 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=12.0
2024-09-15 23:58:37,148 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-20.pt
2024-09-15 23:59:24,867 INFO [train.py:1198] (0/2) Epoch 21, batch 0, loss[loss=0.2073, ctc_loss=0.1391, cr_loss=0.3407, over 20941.00 frames. ], tot_loss[loss=0.2073, ctc_loss=0.1391, cr_loss=0.3407, over 20941.00 frames. ], batch size: 48, lr: 4.02e-03, grad_scale: 16.0
2024-09-15 23:59:24,868 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-15 23:59:46,392 INFO [train.py:1230] (0/2) Epoch 21, validation: loss=0.04406, ctc_loss=0.04406, cr_loss=1.047e-14, over 944034.00 frames.
2024-09-15 23:59:46,392 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-16 00:00:04,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=362176.5, ans=0.2
2024-09-16 00:00:05,658 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.361e+02 2.567e+02 2.706e+02 3.216e+02, threshold=5.134e+02, percent-clipped=0.0
2024-09-16 00:00:33,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=362233.1666666667, ans=0.125
2024-09-16 00:00:37,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=362233.1666666667, ans=0.09899494936611666
2024-09-16 00:01:02,033 INFO [train.py:1198] (0/2) Epoch 21, batch 50, loss[loss=0.2098, ctc_loss=0.1394, cr_loss=0.3522, over 19898.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.163, cr_loss=0.3846, over 920624.93 frames. ], batch size: 44, lr: 4.02e-03, grad_scale: 16.0
2024-09-16 00:01:08,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=362289.8333333333, ans=0.125
2024-09-16 00:01:12,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=362289.8333333333, ans=0.125
2024-09-16 00:02:04,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=362403.1666666667, ans=0.125
2024-09-16 00:02:17,014 INFO [train.py:1198] (0/2) Epoch 21, batch 100, loss[loss=0.2495, ctc_loss=0.1702, cr_loss=0.3963, over 19540.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1616, cr_loss=0.3829, over 1624344.05 frames. ], batch size: 90, lr: 4.02e-03, grad_scale: 16.0
2024-09-16 00:02:31,354 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.57 vs. limit=10.0
2024-09-16 00:02:36,905 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 2.076e+02 2.185e+02 2.385e+02 3.441e+02, threshold=4.370e+02, percent-clipped=0.0
2024-09-16 00:03:06,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=362516.5, ans=0.125
2024-09-16 00:03:06,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=362516.5, ans=0.125
2024-09-16 00:03:12,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=362516.5, ans=0.04949747468305833
2024-09-16 00:03:15,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=362516.5, ans=0.125
2024-09-16 00:03:27,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=362544.8333333333, ans=0.2
2024-09-16 00:03:30,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=362544.8333333333, ans=0.125
2024-09-16 00:03:33,164 INFO [train.py:1198] (0/2) Epoch 21, batch 150, loss[loss=0.1946, ctc_loss=0.1292, cr_loss=0.3271, over 20977.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1607, cr_loss=0.3817, over 2170944.44 frames. ], batch size: 52, lr: 4.02e-03, grad_scale: 16.0
2024-09-16 00:04:03,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=362629.8333333333, ans=0.125
2024-09-16 00:04:22,101 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-128000.pt
2024-09-16 00:04:34,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=362686.5, ans=0.125
2024-09-16 00:04:41,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=362686.5, ans=0.125
2024-09-16 00:04:47,149 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.09 vs. limit=15.0
2024-09-16 00:04:50,582 INFO [train.py:1198] (0/2) Epoch 21, batch 200, loss[loss=0.219, ctc_loss=0.1461, cr_loss=0.3645, over 20781.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1601, cr_loss=0.3818, over 2596674.31 frames. ], batch size: 53, lr: 4.02e-03, grad_scale: 16.0
2024-09-16 00:04:50,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=362714.8333333333, ans=0.125
2024-09-16 00:04:55,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=362714.8333333333, ans=0.125
2024-09-16 00:05:10,045 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.755e+02 2.091e+02 2.185e+02 2.311e+02 3.861e+02, threshold=4.370e+02, percent-clipped=0.0
2024-09-16 00:05:25,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=362771.5, ans=0.015
2024-09-16 00:06:07,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=362828.1666666667, ans=0.04949747468305833
2024-09-16 00:06:13,739 INFO [train.py:1198] (0/2) Epoch 21, batch 250, loss[loss=0.2438, ctc_loss=0.165, cr_loss=0.394, over 20113.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1603, cr_loss=0.3831, over 2937195.63 frames. ], batch size: 80, lr: 4.02e-03, grad_scale: 16.0
2024-09-16 00:07:29,595 INFO [train.py:1198] (0/2) Epoch 21, batch 300, loss[loss=0.2524, ctc_loss=0.175, cr_loss=0.3869, over 20975.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1606, cr_loss=0.3829, over 3202365.62 frames. ], batch size: 64, lr: 4.01e-03, grad_scale: 16.0
2024-09-16 00:07:30,358 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0
2024-09-16 00:07:49,192 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.066e+02 2.181e+02 2.306e+02 4.450e+02, threshold=4.363e+02, percent-clipped=1.0
2024-09-16 00:08:13,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=363083.1666666667, ans=0.125
2024-09-16 00:08:21,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=363083.1666666667, ans=0.125
2024-09-16 00:08:45,081 INFO [train.py:1198] (0/2) Epoch 21, batch 350, loss[loss=0.2359, ctc_loss=0.1597, cr_loss=0.3807, over 20982.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1602, cr_loss=0.3831, over 3406790.23 frames. ], batch size: 58, lr: 4.01e-03, grad_scale: 16.0
2024-09-16 00:09:22,372 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0
2024-09-16 00:09:41,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=363224.8333333333, ans=0.125
2024-09-16 00:09:44,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=363253.1666666667, ans=0.125
2024-09-16 00:09:56,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=363253.1666666667, ans=0.125
2024-09-16 00:10:00,774 INFO [train.py:1198] (0/2) Epoch 21, batch 400, loss[loss=0.2485, ctc_loss=0.1676, cr_loss=0.4047, over 20985.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1606, cr_loss=0.383, over 3542180.38 frames. ], batch size: 58, lr: 4.01e-03, grad_scale: 32.0
2024-09-16 00:10:03,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0
2024-09-16 00:10:08,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=363281.5, ans=0.04949747468305833
2024-09-16 00:10:13,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363281.5, ans=0.1
2024-09-16 00:10:20,461 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.081e+02 2.215e+02 2.452e+02 3.087e+02, threshold=4.431e+02, percent-clipped=0.0
2024-09-16 00:10:44,099 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.32 vs. limit=22.5
2024-09-16 00:10:55,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=363366.5, ans=0.1
2024-09-16 00:11:22,599 INFO [train.py:1198] (0/2) Epoch 21, batch 450, loss[loss=0.2572, ctc_loss=0.175, cr_loss=0.4112, over 20967.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1604, cr_loss=0.3825, over 3664131.92 frames. ], batch size: 58, lr: 4.01e-03, grad_scale: 32.0
2024-09-16 00:11:47,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=363451.5, ans=0.025
2024-09-16 00:12:05,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.47 vs. limit=15.0
2024-09-16 00:12:36,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=363564.8333333333, ans=0.125
2024-09-16 00:12:37,914 INFO [train.py:1198] (0/2) Epoch 21, batch 500, loss[loss=0.2394, ctc_loss=0.1637, cr_loss=0.3788, over 20982.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1612, cr_loss=0.3839, over 3751612.42 frames. 
], batch size: 67, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:12:57,407 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.112e+02 2.252e+02 2.412e+02 5.083e+02, threshold=4.504e+02, percent-clipped=2.0 2024-09-16 00:13:03,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=363593.1666666667, ans=0.035 2024-09-16 00:13:09,338 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 00:13:19,036 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.12 vs. limit=22.5 2024-09-16 00:13:26,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=363649.8333333333, ans=0.2 2024-09-16 00:13:38,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363678.1666666667, ans=0.1 2024-09-16 00:13:53,133 INFO [train.py:1198] (0/2) Epoch 21, batch 550, loss[loss=0.2212, ctc_loss=0.1497, cr_loss=0.3572, over 19873.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1611, cr_loss=0.3835, over 3821105.60 frames. 
], batch size: 44, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:13:56,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363706.5, ans=0.1 2024-09-16 00:14:03,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=363706.5, ans=0.125 2024-09-16 00:14:12,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=363734.8333333333, ans=0.125 2024-09-16 00:14:26,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=363763.1666666667, ans=0.125 2024-09-16 00:15:07,660 INFO [train.py:1198] (0/2) Epoch 21, batch 600, loss[loss=0.2042, ctc_loss=0.1344, cr_loss=0.3493, over 20999.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1613, cr_loss=0.3837, over 3888692.37 frames. ], batch size: 52, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:15:09,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=363848.1666666667, ans=0.0 2024-09-16 00:15:27,017 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.055e+02 2.279e+02 2.491e+02 5.267e+02, threshold=4.558e+02, percent-clipped=1.0 2024-09-16 00:15:29,304 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-09-16 00:15:36,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=363904.8333333333, ans=0.0 2024-09-16 00:16:23,292 INFO [train.py:1198] (0/2) Epoch 21, batch 650, loss[loss=0.2213, ctc_loss=0.1476, cr_loss=0.3688, over 20793.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.161, cr_loss=0.3831, over 3936475.29 frames. 
], batch size: 53, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:16:36,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=364018.1666666667, ans=0.125 2024-09-16 00:16:50,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=364018.1666666667, ans=0.0 2024-09-16 00:17:07,766 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2024-09-16 00:17:07,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.29 vs. limit=15.0 2024-09-16 00:17:18,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0 2024-09-16 00:17:45,036 INFO [train.py:1198] (0/2) Epoch 21, batch 700, loss[loss=0.2513, ctc_loss=0.1718, cr_loss=0.3976, over 20841.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1609, cr_loss=0.3831, over 3982964.15 frames. 
], batch size: 65, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:17:48,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=364131.5, ans=0.125 2024-09-16 00:17:49,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=364131.5, ans=0.04949747468305833 2024-09-16 00:18:04,453 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.115e+02 2.240e+02 2.424e+02 3.755e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-16 00:18:06,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=364159.8333333333, ans=0.125 2024-09-16 00:18:10,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=364159.8333333333, ans=0.125 2024-09-16 00:18:18,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=22.5 2024-09-16 00:19:00,234 INFO [train.py:1198] (0/2) Epoch 21, batch 750, loss[loss=0.2291, ctc_loss=0.1532, cr_loss=0.3795, over 21024.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1615, cr_loss=0.3838, over 3982447.80 frames. ], batch size: 63, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:19:12,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=364273.1666666667, ans=0.125 2024-09-16 00:19:34,142 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.04 vs. 
limit=15.0 2024-09-16 00:19:48,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=364358.1666666667, ans=0.0 2024-09-16 00:20:08,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364386.5, ans=0.1 2024-09-16 00:20:15,743 INFO [train.py:1198] (0/2) Epoch 21, batch 800, loss[loss=0.2218, ctc_loss=0.1482, cr_loss=0.3681, over 21033.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1615, cr_loss=0.3836, over 4013973.78 frames. ], batch size: 63, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:20:32,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=364443.1666666667, ans=0.0 2024-09-16 00:20:33,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=364443.1666666667, ans=0.125 2024-09-16 00:20:36,648 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.112e+02 2.255e+02 2.439e+02 3.218e+02, threshold=4.510e+02, percent-clipped=0.0 2024-09-16 00:21:10,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.87 vs. limit=15.0 2024-09-16 00:21:13,359 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 00:21:25,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=364528.1666666667, ans=0.1 2024-09-16 00:21:28,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=364528.1666666667, ans=0.0 2024-09-16 00:21:30,944 INFO [train.py:1198] (0/2) Epoch 21, batch 850, loss[loss=0.2297, ctc_loss=0.1568, cr_loss=0.3647, over 21020.00 frames. 
], tot_loss[loss=0.2376, ctc_loss=0.1611, cr_loss=0.3826, over 4041038.94 frames. ], batch size: 63, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:22:35,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=364669.8333333333, ans=0.125 2024-09-16 00:22:37,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=364669.8333333333, ans=0.0 2024-09-16 00:22:52,355 INFO [train.py:1198] (0/2) Epoch 21, batch 900, loss[loss=0.2516, ctc_loss=0.1725, cr_loss=0.3952, over 20650.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1612, cr_loss=0.3827, over 4057113.17 frames. ], batch size: 66, lr: 4.01e-03, grad_scale: 32.0 2024-09-16 00:22:54,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364698.1666666667, ans=0.1 2024-09-16 00:22:54,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=364698.1666666667, ans=0.125 2024-09-16 00:22:54,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=364698.1666666667, ans=0.04949747468305833 2024-09-16 00:23:06,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=364726.5, ans=0.125 2024-09-16 00:23:13,332 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.096e+02 2.211e+02 2.363e+02 4.258e+02, threshold=4.422e+02, percent-clipped=0.0 2024-09-16 00:23:30,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=364754.8333333333, ans=0.1 2024-09-16 00:23:44,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, 
batch_count=364783.1666666667, ans=0.125 2024-09-16 00:23:50,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=364783.1666666667, ans=0.025 2024-09-16 00:24:05,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=364811.5, ans=0.125 2024-09-16 00:24:07,611 INFO [train.py:1198] (0/2) Epoch 21, batch 950, loss[loss=0.206, ctc_loss=0.1383, cr_loss=0.3382, over 21044.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1616, cr_loss=0.3831, over 4059035.20 frames. ], batch size: 56, lr: 4.00e-03, grad_scale: 32.0 2024-09-16 00:24:47,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=364896.5, ans=0.2 2024-09-16 00:25:23,200 INFO [train.py:1198] (0/2) Epoch 21, batch 1000, loss[loss=0.2486, ctc_loss=0.1706, cr_loss=0.3902, over 19631.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1609, cr_loss=0.3819, over 4061868.90 frames. ], batch size: 90, lr: 4.00e-03, grad_scale: 32.0 2024-09-16 00:25:33,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.00 vs. limit=15.0 2024-09-16 00:25:44,008 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.762e+02 2.058e+02 2.174e+02 2.304e+02 5.292e+02, threshold=4.348e+02, percent-clipped=1.0 2024-09-16 00:25:58,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=365038.1666666667, ans=15.0 2024-09-16 00:26:39,016 INFO [train.py:1198] (0/2) Epoch 21, batch 1050, loss[loss=0.2259, ctc_loss=0.1514, cr_loss=0.3724, over 20777.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.161, cr_loss=0.3822, over 4071103.27 frames. 
], batch size: 53, lr: 4.00e-03, grad_scale: 32.0 2024-09-16 00:27:20,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=365179.8333333333, ans=0.05 2024-09-16 00:27:28,288 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2024-09-16 00:27:54,914 INFO [train.py:1198] (0/2) Epoch 21, batch 1100, loss[loss=0.2294, ctc_loss=0.1539, cr_loss=0.3777, over 20961.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1602, cr_loss=0.3809, over 4074654.00 frames. ], batch size: 64, lr: 4.00e-03, grad_scale: 32.0 2024-09-16 00:28:18,892 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.044e+02 2.214e+02 2.380e+02 3.122e+02, threshold=4.428e+02, percent-clipped=0.0 2024-09-16 00:28:53,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=365349.8333333333, ans=0.125 2024-09-16 00:29:16,237 INFO [train.py:1198] (0/2) Epoch 21, batch 1150, loss[loss=0.2764, ctc_loss=0.1974, cr_loss=0.3954, over 14117.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1603, cr_loss=0.3805, over 4074763.36 frames. ], batch size: 150, lr: 4.00e-03, grad_scale: 16.0 2024-09-16 00:29:18,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=365406.5, ans=0.125 2024-09-16 00:29:21,664 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.27 vs. 
limit=10.0 2024-09-16 00:29:33,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=365434.8333333333, ans=0.125 2024-09-16 00:29:42,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=365434.8333333333, ans=0.2 2024-09-16 00:29:48,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=365463.1666666667, ans=0.0 2024-09-16 00:29:48,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=365463.1666666667, ans=0.125 2024-09-16 00:30:10,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=365491.5, ans=0.0 2024-09-16 00:30:31,611 INFO [train.py:1198] (0/2) Epoch 21, batch 1200, loss[loss=0.2482, ctc_loss=0.1731, cr_loss=0.3758, over 20656.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1607, cr_loss=0.3812, over 4087521.85 frames. 
], batch size: 66, lr: 4.00e-03, grad_scale: 32.0 2024-09-16 00:30:36,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=365548.1666666667, ans=0.125 2024-09-16 00:30:44,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=365548.1666666667, ans=0.2 2024-09-16 00:30:54,646 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.073e+02 2.184e+02 2.405e+02 3.363e+02, threshold=4.367e+02, percent-clipped=0.0 2024-09-16 00:30:59,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365576.5, ans=0.1 2024-09-16 00:31:01,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=365604.8333333333, ans=0.05 2024-09-16 00:31:04,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=365604.8333333333, ans=0.1 2024-09-16 00:31:30,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=8.0 2024-09-16 00:31:47,861 INFO [train.py:1198] (0/2) Epoch 21, batch 1250, loss[loss=0.2487, ctc_loss=0.1723, cr_loss=0.3822, over 20839.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1608, cr_loss=0.3815, over 4090171.08 frames. 
], batch size: 59, lr: 4.00e-03, grad_scale: 32.0 2024-09-16 00:32:27,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=365746.5, ans=0.125 2024-09-16 00:32:47,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365803.1666666667, ans=0.1 2024-09-16 00:33:02,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=365831.5, ans=0.125 2024-09-16 00:33:03,733 INFO [train.py:1198] (0/2) Epoch 21, batch 1300, loss[loss=0.2186, ctc_loss=0.1459, cr_loss=0.3633, over 20775.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1616, cr_loss=0.3818, over 4073904.83 frames. ], batch size: 53, lr: 4.00e-03, grad_scale: 32.0 2024-09-16 00:33:08,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=365831.5, ans=0.125 2024-09-16 00:33:13,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-09-16 00:33:14,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=365831.5, ans=0.2 2024-09-16 00:33:25,833 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.095e+02 2.206e+02 2.422e+02 5.284e+02, threshold=4.411e+02, percent-clipped=2.0 2024-09-16 00:34:05,536 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 00:34:21,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.23 vs. 
limit=10.0 2024-09-16 00:34:24,994 INFO [train.py:1198] (0/2) Epoch 21, batch 1350, loss[loss=0.187, ctc_loss=0.1251, cr_loss=0.3094, over 20989.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1614, cr_loss=0.3812, over 4068553.76 frames. ], batch size: 51, lr: 4.00e-03, grad_scale: 32.0 2024-09-16 00:34:34,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=365973.1666666667, ans=0.025 2024-09-16 00:34:34,845 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0 2024-09-16 00:35:24,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=366086.5, ans=0.125 2024-09-16 00:35:36,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=366086.5, ans=0.0 2024-09-16 00:35:37,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=366086.5, ans=0.125 2024-09-16 00:35:39,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2024-09-16 00:35:40,585 INFO [train.py:1198] (0/2) Epoch 21, batch 1400, loss[loss=0.2718, ctc_loss=0.189, cr_loss=0.4142, over 20845.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1604, cr_loss=0.3804, over 4078198.53 frames. 
], batch size: 65, lr: 4.00e-03, grad_scale: 32.0 2024-09-16 00:35:45,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=366114.8333333333, ans=0.0 2024-09-16 00:36:03,537 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.049e+02 2.185e+02 2.335e+02 3.471e+02, threshold=4.370e+02, percent-clipped=0.0 2024-09-16 00:36:34,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=366199.8333333333, ans=10.0 2024-09-16 00:36:56,299 INFO [train.py:1198] (0/2) Epoch 21, batch 1450, loss[loss=0.2922, ctc_loss=0.2009, cr_loss=0.4566, over 19892.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1603, cr_loss=0.3807, over 4084493.84 frames. ], batch size: 80, lr: 4.00e-03, grad_scale: 32.0 2024-09-16 00:37:19,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=366284.8333333333, ans=0.0 2024-09-16 00:37:31,881 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=15.0 2024-09-16 00:38:12,294 INFO [train.py:1198] (0/2) Epoch 21, batch 1500, loss[loss=0.2735, ctc_loss=0.1895, cr_loss=0.4199, over 19197.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1611, cr_loss=0.3809, over 4079000.03 frames. 
], batch size: 90, lr: 4.00e-03, grad_scale: 32.0 2024-09-16 00:38:35,093 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.054e+02 2.218e+02 2.422e+02 4.010e+02, threshold=4.437e+02, percent-clipped=0.0 2024-09-16 00:38:59,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=366483.1666666667, ans=0.125 2024-09-16 00:39:23,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=366511.5, ans=10.0 2024-09-16 00:39:26,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=366539.8333333333, ans=0.125 2024-09-16 00:39:27,735 INFO [train.py:1198] (0/2) Epoch 21, batch 1550, loss[loss=0.275, ctc_loss=0.1896, cr_loss=0.4271, over 20865.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1613, cr_loss=0.3816, over 4067345.59 frames. ], batch size: 65, lr: 4.00e-03, grad_scale: 32.0 2024-09-16 00:40:34,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=366653.1666666667, ans=0.125 2024-09-16 00:40:40,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=366653.1666666667, ans=0.125 2024-09-16 00:40:41,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=366653.1666666667, ans=0.125 2024-09-16 00:40:43,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=366653.1666666667, ans=0.05 2024-09-16 00:40:48,712 INFO [train.py:1198] (0/2) Epoch 21, batch 1600, loss[loss=0.2382, ctc_loss=0.1652, cr_loss=0.365, over 20947.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1622, cr_loss=0.3832, over 4075939.66 frames. 
], batch size: 60, lr: 3.99e-03, grad_scale: 32.0 2024-09-16 00:40:52,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=366681.5, ans=0.125 2024-09-16 00:41:11,257 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.123e+02 2.260e+02 2.442e+02 6.542e+02, threshold=4.520e+02, percent-clipped=2.0 2024-09-16 00:41:48,215 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=22.5 2024-09-16 00:42:04,362 INFO [train.py:1198] (0/2) Epoch 21, batch 1650, loss[loss=0.2125, ctc_loss=0.1411, cr_loss=0.3566, over 20969.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1623, cr_loss=0.3828, over 4073513.00 frames. ], batch size: 50, lr: 3.99e-03, grad_scale: 16.0 2024-09-16 00:42:24,695 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=22.5 2024-09-16 00:43:20,006 INFO [train.py:1198] (0/2) Epoch 21, batch 1700, loss[loss=0.2615, ctc_loss=0.1799, cr_loss=0.4077, over 18315.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1631, cr_loss=0.3841, over 4055308.94 frames. ], batch size: 108, lr: 3.99e-03, grad_scale: 16.0 2024-09-16 00:43:44,392 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.129e+02 2.284e+02 2.469e+02 7.459e+02, threshold=4.568e+02, percent-clipped=1.0 2024-09-16 00:44:11,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367049.8333333333, ans=0.1 2024-09-16 00:44:35,305 INFO [train.py:1198] (0/2) Epoch 21, batch 1750, loss[loss=0.2696, ctc_loss=0.1851, cr_loss=0.4226, over 20111.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1626, cr_loss=0.3839, over 4050926.80 frames. 
], batch size: 80, lr: 3.99e-03, grad_scale: 16.0
2024-09-16 00:44:45,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=367106.5, ans=0.2
2024-09-16 00:44:52,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=367134.8333333333, ans=0.2
2024-09-16 00:44:56,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.88 vs. limit=15.0
2024-09-16 00:45:04,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=367163.1666666667, ans=0.04949747468305833
2024-09-16 00:45:15,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=367163.1666666667, ans=0.0
2024-09-16 00:45:57,283 INFO [train.py:1198] (0/2) Epoch 21, batch 1800, loss[loss=0.24, ctc_loss=0.1626, cr_loss=0.387, over 20861.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1628, cr_loss=0.3838, over 4043028.72 frames. ], batch size: 65, lr: 3.99e-03, grad_scale: 16.0
2024-09-16 00:45:59,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=367248.1666666667, ans=0.125
2024-09-16 00:46:14,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0
2024-09-16 00:46:21,701 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.108e+02 2.242e+02 2.421e+02 3.650e+02, threshold=4.484e+02, percent-clipped=0.0
2024-09-16 00:46:23,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367276.5, ans=0.1
2024-09-16 00:47:09,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=367361.5, ans=0.2
2024-09-16 00:47:13,505 INFO [train.py:1198] (0/2) Epoch 21, batch 1850, loss[loss=0.2636, ctc_loss=0.1773, cr_loss=0.4314, over 20933.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1621, cr_loss=0.3827, over 4046186.21 frames. ], batch size: 60, lr: 3.99e-03, grad_scale: 16.0
2024-09-16 00:47:13,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=367389.8333333333, ans=0.125
2024-09-16 00:47:27,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=367418.1666666667, ans=0.125
2024-09-16 00:47:27,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367418.1666666667, ans=0.1
2024-09-16 00:47:36,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367418.1666666667, ans=0.1
2024-09-16 00:47:44,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=367446.5, ans=0.125
2024-09-16 00:48:22,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=367503.1666666667, ans=0.125
2024-09-16 00:48:29,188 INFO [train.py:1198] (0/2) Epoch 21, batch 1900, loss[loss=0.2726, ctc_loss=0.1863, cr_loss=0.4313, over 20259.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.162, cr_loss=0.3825, over 4053262.51 frames. ], batch size: 74, lr: 3.99e-03, grad_scale: 16.0
2024-09-16 00:48:35,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=367531.5, ans=0.1
2024-09-16 00:48:42,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0
2024-09-16 00:48:44,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367559.8333333333, ans=0.1
2024-09-16 00:48:53,586 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.057e+02 2.225e+02 2.414e+02 3.049e+02, threshold=4.449e+02, percent-clipped=0.0
2024-09-16 00:49:44,792 INFO [train.py:1198] (0/2) Epoch 21, batch 1950, loss[loss=0.206, ctc_loss=0.136, cr_loss=0.3503, over 20963.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1617, cr_loss=0.3828, over 4074676.30 frames. ], batch size: 55, lr: 3.99e-03, grad_scale: 16.0
2024-09-16 00:49:58,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=367701.5, ans=0.125
2024-09-16 00:51:01,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=367814.8333333333, ans=0.125
2024-09-16 00:51:02,806 INFO [train.py:1198] (0/2) Epoch 21, batch 2000, loss[loss=0.2314, ctc_loss=0.1571, cr_loss=0.3715, over 20886.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.162, cr_loss=0.3829, over 4076098.61 frames. ], batch size: 54, lr: 3.99e-03, grad_scale: 32.0
2024-09-16 00:51:30,173 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.083e+02 2.267e+02 2.405e+02 5.239e+02, threshold=4.535e+02, percent-clipped=1.0
2024-09-16 00:51:48,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.21 vs. limit=15.0
2024-09-16 00:51:54,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=367899.8333333333, ans=0.04949747468305833
2024-09-16 00:52:08,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=367928.1666666667, ans=0.09899494936611666
2024-09-16 00:52:21,476 INFO [train.py:1198] (0/2) Epoch 21, batch 2050, loss[loss=0.2018, ctc_loss=0.1348, cr_loss=0.3349, over 20962.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1618, cr_loss=0.3827, over 4057073.48 frames. ], batch size: 50, lr: 3.99e-03, grad_scale: 32.0
2024-09-16 00:52:26,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=367956.5, ans=0.2
2024-09-16 00:53:21,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0
2024-09-16 00:53:35,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=368098.1666666667, ans=0.1
2024-09-16 00:53:36,508 INFO [train.py:1198] (0/2) Epoch 21, batch 2100, loss[loss=0.3358, ctc_loss=0.2423, cr_loss=0.4677, over 14189.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1618, cr_loss=0.3827, over 4063687.06 frames. ], batch size: 149, lr: 3.99e-03, grad_scale: 32.0
2024-09-16 00:53:36,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=368098.1666666667, ans=0.0
2024-09-16 00:53:44,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=368098.1666666667, ans=0.125
2024-09-16 00:54:00,453 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.150e+02 2.286e+02 2.511e+02 3.500e+02, threshold=4.572e+02, percent-clipped=0.0
2024-09-16 00:54:20,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=368183.1666666667, ans=0.1
2024-09-16 00:54:51,880 INFO [train.py:1198] (0/2) Epoch 21, batch 2150, loss[loss=0.2143, ctc_loss=0.1399, cr_loss=0.3722, over 20962.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1603, cr_loss=0.3808, over 4080609.32 frames. ], batch size: 48, lr: 3.99e-03, grad_scale: 32.0
2024-09-16 00:55:11,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=368268.1666666667, ans=0.2
2024-09-16 00:55:18,381 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0
2024-09-16 00:55:27,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=368296.5, ans=0.125
2024-09-16 00:56:00,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=368353.1666666667, ans=0.025
2024-09-16 00:56:07,503 INFO [train.py:1198] (0/2) Epoch 21, batch 2200, loss[loss=0.3002, ctc_loss=0.2093, cr_loss=0.4548, over 17888.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.161, cr_loss=0.3823, over 4087406.46 frames. ], batch size: 108, lr: 3.99e-03, grad_scale: 32.0
2024-09-16 00:56:10,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=368381.5, ans=0.125
2024-09-16 00:56:34,945 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.709e+02 2.062e+02 2.210e+02 2.347e+02 4.367e+02, threshold=4.420e+02, percent-clipped=0.0
2024-09-16 00:56:35,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=22.5
2024-09-16 00:56:42,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=368438.1666666667, ans=0.2
2024-09-16 00:57:02,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=368466.5, ans=0.125
2024-09-16 00:57:15,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=368494.8333333333, ans=0.95
2024-09-16 00:57:29,044 INFO [train.py:1198] (0/2) Epoch 21, batch 2250, loss[loss=0.241, ctc_loss=0.167, cr_loss=0.3701, over 20950.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1603, cr_loss=0.3811, over 4096839.51 frames. ], batch size: 58, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 00:57:32,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=368523.1666666667, ans=0.09899494936611666
2024-09-16 00:57:44,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=368551.5, ans=0.0
2024-09-16 00:57:47,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=368551.5, ans=0.025
2024-09-16 00:57:51,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=368551.5, ans=0.0
2024-09-16 00:58:15,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=368608.1666666667, ans=0.1
2024-09-16 00:58:24,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0
2024-09-16 00:58:44,592 INFO [train.py:1198] (0/2) Epoch 21, batch 2300, loss[loss=0.2209, ctc_loss=0.1486, cr_loss=0.3613, over 21071.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1593, cr_loss=0.3795, over 4105201.90 frames. ], batch size: 53, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 00:58:56,966 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 00:58:58,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=368693.1666666667, ans=0.125
2024-09-16 00:58:59,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=368693.1666666667, ans=0.125
2024-09-16 00:59:08,513 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.066e+02 2.193e+02 2.339e+02 3.928e+02, threshold=4.386e+02, percent-clipped=0.0
2024-09-16 00:59:23,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=368721.5, ans=0.0
2024-09-16 00:59:38,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=368749.8333333333, ans=0.125
2024-09-16 00:59:43,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=368778.1666666667, ans=0.125
2024-09-16 00:59:52,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=368778.1666666667, ans=0.125
2024-09-16 00:59:55,604 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0
2024-09-16 00:59:58,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=368806.5, ans=0.125
2024-09-16 00:59:59,389 INFO [train.py:1198] (0/2) Epoch 21, batch 2350, loss[loss=0.2271, ctc_loss=0.1524, cr_loss=0.3735, over 21065.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1597, cr_loss=0.3807, over 4107852.90 frames. ], batch size: 56, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:00:22,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=368834.8333333333, ans=0.125
2024-09-16 01:00:46,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=368891.5, ans=0.2
2024-09-16 01:00:47,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=368891.5, ans=0.0
2024-09-16 01:01:14,711 INFO [train.py:1198] (0/2) Epoch 21, batch 2400, loss[loss=0.2612, ctc_loss=0.1771, cr_loss=0.4207, over 20669.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1596, cr_loss=0.3799, over 4099628.65 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:01:22,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=368948.1666666667, ans=0.125
2024-09-16 01:01:39,029 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.007e+02 2.140e+02 2.274e+02 3.323e+02, threshold=4.280e+02, percent-clipped=0.0
2024-09-16 01:02:00,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=369033.1666666667, ans=0.125
2024-09-16 01:02:37,014 INFO [train.py:1198] (0/2) Epoch 21, batch 2450, loss[loss=0.2426, ctc_loss=0.1664, cr_loss=0.3811, over 20838.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1599, cr_loss=0.3813, over 4100878.12 frames. ], batch size: 59, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:02:38,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=369089.8333333333, ans=0.125
2024-09-16 01:03:11,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=369146.5, ans=0.025
2024-09-16 01:03:21,729 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0
2024-09-16 01:03:21,918 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.29 vs. limit=15.0
2024-09-16 01:03:52,627 INFO [train.py:1198] (0/2) Epoch 21, batch 2500, loss[loss=0.1931, ctc_loss=0.1255, cr_loss=0.3378, over 21056.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.161, cr_loss=0.3824, over 4089427.57 frames. ], batch size: 53, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:03:55,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369231.5, ans=0.1
2024-09-16 01:04:10,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=369259.8333333333, ans=0.125
2024-09-16 01:04:12,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=22.5
2024-09-16 01:04:16,509 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.033e+02 2.204e+02 2.375e+02 3.703e+02, threshold=4.408e+02, percent-clipped=0.0
2024-09-16 01:04:19,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=369259.8333333333, ans=0.05
2024-09-16 01:04:44,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=369316.5, ans=0.125
2024-09-16 01:04:44,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=369316.5, ans=0.125
2024-09-16 01:05:08,620 INFO [train.py:1198] (0/2) Epoch 21, batch 2550, loss[loss=0.2265, ctc_loss=0.1506, cr_loss=0.3799, over 20954.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1602, cr_loss=0.3809, over 4094958.68 frames. ], batch size: 50, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:06:09,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=369486.5, ans=0.0
2024-09-16 01:06:17,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=369486.5, ans=0.125
2024-09-16 01:06:24,354 INFO [train.py:1198] (0/2) Epoch 21, batch 2600, loss[loss=0.2492, ctc_loss=0.1683, cr_loss=0.4045, over 21019.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1611, cr_loss=0.3825, over 4093120.47 frames. ], batch size: 63, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:06:37,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=369543.1666666667, ans=0.125
2024-09-16 01:06:47,904 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.089e+02 2.212e+02 2.406e+02 3.912e+02, threshold=4.424e+02, percent-clipped=0.0
2024-09-16 01:06:54,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.60 vs. limit=22.5
2024-09-16 01:07:07,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=369599.8333333333, ans=0.2
2024-09-16 01:07:26,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=369628.1666666667, ans=0.125
2024-09-16 01:07:39,561 INFO [train.py:1198] (0/2) Epoch 21, batch 2650, loss[loss=0.2574, ctc_loss=0.1711, cr_loss=0.4317, over 20827.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1602, cr_loss=0.3813, over 4108835.15 frames. ], batch size: 59, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:07:47,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=369656.5, ans=0.125
2024-09-16 01:07:57,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=369684.8333333333, ans=0.0
2024-09-16 01:08:41,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=369741.5, ans=0.1
2024-09-16 01:08:50,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=369769.8333333333, ans=0.2
2024-09-16 01:08:50,821 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=22.5
2024-09-16 01:08:58,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=369769.8333333333, ans=0.125
2024-09-16 01:09:00,904 INFO [train.py:1198] (0/2) Epoch 21, batch 2700, loss[loss=0.255, ctc_loss=0.1747, cr_loss=0.4011, over 20977.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1606, cr_loss=0.3813, over 4103233.13 frames. ], batch size: 64, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:09:24,797 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.104e+02 2.249e+02 2.407e+02 3.415e+02, threshold=4.497e+02, percent-clipped=0.0
2024-09-16 01:09:38,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=369854.8333333333, ans=0.0
2024-09-16 01:09:39,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=369854.8333333333, ans=0.0
2024-09-16 01:09:45,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=369883.1666666667, ans=0.0
2024-09-16 01:09:50,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=369883.1666666667, ans=0.125
2024-09-16 01:09:50,930 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0
2024-09-16 01:10:16,042 INFO [train.py:1198] (0/2) Epoch 21, batch 2750, loss[loss=0.2142, ctc_loss=0.1409, cr_loss=0.3668, over 19849.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1608, cr_loss=0.3817, over 4090553.33 frames. ], batch size: 44, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:10:41,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0
2024-09-16 01:11:10,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=370024.8333333333, ans=0.125
2024-09-16 01:11:15,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0
2024-09-16 01:11:18,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=370053.1666666667, ans=0.0
2024-09-16 01:11:28,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=370053.1666666667, ans=0.2
2024-09-16 01:11:30,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=370081.5, ans=0.04949747468305833
2024-09-16 01:11:31,801 INFO [train.py:1198] (0/2) Epoch 21, batch 2800, loss[loss=0.1947, ctc_loss=0.1284, cr_loss=0.3314, over 21067.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1611, cr_loss=0.3823, over 4080804.09 frames. ], batch size: 53, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:11:56,066 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.021e+02 2.135e+02 2.361e+02 3.966e+02, threshold=4.271e+02, percent-clipped=0.0
2024-09-16 01:12:39,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=370194.8333333333, ans=0.125
2024-09-16 01:12:46,729 INFO [train.py:1198] (0/2) Epoch 21, batch 2850, loss[loss=0.2085, ctc_loss=0.1357, cr_loss=0.364, over 20975.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1595, cr_loss=0.3799, over 4091777.53 frames. ], batch size: 48, lr: 3.98e-03, grad_scale: 32.0
2024-09-16 01:13:14,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=370251.5, ans=0.07
2024-09-16 01:13:21,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=370279.8333333333, ans=0.0
2024-09-16 01:14:08,453 INFO [train.py:1198] (0/2) Epoch 21, batch 2900, loss[loss=0.2122, ctc_loss=0.1434, cr_loss=0.3442, over 19814.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1594, cr_loss=0.3809, over 4100965.47 frames. ], batch size: 44, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:14:19,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=370364.8333333333, ans=0.0
2024-09-16 01:14:33,688 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.111e+02 2.267e+02 2.480e+02 4.240e+02, threshold=4.534e+02, percent-clipped=0.0
2024-09-16 01:15:07,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=370478.1666666667, ans=0.2
2024-09-16 01:15:23,498 INFO [train.py:1198] (0/2) Epoch 21, batch 2950, loss[loss=0.2544, ctc_loss=0.1703, cr_loss=0.4206, over 20873.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1596, cr_loss=0.3813, over 4103612.78 frames. ], batch size: 57, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:15:32,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=370506.5, ans=0.07
2024-09-16 01:16:02,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0
2024-09-16 01:16:39,250 INFO [train.py:1198] (0/2) Epoch 21, batch 3000, loss[loss=0.2353, ctc_loss=0.1592, cr_loss=0.3804, over 20936.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1603, cr_loss=0.3821, over 4099081.09 frames. ], batch size: 60, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:16:39,251 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 01:16:58,836 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.1031, 5.7970, 5.5895, 5.1780], device='cuda:0')
2024-09-16 01:17:03,252 INFO [train.py:1230] (0/2) Epoch 21, validation: loss=0.04362, ctc_loss=0.04362, cr_loss=1.096e-14, over 944034.00 frames.
2024-09-16 01:17:03,253 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-16 01:17:03,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=370648.1666666667, ans=0.0
2024-09-16 01:17:05,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=370648.1666666667, ans=0.125
2024-09-16 01:17:25,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=370676.5, ans=0.0
2024-09-16 01:17:29,263 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.079e+02 2.219e+02 2.383e+02 3.230e+02, threshold=4.438e+02, percent-clipped=0.0
2024-09-16 01:17:36,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=370704.8333333333, ans=0.0
2024-09-16 01:17:54,758 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.41 vs. limit=15.0
2024-09-16 01:17:57,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=370733.1666666667, ans=0.125
2024-09-16 01:18:09,682 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=22.5
2024-09-16 01:18:19,935 INFO [train.py:1198] (0/2) Epoch 21, batch 3050, loss[loss=0.2343, ctc_loss=0.1562, cr_loss=0.3904, over 20887.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1605, cr_loss=0.3824, over 4091059.09 frames. ], batch size: 54, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:18:21,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=370789.8333333333, ans=0.125
2024-09-16 01:18:41,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0
2024-09-16 01:19:08,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=370874.8333333333, ans=0.07
2024-09-16 01:19:34,725 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.55 vs. limit=22.5
2024-09-16 01:19:41,114 INFO [train.py:1198] (0/2) Epoch 21, batch 3100, loss[loss=0.2189, ctc_loss=0.1491, cr_loss=0.3492, over 21049.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1597, cr_loss=0.3806, over 4101472.08 frames. ], batch size: 53, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:20:07,156 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.122e+02 2.256e+02 2.443e+02 3.813e+02, threshold=4.513e+02, percent-clipped=0.0
2024-09-16 01:20:39,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=371016.5, ans=0.0
2024-09-16 01:20:45,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=371044.8333333333, ans=0.1
2024-09-16 01:20:52,933 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0
2024-09-16 01:20:56,728 INFO [train.py:1198] (0/2) Epoch 21, batch 3150, loss[loss=0.2314, ctc_loss=0.1542, cr_loss=0.3858, over 21015.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1592, cr_loss=0.3801, over 4107983.17 frames. ], batch size: 62, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:21:24,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=371101.5, ans=0.125
2024-09-16 01:21:58,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=371186.5, ans=0.0
2024-09-16 01:22:04,349 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 01:22:05,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=371186.5, ans=0.0
2024-09-16 01:22:08,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=371186.5, ans=0.125
2024-09-16 01:22:08,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=371186.5, ans=0.125
2024-09-16 01:22:10,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=371186.5, ans=0.1
2024-09-16 01:22:12,963 INFO [train.py:1198] (0/2) Epoch 21, batch 3200, loss[loss=0.2822, ctc_loss=0.1956, cr_loss=0.433, over 19263.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1583, cr_loss=0.379, over 4113828.70 frames. ], batch size: 90, lr: 3.97e-03, grad_scale: 32.0
2024-09-16 01:22:38,301 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.057e+02 2.187e+02 2.378e+02 3.203e+02, threshold=4.375e+02, percent-clipped=0.0
2024-09-16 01:22:49,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=371271.5, ans=0.125
2024-09-16 01:23:04,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=371299.8333333333, ans=0.125
2024-09-16 01:23:28,176 INFO [train.py:1198] (0/2) Epoch 21, batch 3250, loss[loss=0.2944, ctc_loss=0.2041, cr_loss=0.4516, over 18331.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1588, cr_loss=0.3794, over 4106704.46 frames. ], batch size: 108, lr: 3.97e-03, grad_scale: 32.0
2024-09-16 01:23:31,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=371356.5, ans=0.0
2024-09-16 01:23:49,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=371384.8333333333, ans=0.125
2024-09-16 01:23:51,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0
2024-09-16 01:24:00,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.22 vs. limit=6.0
2024-09-16 01:24:23,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=371441.5, ans=0.125
2024-09-16 01:24:24,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=371441.5, ans=0.125
2024-09-16 01:24:30,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=371469.8333333333, ans=0.0
2024-09-16 01:24:43,651 INFO [train.py:1198] (0/2) Epoch 21, batch 3300, loss[loss=0.2772, ctc_loss=0.1916, cr_loss=0.4279, over 20963.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1596, cr_loss=0.38, over 4083205.34 frames. ], batch size: 64, lr: 3.97e-03, grad_scale: 32.0
2024-09-16 01:24:43,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=371498.1666666667, ans=0.2
2024-09-16 01:25:06,246 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 01:25:11,662 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.100e+02 2.271e+02 2.405e+02 4.346e+02, threshold=4.543e+02, percent-clipped=0.0
2024-09-16 01:25:48,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=371611.5, ans=0.0
2024-09-16 01:26:04,545 INFO [train.py:1198] (0/2) Epoch 21, batch 3350, loss[loss=0.2329, ctc_loss=0.1565, cr_loss=0.3819, over 21033.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.159, cr_loss=0.3794, over 4091630.03 frames. ], batch size: 62, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:26:04,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=371639.8333333333, ans=0.2
2024-09-16 01:26:29,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=371668.1666666667, ans=0.125
2024-09-16 01:26:52,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.70 vs. limit=15.0
2024-09-16 01:27:19,803 INFO [train.py:1198] (0/2) Epoch 21, batch 3400, loss[loss=0.2613, ctc_loss=0.1767, cr_loss=0.4227, over 20865.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1595, cr_loss=0.3804, over 4095898.94 frames. ], batch size: 65, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:27:46,726 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.651e+02 2.026e+02 2.152e+02 2.295e+02 2.892e+02, threshold=4.304e+02, percent-clipped=0.0
2024-09-16 01:27:49,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.42 vs. limit=22.5
2024-09-16 01:28:13,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=371866.5, ans=0.0
2024-09-16 01:28:35,271 INFO [train.py:1198] (0/2) Epoch 21, batch 3450, loss[loss=0.265, ctc_loss=0.1771, cr_loss=0.4392, over 20974.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1591, cr_loss=0.38, over 4102533.93 frames. ], batch size: 67, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:28:46,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=15.0
2024-09-16 01:28:49,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0
2024-09-16 01:29:05,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=371979.8333333333, ans=0.125
2024-09-16 01:29:13,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=371979.8333333333, ans=0.035
2024-09-16 01:29:22,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=372008.1666666667, ans=0.09899494936611666
2024-09-16 01:29:28,339 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 01:29:33,003 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.97 vs. limit=10.0
2024-09-16 01:29:50,205 INFO [train.py:1198] (0/2) Epoch 21, batch 3500, loss[loss=0.2473, ctc_loss=0.1692, cr_loss=0.3904, over 21065.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1598, cr_loss=0.3803, over 4100124.97 frames. ], batch size: 56, lr: 3.97e-03, grad_scale: 16.0
2024-09-16 01:29:52,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=22.5
2024-09-16 01:30:17,320 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.129e+02 2.358e+02 2.544e+02 4.972e+02, threshold=4.715e+02, percent-clipped=1.0
2024-09-16 01:31:11,896 INFO [train.py:1198] (0/2) Epoch 21, batch 3550, loss[loss=0.224, ctc_loss=0.1491, cr_loss=0.3746, over 21063.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1593, cr_loss=0.3797, over 4108669.12 frames. ], batch size: 56, lr: 3.96e-03, grad_scale: 16.0
2024-09-16 01:31:27,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=372234.8333333333, ans=0.125
2024-09-16 01:32:11,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372319.8333333333, ans=0.1
2024-09-16 01:32:14,444 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 01:32:27,592 INFO [train.py:1198] (0/2) Epoch 21, batch 3600, loss[loss=0.2414, ctc_loss=0.1668, cr_loss=0.3733, over 20356.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1593, cr_loss=0.3803, over 4118450.23 frames. ], batch size: 74, lr: 3.96e-03, grad_scale: 32.0
2024-09-16 01:32:41,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=372376.5, ans=0.125
2024-09-16 01:32:54,266 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.070e+02 2.186e+02 2.346e+02 3.810e+02, threshold=4.372e+02, percent-clipped=0.0
2024-09-16 01:32:54,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=22.5
2024-09-16 01:33:12,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=372433.1666666667, ans=0.0
2024-09-16 01:33:42,614 INFO [train.py:1198] (0/2) Epoch 21, batch 3650, loss[loss=0.2507, ctc_loss=0.1716, cr_loss=0.3956, over 21027.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1599, cr_loss=0.3814, over 4108323.65 frames.
], batch size: 61, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:34:07,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=372518.1666666667, ans=0.025 2024-09-16 01:34:13,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=372546.5, ans=0.0 2024-09-16 01:34:58,127 INFO [train.py:1198] (0/2) Epoch 21, batch 3700, loss[loss=0.2638, ctc_loss=0.179, cr_loss=0.4242, over 20849.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1595, cr_loss=0.3807, over 4105027.60 frames. ], batch size: 65, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:35:25,493 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.065e+02 2.188e+02 2.433e+02 2.909e+02, threshold=4.375e+02, percent-clipped=0.0 2024-09-16 01:35:49,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=372716.5, ans=0.125 2024-09-16 01:36:07,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=372744.8333333333, ans=0.0 2024-09-16 01:36:09,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=372744.8333333333, ans=0.125 2024-09-16 01:36:16,454 INFO [train.py:1198] (0/2) Epoch 21, batch 3750, loss[loss=0.2466, ctc_loss=0.1696, cr_loss=0.3851, over 20960.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1592, cr_loss=0.3794, over 4109358.74 frames. 
], batch size: 64, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:36:19,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=372773.1666666667, ans=0.2 2024-09-16 01:36:45,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=372801.5, ans=0.0 2024-09-16 01:37:03,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=372858.1666666667, ans=0.125 2024-09-16 01:37:09,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-09-16 01:37:10,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=372858.1666666667, ans=0.0 2024-09-16 01:37:16,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=372858.1666666667, ans=0.2 2024-09-16 01:37:16,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=372858.1666666667, ans=0.125 2024-09-16 01:37:22,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=372886.5, ans=0.125 2024-09-16 01:37:34,529 INFO [train.py:1198] (0/2) Epoch 21, batch 3800, loss[loss=0.2437, ctc_loss=0.1649, cr_loss=0.3942, over 20824.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1597, cr_loss=0.38, over 4093821.43 frames. 
], batch size: 59, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:37:48,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=372943.1666666667, ans=0.0 2024-09-16 01:38:01,574 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.082e+02 2.245e+02 2.424e+02 3.263e+02, threshold=4.489e+02, percent-clipped=0.0 2024-09-16 01:38:31,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=372999.8333333333, ans=0.125 2024-09-16 01:38:49,244 INFO [train.py:1198] (0/2) Epoch 21, batch 3850, loss[loss=0.2192, ctc_loss=0.1485, cr_loss=0.3536, over 20990.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1601, cr_loss=0.3806, over 4086864.23 frames. ], batch size: 58, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:39:06,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=373084.8333333333, ans=0.0 2024-09-16 01:39:17,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=373084.8333333333, ans=0.1 2024-09-16 01:39:18,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=373113.1666666667, ans=0.025 2024-09-16 01:39:35,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=373141.5, ans=0.0 2024-09-16 01:39:51,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=373169.8333333333, ans=0.125 2024-09-16 01:40:04,852 INFO [train.py:1198] (0/2) Epoch 21, batch 3900, loss[loss=0.2691, ctc_loss=0.1854, cr_loss=0.4185, over 21031.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1594, cr_loss=0.3795, over 4090482.12 frames. 
], batch size: 63, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:40:17,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=373198.1666666667, ans=0.025 2024-09-16 01:40:32,002 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.094e+02 2.177e+02 2.401e+02 3.013e+02, threshold=4.355e+02, percent-clipped=0.0 2024-09-16 01:40:41,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=373254.8333333333, ans=0.125 2024-09-16 01:40:48,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=373283.1666666667, ans=0.125 2024-09-16 01:40:49,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=373283.1666666667, ans=0.125 2024-09-16 01:40:54,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=373283.1666666667, ans=0.0 2024-09-16 01:40:56,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=373283.1666666667, ans=0.125 2024-09-16 01:40:56,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=22.5 2024-09-16 01:41:00,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=373283.1666666667, ans=0.0 2024-09-16 01:41:20,311 INFO [train.py:1198] (0/2) Epoch 21, batch 3950, loss[loss=0.202, ctc_loss=0.1323, cr_loss=0.3485, over 20974.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1589, cr_loss=0.3799, over 4097419.80 frames. 
], batch size: 51, lr: 3.96e-03, grad_scale: 16.0 2024-09-16 01:41:29,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=373339.8333333333, ans=0.125 2024-09-16 01:41:37,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2024-09-16 01:41:38,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=373368.1666666667, ans=0.0 2024-09-16 01:41:40,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=373368.1666666667, ans=0.2 2024-09-16 01:42:01,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=373396.5, ans=0.2 2024-09-16 01:42:34,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=373453.1666666667, ans=0.0 2024-09-16 01:42:41,948 INFO [train.py:1198] (0/2) Epoch 21, batch 4000, loss[loss=0.2789, ctc_loss=0.1903, cr_loss=0.4428, over 18273.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1602, cr_loss=0.3825, over 4094872.99 frames. 
], batch size: 108, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:42:42,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=373481.5, ans=0.0 2024-09-16 01:42:51,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=373481.5, ans=0.125 2024-09-16 01:42:54,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=373481.5, ans=0.0 2024-09-16 01:42:58,937 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 01:43:10,956 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.173e+02 2.321e+02 2.558e+02 5.399e+02, threshold=4.642e+02, percent-clipped=1.0 2024-09-16 01:43:53,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=373594.8333333333, ans=0.125 2024-09-16 01:43:57,953 INFO [train.py:1198] (0/2) Epoch 21, batch 4050, loss[loss=0.2549, ctc_loss=0.1739, cr_loss=0.4048, over 20728.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1594, cr_loss=0.381, over 4099169.05 frames. 
], batch size: 71, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:44:02,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=373623.1666666667, ans=0.125 2024-09-16 01:44:04,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=373623.1666666667, ans=0.025 2024-09-16 01:44:27,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=373679.8333333333, ans=0.125 2024-09-16 01:44:36,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=373679.8333333333, ans=0.2 2024-09-16 01:45:13,533 INFO [train.py:1198] (0/2) Epoch 21, batch 4100, loss[loss=0.2758, ctc_loss=0.1937, cr_loss=0.4109, over 19565.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1597, cr_loss=0.3812, over 4107430.28 frames. ], batch size: 90, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:45:19,912 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 01:45:30,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=373793.1666666667, ans=0.0 2024-09-16 01:45:42,209 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.049e+02 2.219e+02 2.381e+02 3.056e+02, threshold=4.438e+02, percent-clipped=0.0 2024-09-16 01:45:59,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=373849.8333333333, ans=10.0 2024-09-16 01:46:19,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=373878.1666666667, ans=0.125 2024-09-16 01:46:29,365 INFO [train.py:1198] (0/2) Epoch 21, batch 4150, loss[loss=0.2277, 
ctc_loss=0.1511, cr_loss=0.3829, over 20880.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1598, cr_loss=0.3811, over 4106286.76 frames. ], batch size: 57, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:46:37,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=373906.5, ans=0.125 2024-09-16 01:46:45,393 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0 2024-09-16 01:46:52,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=373934.8333333333, ans=0.125 2024-09-16 01:47:18,181 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-132000.pt 2024-09-16 01:47:49,047 INFO [train.py:1198] (0/2) Epoch 21, batch 4200, loss[loss=0.2889, ctc_loss=0.2067, cr_loss=0.4109, over 14564.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1593, cr_loss=0.3802, over 4097055.61 frames. 
], batch size: 150, lr: 3.96e-03, grad_scale: 32.0 2024-09-16 01:47:55,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=374048.1666666667, ans=0.125 2024-09-16 01:48:01,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=374048.1666666667, ans=0.0 2024-09-16 01:48:20,536 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.113e+02 2.269e+02 2.405e+02 6.091e+02, threshold=4.539e+02, percent-clipped=1.0 2024-09-16 01:48:51,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=374161.5, ans=0.025 2024-09-16 01:48:51,686 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=15.0 2024-09-16 01:49:07,659 INFO [train.py:1198] (0/2) Epoch 21, batch 4250, loss[loss=0.2293, ctc_loss=0.1556, cr_loss=0.3687, over 21043.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1592, cr_loss=0.38, over 4096451.47 frames. ], batch size: 63, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:49:28,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=374218.1666666667, ans=0.1 2024-09-16 01:49:43,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=374246.5, ans=0.025 2024-09-16 01:49:57,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=374274.8333333333, ans=0.125 2024-09-16 01:50:04,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.67 vs. 
limit=22.5 2024-09-16 01:50:14,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2024-09-16 01:50:20,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=374303.1666666667, ans=0.125 2024-09-16 01:50:22,974 INFO [train.py:1198] (0/2) Epoch 21, batch 4300, loss[loss=0.252, ctc_loss=0.1697, cr_loss=0.4111, over 19546.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.159, cr_loss=0.3797, over 4095051.19 frames. ], batch size: 90, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:50:39,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=374359.8333333333, ans=0.125 2024-09-16 01:50:42,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=374359.8333333333, ans=0.125 2024-09-16 01:50:47,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=374359.8333333333, ans=0.2 2024-09-16 01:50:48,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=374359.8333333333, ans=0.125 2024-09-16 01:50:51,346 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.136e+02 2.245e+02 2.437e+02 3.095e+02, threshold=4.490e+02, percent-clipped=0.0 2024-09-16 01:51:35,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=374444.8333333333, ans=0.125 2024-09-16 01:51:38,712 INFO [train.py:1198] (0/2) Epoch 21, batch 4350, loss[loss=0.2248, ctc_loss=0.15, cr_loss=0.3742, over 20994.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1581, cr_loss=0.378, over 4098919.06 frames. 
], batch size: 55, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:52:06,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=374501.5, ans=0.0 2024-09-16 01:52:32,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=374558.1666666667, ans=0.0 2024-09-16 01:52:35,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=374558.1666666667, ans=0.2 2024-09-16 01:52:54,633 INFO [train.py:1198] (0/2) Epoch 21, batch 4400, loss[loss=0.2388, ctc_loss=0.1624, cr_loss=0.3819, over 20938.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1578, cr_loss=0.3785, over 4110078.74 frames. ], batch size: 60, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:52:55,396 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=12.0 2024-09-16 01:53:17,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=374643.1666666667, ans=0.2 2024-09-16 01:53:24,965 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.036e+02 2.162e+02 2.306e+02 4.854e+02, threshold=4.325e+02, percent-clipped=1.0 2024-09-16 01:53:37,883 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.06 vs. 
limit=15.0 2024-09-16 01:53:47,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=374699.8333333333, ans=0.125 2024-09-16 01:53:52,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=374699.8333333333, ans=0.0 2024-09-16 01:53:52,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=374699.8333333333, ans=0.125 2024-09-16 01:53:58,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=374728.1666666667, ans=0.1 2024-09-16 01:54:05,110 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.92 vs. limit=15.0 2024-09-16 01:54:05,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=22.5 2024-09-16 01:54:09,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0 2024-09-16 01:54:16,275 INFO [train.py:1198] (0/2) Epoch 21, batch 4450, loss[loss=0.2367, ctc_loss=0.1588, cr_loss=0.3892, over 20851.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1578, cr_loss=0.3784, over 4106525.15 frames. ], batch size: 59, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:54:17,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=374756.5, ans=0.025 2024-09-16 01:54:35,371 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.76 vs. 
limit=15.0 2024-09-16 01:54:51,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=374813.1666666667, ans=0.125 2024-09-16 01:55:31,278 INFO [train.py:1198] (0/2) Epoch 21, batch 4500, loss[loss=0.2097, ctc_loss=0.1393, cr_loss=0.3521, over 19955.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1581, cr_loss=0.3788, over 4103367.70 frames. ], batch size: 44, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:55:42,467 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0 2024-09-16 01:55:48,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=374926.5, ans=0.125 2024-09-16 01:56:01,459 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.082e+02 2.260e+02 2.396e+02 3.171e+02, threshold=4.519e+02, percent-clipped=0.0 2024-09-16 01:56:16,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=374983.1666666667, ans=0.0 2024-09-16 01:56:36,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=375011.5, ans=0.5 2024-09-16 01:56:46,426 INFO [train.py:1198] (0/2) Epoch 21, batch 4550, loss[loss=0.209, ctc_loss=0.1399, cr_loss=0.3452, over 20976.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1598, cr_loss=0.3814, over 4080904.67 frames. 
], batch size: 51, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:56:46,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=375039.8333333333, ans=0.125 2024-09-16 01:57:01,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=375068.1666666667, ans=0.125 2024-09-16 01:57:20,672 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=15.0 2024-09-16 01:57:24,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=375096.5, ans=0.0 2024-09-16 01:57:36,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=375124.8333333333, ans=0.0 2024-09-16 01:57:38,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=375124.8333333333, ans=0.125 2024-09-16 01:57:50,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=375153.1666666667, ans=0.125 2024-09-16 01:57:55,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=375153.1666666667, ans=0.2 2024-09-16 01:58:02,360 INFO [train.py:1198] (0/2) Epoch 21, batch 4600, loss[loss=0.2196, ctc_loss=0.1506, cr_loss=0.3451, over 21045.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1591, cr_loss=0.3797, over 4076452.23 frames. 
], batch size: 56, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:58:07,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=375181.5, ans=0.015 2024-09-16 01:58:20,867 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 01:58:32,430 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.086e+02 2.249e+02 2.488e+02 4.137e+02, threshold=4.498e+02, percent-clipped=0.0 2024-09-16 01:58:38,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=375238.1666666667, ans=0.125 2024-09-16 01:58:40,572 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.92 vs. limit=10.0 2024-09-16 01:59:20,562 INFO [train.py:1198] (0/2) Epoch 21, batch 4650, loss[loss=0.2185, ctc_loss=0.1473, cr_loss=0.3561, over 20785.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1593, cr_loss=0.3798, over 4091734.66 frames. ], batch size: 53, lr: 3.95e-03, grad_scale: 32.0 2024-09-16 01:59:52,586 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.99 vs. limit=10.0 2024-09-16 01:59:55,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=375379.8333333333, ans=0.125 2024-09-16 01:59:57,220 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. 
limit=10.0
2024-09-16 02:00:08,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=375408.1666666667, ans=0.125
2024-09-16 02:00:17,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=375408.1666666667, ans=0.125
2024-09-16 02:00:23,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=375436.5, ans=0.2
2024-09-16 02:00:38,527 INFO [train.py:1198] (0/2) Epoch 21, batch 4700, loss[loss=0.2178, ctc_loss=0.1462, cr_loss=0.3581, over 21053.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1584, cr_loss=0.3786, over 4096886.30 frames. ], batch size: 56, lr: 3.95e-03, grad_scale: 32.0
2024-09-16 02:00:51,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=375464.8333333333, ans=0.04949747468305833
2024-09-16 02:00:55,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=375493.1666666667, ans=0.125
2024-09-16 02:01:01,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=375493.1666666667, ans=0.0
2024-09-16 02:01:08,889 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.048e+02 2.187e+02 2.365e+02 3.837e+02, threshold=4.375e+02, percent-clipped=0.0
2024-09-16 02:01:16,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=375521.5, ans=0.125
2024-09-16 02:01:34,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375549.8333333333, ans=0.1
2024-09-16 02:01:40,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=375578.1666666667, ans=0.0
2024-09-16 02:01:54,212 INFO [train.py:1198] (0/2) Epoch 21, batch 4750, loss[loss=0.2683, ctc_loss=0.1839, cr_loss=0.4219, over 20960.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1591, cr_loss=0.3795, over 4089853.52 frames. ], batch size: 58, lr: 3.95e-03, grad_scale: 32.0
2024-09-16 02:02:18,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=375634.8333333333, ans=0.125
2024-09-16 02:02:41,219 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 02:03:09,683 INFO [train.py:1198] (0/2) Epoch 21, batch 4800, loss[loss=0.2605, ctc_loss=0.1791, cr_loss=0.4069, over 19931.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1596, cr_loss=0.3804, over 4101043.96 frames. ], batch size: 80, lr: 3.95e-03, grad_scale: 32.0
2024-09-16 02:03:23,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375776.5, ans=0.1
2024-09-16 02:03:35,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=375776.5, ans=10.0
2024-09-16 02:03:40,599 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.054e+02 2.212e+02 2.381e+02 6.295e+02, threshold=4.424e+02, percent-clipped=1.0
2024-09-16 02:03:49,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=375804.8333333333, ans=0.0
2024-09-16 02:03:51,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=375804.8333333333, ans=0.0
2024-09-16 02:04:03,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=375833.1666666667, ans=0.025
2024-09-16 02:04:11,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=22.5
2024-09-16 02:04:14,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=375861.5, ans=0.0
2024-09-16 02:04:22,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=375889.8333333333, ans=0.2
2024-09-16 02:04:24,108 INFO [train.py:1198] (0/2) Epoch 21, batch 4850, loss[loss=0.2292, ctc_loss=0.1527, cr_loss=0.3827, over 21031.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1603, cr_loss=0.3817, over 4106387.80 frames. ], batch size: 56, lr: 3.95e-03, grad_scale: 32.0
2024-09-16 02:05:01,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375946.5, ans=0.1
2024-09-16 02:05:41,396 INFO [train.py:1198] (0/2) Epoch 21, batch 4900, loss[loss=0.2285, ctc_loss=0.1557, cr_loss=0.3638, over 20870.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1609, cr_loss=0.3821, over 4106734.64 frames. ], batch size: 57, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:06:15,974 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.137e+02 2.284e+02 2.465e+02 3.194e+02, threshold=4.568e+02, percent-clipped=0.0
2024-09-16 02:06:35,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=376116.5, ans=0.2
2024-09-16 02:06:45,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=376144.8333333333, ans=0.125
2024-09-16 02:06:59,012 INFO [train.py:1198] (0/2) Epoch 21, batch 4950, loss[loss=0.2089, ctc_loss=0.1404, cr_loss=0.3426, over 20993.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.161, cr_loss=0.3823, over 4103759.77 frames. ], batch size: 52, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:07:27,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=376229.8333333333, ans=0.125
2024-09-16 02:07:33,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=376229.8333333333, ans=0.1
2024-09-16 02:07:54,135 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 02:08:13,245 INFO [train.py:1198] (0/2) Epoch 21, batch 5000, loss[loss=0.3191, ctc_loss=0.2323, cr_loss=0.4338, over 14205.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1616, cr_loss=0.3823, over 4073037.00 frames. ], batch size: 150, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:08:17,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=376314.8333333333, ans=0.125
2024-09-16 02:08:36,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=376343.1666666667, ans=0.2
2024-09-16 02:08:44,844 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.126e+02 2.282e+02 2.513e+02 6.959e+02, threshold=4.564e+02, percent-clipped=1.0
2024-09-16 02:08:44,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=376371.5, ans=0.125
2024-09-16 02:09:26,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=376456.5, ans=0.1
2024-09-16 02:09:27,737 INFO [train.py:1198] (0/2) Epoch 21, batch 5050, loss[loss=0.2645, ctc_loss=0.1806, cr_loss=0.4198, over 20014.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1611, cr_loss=0.3819, over 4070366.27 frames. ], batch size: 80, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:09:32,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=376456.5, ans=0.125
2024-09-16 02:10:30,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=376569.8333333333, ans=0.0
2024-09-16 02:10:42,612 INFO [train.py:1198] (0/2) Epoch 21, batch 5100, loss[loss=0.2084, ctc_loss=0.1406, cr_loss=0.3392, over 20953.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1599, cr_loss=0.3796, over 4054895.98 frames. ], batch size: 49, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:10:55,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0
2024-09-16 02:11:14,072 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.063e+02 2.284e+02 2.488e+02 3.450e+02, threshold=4.568e+02, percent-clipped=0.0
2024-09-16 02:11:22,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0
2024-09-16 02:11:31,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=12.0
2024-09-16 02:11:54,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=376711.5, ans=0.125
2024-09-16 02:11:54,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=376711.5, ans=0.0
2024-09-16 02:11:57,226 INFO [train.py:1198] (0/2) Epoch 21, batch 5150, loss[loss=0.3227, ctc_loss=0.2371, cr_loss=0.4283, over 13736.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.16, cr_loss=0.3804, over 4065151.92 frames. ], batch size: 149, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:12:06,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=376739.8333333333, ans=0.0
2024-09-16 02:12:09,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=376739.8333333333, ans=0.125
2024-09-16 02:12:17,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0
2024-09-16 02:12:34,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=376796.5, ans=0.2
2024-09-16 02:12:48,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=376824.8333333333, ans=0.0
2024-09-16 02:12:54,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=15.0
2024-09-16 02:12:55,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=376853.1666666667, ans=0.025
2024-09-16 02:13:11,491 INFO [train.py:1198] (0/2) Epoch 21, batch 5200, loss[loss=0.2394, ctc_loss=0.1647, cr_loss=0.3738, over 21044.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1593, cr_loss=0.3805, over 4083008.21 frames. ], batch size: 62, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:13:42,533 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.055e+02 2.176e+02 2.401e+02 3.523e+02, threshold=4.352e+02, percent-clipped=0.0
2024-09-16 02:14:12,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=376994.8333333333, ans=0.125
2024-09-16 02:14:25,609 INFO [train.py:1198] (0/2) Epoch 21, batch 5250, loss[loss=0.2063, ctc_loss=0.1362, cr_loss=0.3504, over 20965.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1603, cr_loss=0.3815, over 4085258.88 frames. ], batch size: 48, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:14:33,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=22.5
2024-09-16 02:14:49,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=377051.5, ans=0.125
2024-09-16 02:15:18,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.81 vs. limit=15.0
2024-09-16 02:15:42,015 INFO [train.py:1198] (0/2) Epoch 21, batch 5300, loss[loss=0.1919, ctc_loss=0.1256, cr_loss=0.3317, over 20994.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1597, cr_loss=0.3807, over 4081099.01 frames. ], batch size: 48, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:15:46,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=377164.8333333333, ans=0.0
2024-09-16 02:15:55,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=377193.1666666667, ans=0.0
2024-09-16 02:15:55,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0
2024-09-16 02:16:15,301 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.107e+02 2.202e+02 2.389e+02 4.329e+02, threshold=4.404e+02, percent-clipped=0.0
2024-09-16 02:16:20,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=377221.5, ans=0.125
2024-09-16 02:16:36,096 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.81 vs. limit=5.0
2024-09-16 02:16:58,956 INFO [train.py:1198] (0/2) Epoch 21, batch 5350, loss[loss=0.2727, ctc_loss=0.1902, cr_loss=0.4126, over 18066.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1594, cr_loss=0.3797, over 4086661.17 frames. ], batch size: 108, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:17:03,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=377306.5, ans=0.2
2024-09-16 02:17:03,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=377306.5, ans=0.025
2024-09-16 02:17:05,058 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 02:17:33,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=377363.1666666667, ans=0.025
2024-09-16 02:17:46,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=377391.5, ans=0.5
2024-09-16 02:17:58,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=377419.8333333333, ans=0.125
2024-09-16 02:18:07,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=377419.8333333333, ans=0.0
2024-09-16 02:18:13,443 INFO [train.py:1198] (0/2) Epoch 21, batch 5400, loss[loss=0.2517, ctc_loss=0.1725, cr_loss=0.3961, over 21022.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1595, cr_loss=0.3796, over 4090199.11 frames. ], batch size: 62, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:18:28,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=377476.5, ans=0.125
2024-09-16 02:18:39,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=377476.5, ans=0.025
2024-09-16 02:18:44,454 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.066e+02 2.231e+02 2.420e+02 4.440e+02, threshold=4.462e+02, percent-clipped=1.0
2024-09-16 02:19:02,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=377533.1666666667, ans=0.1
2024-09-16 02:19:25,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0
2024-09-16 02:19:27,995 INFO [train.py:1198] (0/2) Epoch 21, batch 5450, loss[loss=0.2435, ctc_loss=0.1653, cr_loss=0.391, over 19481.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1591, cr_loss=0.379, over 4096163.67 frames. ], batch size: 90, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:20:24,471 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.63 vs. limit=22.5
2024-09-16 02:20:27,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=377703.1666666667, ans=0.0
2024-09-16 02:20:28,677 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=12.0
2024-09-16 02:20:43,670 INFO [train.py:1198] (0/2) Epoch 21, batch 5500, loss[loss=0.2009, ctc_loss=0.1331, cr_loss=0.339, over 20951.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1589, cr_loss=0.3795, over 4104219.05 frames. ], batch size: 49, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:21:14,821 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.084e+02 2.255e+02 2.419e+02 3.239e+02, threshold=4.509e+02, percent-clipped=0.0
2024-09-16 02:21:57,887 INFO [train.py:1198] (0/2) Epoch 21, batch 5550, loss[loss=0.2713, ctc_loss=0.1929, cr_loss=0.3916, over 14482.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1596, cr_loss=0.3801, over 4100510.76 frames. ], batch size: 149, lr: 3.94e-03, grad_scale: 32.0
2024-09-16 02:22:42,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=377958.1666666667, ans=0.09899494936611666
2024-09-16 02:22:52,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=377958.1666666667, ans=0.125
2024-09-16 02:23:13,103 INFO [train.py:1198] (0/2) Epoch 21, batch 5600, loss[loss=0.2279, ctc_loss=0.1549, cr_loss=0.3649, over 20955.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.16, cr_loss=0.3809, over 4096016.06 frames. ], batch size: 58, lr: 3.93e-03, grad_scale: 32.0
2024-09-16 02:23:19,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=378014.8333333333, ans=0.125
2024-09-16 02:23:22,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.75 vs. limit=15.0
2024-09-16 02:23:44,361 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.060e+02 2.251e+02 2.461e+02 3.109e+02, threshold=4.503e+02, percent-clipped=0.0
2024-09-16 02:23:57,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=378099.8333333333, ans=0.125
2024-09-16 02:24:03,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=378099.8333333333, ans=0.125
2024-09-16 02:24:14,752 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0
2024-09-16 02:24:27,501 INFO [train.py:1198] (0/2) Epoch 21, batch 5650, loss[loss=0.2045, ctc_loss=0.1364, cr_loss=0.3405, over 20972.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1594, cr_loss=0.3796, over 4094070.01 frames. ], batch size: 48, lr: 3.93e-03, grad_scale: 32.0
2024-09-16 02:24:40,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=378156.5, ans=0.0
2024-09-16 02:24:49,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=378184.8333333333, ans=0.125
2024-09-16 02:24:55,266 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 02:25:10,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=22.5
2024-09-16 02:25:46,044 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=12.0
2024-09-16 02:25:46,499 INFO [train.py:1198] (0/2) Epoch 21, batch 5700, loss[loss=0.263, ctc_loss=0.1813, cr_loss=0.4084, over 20962.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.159, cr_loss=0.3791, over 4096041.99 frames. ], batch size: 64, lr: 3.93e-03, grad_scale: 16.0
2024-09-16 02:26:19,190 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.067e+02 2.175e+02 2.427e+02 4.126e+02, threshold=4.349e+02, percent-clipped=0.0
2024-09-16 02:26:20,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=378354.8333333333, ans=0.025
2024-09-16 02:26:37,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=378383.1666666667, ans=0.125
2024-09-16 02:26:40,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=378383.1666666667, ans=0.125
2024-09-16 02:26:46,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=378411.5, ans=0.07
2024-09-16 02:27:00,824 INFO [train.py:1198] (0/2) Epoch 21, batch 5750, loss[loss=0.2237, ctc_loss=0.1521, cr_loss=0.3582, over 20967.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1587, cr_loss=0.3791, over 4097179.74 frames. ], batch size: 58, lr: 3.93e-03, grad_scale: 16.0
2024-09-16 02:27:26,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=378468.1666666667, ans=0.125
2024-09-16 02:27:32,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=378496.5, ans=0.1
2024-09-16 02:27:47,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=378524.8333333333, ans=0.1
2024-09-16 02:27:50,498 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0
2024-09-16 02:27:54,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=378524.8333333333, ans=0.2
2024-09-16 02:28:04,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=378553.1666666667, ans=0.125
2024-09-16 02:28:14,711 INFO [train.py:1198] (0/2) Epoch 21, batch 5800, loss[loss=0.2197, ctc_loss=0.1484, cr_loss=0.3565, over 20985.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1594, cr_loss=0.3796, over 4081653.69 frames. ], batch size: 48, lr: 3.93e-03, grad_scale: 16.0
2024-09-16 02:28:31,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=378609.8333333333, ans=0.125
2024-09-16 02:28:47,525 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.185e+02 2.427e+02 2.708e+02 9.486e+02, threshold=4.854e+02, percent-clipped=1.0
2024-09-16 02:28:50,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=378638.1666666667, ans=0.2
2024-09-16 02:29:28,535 INFO [train.py:1198] (0/2) Epoch 21, batch 5850, loss[loss=0.2088, ctc_loss=0.1412, cr_loss=0.3377, over 20949.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1595, cr_loss=0.3798, over 4075623.60 frames. ], batch size: 50, lr: 3.93e-03, grad_scale: 16.0
2024-09-16 02:29:28,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=378723.1666666667, ans=0.125
2024-09-16 02:30:15,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=378808.1666666667, ans=0.0
2024-09-16 02:30:19,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=378808.1666666667, ans=0.125
2024-09-16 02:30:19,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=378808.1666666667, ans=0.04949747468305833
2024-09-16 02:30:37,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=378836.5, ans=0.0
2024-09-16 02:30:42,831 INFO [train.py:1198] (0/2) Epoch 21, batch 5900, loss[loss=0.226, ctc_loss=0.1542, cr_loss=0.359, over 20770.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1586, cr_loss=0.3781, over 4082220.14 frames. ], batch size: 56, lr: 3.93e-03, grad_scale: 16.0
2024-09-16 02:31:02,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=22.5
2024-09-16 02:31:15,603 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.082e+02 2.243e+02 2.520e+02 3.993e+02, threshold=4.486e+02, percent-clipped=0.0
2024-09-16 02:31:30,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=378949.8333333333, ans=0.0
2024-09-16 02:31:44,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0
2024-09-16 02:31:57,365 INFO [train.py:1198] (0/2) Epoch 21, batch 5950, loss[loss=0.2812, ctc_loss=0.1965, cr_loss=0.4234, over 18315.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1589, cr_loss=0.3789, over 4088915.06 frames. ], batch size: 108, lr: 3.93e-03, grad_scale: 16.0
2024-09-16 02:32:30,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=379063.1666666667, ans=0.2
2024-09-16 02:32:34,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=379063.1666666667, ans=0.05
2024-09-16 02:33:01,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=379119.8333333333, ans=0.125
2024-09-16 02:33:03,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=379119.8333333333, ans=0.125
2024-09-16 02:33:11,648 INFO [train.py:1198] (0/2) Epoch 21, batch 6000, loss[loss=0.2535, ctc_loss=0.1684, cr_loss=0.4254, over 21018.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.159, cr_loss=0.3792, over 4073783.52 frames. ], batch size: 63, lr: 3.93e-03, grad_scale: 32.0
2024-09-16 02:33:11,649 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 02:33:35,841 INFO [train.py:1230] (0/2) Epoch 21, validation: loss=0.04328, ctc_loss=0.04328, cr_loss=1.081e-14, over 944034.00 frames.
2024-09-16 02:33:35,842 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-16 02:34:01,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=379176.5, ans=0.0
2024-09-16 02:34:09,203 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.070e+02 2.217e+02 2.364e+02 3.560e+02, threshold=4.434e+02, percent-clipped=0.0
2024-09-16 02:34:50,895 INFO [train.py:1198] (0/2) Epoch 21, batch 6050, loss[loss=0.2203, ctc_loss=0.1509, cr_loss=0.347, over 20835.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1593, cr_loss=0.3802, over 4070990.84 frames. ], batch size: 59, lr: 3.93e-03, grad_scale: 32.0
2024-09-16 02:34:52,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=379289.8333333333, ans=0.125
2024-09-16 02:35:12,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=379318.1666666667, ans=0.125
2024-09-16 02:35:17,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=379318.1666666667, ans=0.125
2024-09-16 02:35:38,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=15.0
2024-09-16 02:36:05,699 INFO [train.py:1198] (0/2) Epoch 21, batch 6100, loss[loss=0.258, ctc_loss=0.1738, cr_loss=0.4212, over 20666.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1605, cr_loss=0.3818, over 4067615.07 frames. ], batch size: 68, lr: 3.93e-03, grad_scale: 32.0
2024-09-16 02:36:19,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0
2024-09-16 02:36:37,794 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.052e+02 2.178e+02 2.395e+02 3.200e+02, threshold=4.355e+02, percent-clipped=0.0
2024-09-16 02:36:49,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=379516.5, ans=0.2
2024-09-16 02:36:53,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=379516.5, ans=0.1
2024-09-16 02:37:10,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=379544.8333333333, ans=0.125
2024-09-16 02:37:16,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=379544.8333333333, ans=0.125
2024-09-16 02:37:19,233 INFO [train.py:1198] (0/2) Epoch 21, batch 6150, loss[loss=0.2603, ctc_loss=0.1784, cr_loss=0.4094, over 20960.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.161, cr_loss=0.3819, over 4056763.10 frames. ], batch size: 58, lr: 3.93e-03, grad_scale: 32.0
2024-09-16 02:37:38,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=379601.5, ans=0.125
2024-09-16 02:38:23,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=379686.5, ans=0.1
2024-09-16 02:38:33,674 INFO [train.py:1198] (0/2) Epoch 21, batch 6200, loss[loss=0.1965, ctc_loss=0.1314, cr_loss=0.3258, over 20976.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1614, cr_loss=0.3823, over 4053548.97 frames. ], batch size: 52, lr: 3.93e-03, grad_scale: 32.0
2024-09-16 02:39:06,761 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.119e+02 2.225e+02 2.492e+02 3.167e+02, threshold=4.450e+02, percent-clipped=0.0
2024-09-16 02:39:41,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=379828.1666666667, ans=0.125
2024-09-16 02:39:44,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=379828.1666666667, ans=0.2
2024-09-16 02:39:48,925 INFO [train.py:1198] (0/2) Epoch 21, batch 6250, loss[loss=0.1987, ctc_loss=0.1334, cr_loss=0.3263, over 20303.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1611, cr_loss=0.3817, over 4044984.57 frames. ], batch size: 45, lr: 3.92e-03, grad_scale: 32.0
2024-09-16 02:40:32,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=379941.5, ans=0.125
2024-09-16 02:41:02,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=379998.1666666667, ans=0.0
2024-09-16 02:41:03,089 INFO [train.py:1198] (0/2) Epoch 21, batch 6300, loss[loss=0.2451, ctc_loss=0.169, cr_loss=0.3807, over 20691.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1636, cr_loss=0.3836, over 3965802.41 frames. ], batch size: 71, lr: 3.92e-03, grad_scale: 32.0
2024-09-16 02:41:36,042 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.198e+02 2.332e+02 2.541e+02 4.122e+02, threshold=4.664e+02, percent-clipped=0.0
2024-09-16 02:41:38,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.85 vs. limit=15.0
2024-09-16 02:42:08,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=380111.5, ans=0.125
2024-09-16 02:42:17,508 INFO [train.py:1198] (0/2) Epoch 21, batch 6350, loss[loss=0.2974, ctc_loss=0.2132, cr_loss=0.4212, over 15030.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1657, cr_loss=0.3857, over 3919308.03 frames. ], batch size: 150, lr: 3.92e-03, grad_scale: 32.0
2024-09-16 02:42:31,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=380168.1666666667, ans=0.2
2024-09-16 02:42:40,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.46 vs. limit=10.0
2024-09-16 02:43:17,026 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-21.pt
2024-09-16 02:44:03,433 INFO [train.py:1198] (0/2) Epoch 22, batch 0, loss[loss=0.2542, ctc_loss=0.1718, cr_loss=0.4117, over 20677.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1718, cr_loss=0.4117, over 20677.00 frames. ], batch size: 71, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:44:03,434 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 02:44:21,813 INFO [train.py:1230] (0/2) Epoch 22, validation: loss=0.04351, ctc_loss=0.04351, cr_loss=1.173e-14, over 944034.00 frames.
2024-09-16 02:44:21,813 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-16 02:44:22,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380256.0, ans=0.1
2024-09-16 02:44:49,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0
2024-09-16 02:44:56,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=380312.6666666667, ans=0.0
2024-09-16 02:45:11,273 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.163e+02 2.356e+02 2.596e+02 6.775e+02, threshold=4.712e+02, percent-clipped=1.0
2024-09-16 02:45:31,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=380369.3333333333, ans=0.0
2024-09-16 02:45:42,593 INFO [train.py:1198] (0/2) Epoch 22, batch 50, loss[loss=0.2622, ctc_loss=0.1789, cr_loss=0.4168, over 21033.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1606, cr_loss=0.3796, over 915525.85 frames. ], batch size: 62, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:45:50,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=380397.6666666667, ans=0.09899494936611666
2024-09-16 02:45:53,791 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0
2024-09-16 02:45:56,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=380426.0, ans=0.125
2024-09-16 02:46:12,937 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.138e-03
2024-09-16 02:46:20,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=380454.3333333333, ans=0.1
2024-09-16 02:46:57,752 INFO [train.py:1198] (0/2) Epoch 22, batch 100, loss[loss=0.2348, ctc_loss=0.1586, cr_loss=0.3806, over 21006.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1608, cr_loss=0.382, over 1622755.23 frames. ], batch size: 63, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:47:13,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=380567.6666666667, ans=0.125
2024-09-16 02:47:28,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=380596.0, ans=0.125
2024-09-16 02:47:30,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=22.5
2024-09-16 02:47:44,613 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.058e+02 2.210e+02 2.381e+02 2.814e+02, threshold=4.420e+02, percent-clipped=0.0
2024-09-16 02:47:45,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=380624.3333333333, ans=0.125
2024-09-16 02:47:56,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=380652.6666666667, ans=0.0
2024-09-16 02:47:59,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=380652.6666666667, ans=0.0
2024-09-16 02:48:07,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=380652.6666666667, ans=0.125
2024-09-16 02:48:13,659 INFO [train.py:1198] (0/2) Epoch 22, batch 150, loss[loss=0.2432, ctc_loss=0.1639, cr_loss=0.3965, over 20967.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1592, cr_loss=0.3783, over 2166546.65 frames. ], batch size: 58, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:49:13,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=380794.3333333333, ans=0.125
2024-09-16 02:49:23,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0
2024-09-16 02:49:29,950 INFO [train.py:1198] (0/2) Epoch 22, batch 200, loss[loss=0.2193, ctc_loss=0.148, cr_loss=0.357, over 21021.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1584, cr_loss=0.378, over 2599031.56 frames. ], batch size: 61, lr: 3.83e-03, grad_scale: 32.0
2024-09-16 02:50:13,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=380879.3333333333, ans=0.0
2024-09-16 02:50:20,301 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.068e+02 2.188e+02 2.417e+02 6.616e+02, threshold=4.376e+02, percent-clipped=1.0
2024-09-16 02:50:44,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=380936.0, ans=0.0
2024-09-16 02:50:49,265 INFO [train.py:1198] (0/2) Epoch 22, batch 250, loss[loss=0.258, ctc_loss=0.1762, cr_loss=0.4088, over 20691.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1593, cr_loss=0.38, over 2932554.72 frames.
], batch size: 71, lr: 3.83e-03, grad_scale: 32.0 2024-09-16 02:50:51,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=380964.3333333333, ans=0.0 2024-09-16 02:51:09,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=380992.6666666667, ans=0.125 2024-09-16 02:51:46,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=381049.3333333333, ans=0.125 2024-09-16 02:51:58,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=381077.6666666667, ans=0.0 2024-09-16 02:52:07,671 INFO [train.py:1198] (0/2) Epoch 22, batch 300, loss[loss=0.2564, ctc_loss=0.1749, cr_loss=0.4075, over 21037.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1596, cr_loss=0.3798, over 3187114.86 frames. ], batch size: 63, lr: 3.83e-03, grad_scale: 32.0 2024-09-16 02:52:08,904 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.73 vs. 
limit=8.0 2024-09-16 02:52:15,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=381106.0, ans=0.125 2024-09-16 02:52:19,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=381106.0, ans=0.1 2024-09-16 02:52:27,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=381134.3333333333, ans=0.0 2024-09-16 02:52:42,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=381162.6666666667, ans=0.1 2024-09-16 02:52:53,917 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.113e+02 2.187e+02 2.404e+02 3.032e+02, threshold=4.374e+02, percent-clipped=0.0 2024-09-16 02:53:17,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0 2024-09-16 02:53:22,401 INFO [train.py:1198] (0/2) Epoch 22, batch 350, loss[loss=0.2704, ctc_loss=0.186, cr_loss=0.4222, over 20685.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1601, cr_loss=0.3809, over 3382905.69 frames. 
], batch size: 71, lr: 3.83e-03, grad_scale: 32.0 2024-09-16 02:53:40,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=381276.0, ans=0.125 2024-09-16 02:54:00,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=381304.3333333333, ans=0.2 2024-09-16 02:54:08,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=381332.6666666667, ans=0.1 2024-09-16 02:54:24,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=381361.0, ans=0.125 2024-09-16 02:54:37,862 INFO [train.py:1198] (0/2) Epoch 22, batch 400, loss[loss=0.2377, ctc_loss=0.162, cr_loss=0.3783, over 20696.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.16, cr_loss=0.3812, over 3544131.74 frames. ], batch size: 71, lr: 3.83e-03, grad_scale: 32.0 2024-09-16 02:54:51,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=381417.6666666667, ans=0.0 2024-09-16 02:55:00,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=381417.6666666667, ans=0.1 2024-09-16 02:55:05,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=381417.6666666667, ans=22.5 2024-09-16 02:55:13,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=381446.0, ans=0.125 2024-09-16 02:55:17,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=381446.0, ans=0.0 2024-09-16 02:55:20,432 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=381446.0, ans=0.0 2024-09-16 02:55:26,287 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.102e+02 2.216e+02 2.517e+02 3.356e+02, threshold=4.432e+02, percent-clipped=0.0 2024-09-16 02:55:29,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=381474.3333333333, ans=0.125 2024-09-16 02:55:42,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=381502.6666666667, ans=0.125 2024-09-16 02:55:46,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=381502.6666666667, ans=0.0 2024-09-16 02:55:55,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=381531.0, ans=0.125 2024-09-16 02:55:56,497 INFO [train.py:1198] (0/2) Epoch 22, batch 450, loss[loss=0.274, ctc_loss=0.1913, cr_loss=0.4132, over 18543.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1591, cr_loss=0.38, over 3675457.82 frames. 
], batch size: 108, lr: 3.82e-03, grad_scale: 16.0 2024-09-16 02:56:07,334 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 02:56:21,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=381559.3333333333, ans=0.2 2024-09-16 02:56:58,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=381644.3333333333, ans=0.125 2024-09-16 02:57:04,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=381644.3333333333, ans=0.125 2024-09-16 02:57:15,128 INFO [train.py:1198] (0/2) Epoch 22, batch 500, loss[loss=0.2186, ctc_loss=0.1493, cr_loss=0.3462, over 21071.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1594, cr_loss=0.3804, over 3769793.76 frames. ], batch size: 56, lr: 3.82e-03, grad_scale: 16.0 2024-09-16 02:57:21,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=381672.6666666667, ans=0.125 2024-09-16 02:57:23,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=381672.6666666667, ans=0.125 2024-09-16 02:57:26,188 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 02:57:27,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=12.0 2024-09-16 02:58:03,587 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.101e+02 2.234e+02 2.387e+02 4.953e+02, threshold=4.468e+02, percent-clipped=1.0 2024-09-16 02:58:31,144 INFO [train.py:1198] (0/2) Epoch 22, batch 550, loss[loss=0.2928, ctc_loss=0.2064, cr_loss=0.4319, over 18448.00 frames. 
], tot_loss[loss=0.2357, ctc_loss=0.1596, cr_loss=0.3806, over 3843553.52 frames. ], batch size: 108, lr: 3.82e-03, grad_scale: 16.0 2024-09-16 02:58:40,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=381814.3333333333, ans=0.0 2024-09-16 02:58:52,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=381842.6666666667, ans=0.1 2024-09-16 02:58:55,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=381842.6666666667, ans=0.2 2024-09-16 02:59:39,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=15.0 2024-09-16 02:59:46,296 INFO [train.py:1198] (0/2) Epoch 22, batch 600, loss[loss=0.2522, ctc_loss=0.1703, cr_loss=0.4094, over 20648.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1596, cr_loss=0.3806, over 3904931.52 frames. ], batch size: 71, lr: 3.82e-03, grad_scale: 16.0 2024-09-16 03:00:34,523 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.066e+02 2.193e+02 2.282e+02 3.663e+02, threshold=4.387e+02, percent-clipped=0.0 2024-09-16 03:00:38,542 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.29 vs. limit=15.0 2024-09-16 03:00:45,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=382069.3333333333, ans=0.0 2024-09-16 03:01:00,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=382097.6666666667, ans=0.0 2024-09-16 03:01:02,091 INFO [train.py:1198] (0/2) Epoch 22, batch 650, loss[loss=0.2168, ctc_loss=0.1453, cr_loss=0.3579, over 21080.00 frames. 
], tot_loss[loss=0.2351, ctc_loss=0.159, cr_loss=0.3807, over 3952524.43 frames. ], batch size: 56, lr: 3.82e-03, grad_scale: 16.0 2024-09-16 03:01:14,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=382097.6666666667, ans=0.0 2024-09-16 03:02:21,180 INFO [train.py:1198] (0/2) Epoch 22, batch 700, loss[loss=0.2513, ctc_loss=0.1688, cr_loss=0.4128, over 20978.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1584, cr_loss=0.3796, over 3983801.49 frames. ], batch size: 64, lr: 3.82e-03, grad_scale: 16.0 2024-09-16 03:02:39,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=382267.6666666667, ans=0.0 2024-09-16 03:02:48,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2024-09-16 03:03:12,008 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.074e+02 2.249e+02 2.464e+02 3.829e+02, threshold=4.499e+02, percent-clipped=0.0 2024-09-16 03:03:28,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=382352.6666666667, ans=0.0 2024-09-16 03:03:39,050 INFO [train.py:1198] (0/2) Epoch 22, batch 750, loss[loss=0.2479, ctc_loss=0.1684, cr_loss=0.3971, over 21002.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1581, cr_loss=0.3797, over 4021971.76 frames. 
], batch size: 58, lr: 3.82e-03, grad_scale: 16.0 2024-09-16 03:03:39,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=382381.0, ans=0.1 2024-09-16 03:04:00,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=382409.3333333333, ans=0.2 2024-09-16 03:04:22,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=382466.0, ans=0.0 2024-09-16 03:04:36,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=382466.0, ans=0.0 2024-09-16 03:04:39,229 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 03:04:51,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=382494.3333333333, ans=0.2 2024-09-16 03:04:52,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=382522.6666666667, ans=0.0 2024-09-16 03:04:54,148 INFO [train.py:1198] (0/2) Epoch 22, batch 800, loss[loss=0.19, ctc_loss=0.126, cr_loss=0.3201, over 20325.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1577, cr_loss=0.3793, over 4041294.57 frames. 
], batch size: 45, lr: 3.82e-03, grad_scale: 32.0 2024-09-16 03:04:56,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=382522.6666666667, ans=0.125 2024-09-16 03:05:09,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=382551.0, ans=0.125 2024-09-16 03:05:21,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=382551.0, ans=0.2 2024-09-16 03:05:23,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=382579.3333333333, ans=0.125 2024-09-16 03:05:30,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=382579.3333333333, ans=0.1 2024-09-16 03:05:42,509 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.044e+02 2.212e+02 2.409e+02 3.838e+02, threshold=4.424e+02, percent-clipped=0.0 2024-09-16 03:06:09,241 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=22.5 2024-09-16 03:06:09,900 INFO [train.py:1198] (0/2) Epoch 22, batch 850, loss[loss=0.2596, ctc_loss=0.1751, cr_loss=0.4223, over 21011.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1576, cr_loss=0.3792, over 4062221.37 frames. ], batch size: 63, lr: 3.82e-03, grad_scale: 32.0 2024-09-16 03:06:27,471 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.60 vs. 
limit=15.0 2024-09-16 03:06:37,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=382692.6666666667, ans=0.125 2024-09-16 03:06:46,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2024-09-16 03:07:13,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2024-09-16 03:07:25,291 INFO [train.py:1198] (0/2) Epoch 22, batch 900, loss[loss=0.2135, ctc_loss=0.1406, cr_loss=0.3644, over 20927.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1577, cr_loss=0.379, over 4069465.30 frames. ], batch size: 49, lr: 3.82e-03, grad_scale: 32.0 2024-09-16 03:07:41,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=382806.0, ans=0.125 2024-09-16 03:08:17,168 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.051e+02 2.198e+02 2.389e+02 6.083e+02, threshold=4.396e+02, percent-clipped=1.0 2024-09-16 03:08:44,625 INFO [train.py:1198] (0/2) Epoch 22, batch 950, loss[loss=0.2839, ctc_loss=0.2048, cr_loss=0.3954, over 14163.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1587, cr_loss=0.3807, over 4068386.34 frames. ], batch size: 150, lr: 3.82e-03, grad_scale: 32.0 2024-09-16 03:08:46,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=382947.6666666667, ans=0.2 2024-09-16 03:08:46,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=382947.6666666667, ans=0.0 2024-09-16 03:08:46,829 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.61 vs. 
limit=22.5 2024-09-16 03:09:08,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=382976.0, ans=0.125 2024-09-16 03:09:09,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=15.0 2024-09-16 03:09:24,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=15.0 2024-09-16 03:09:46,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=383061.0, ans=0.2 2024-09-16 03:10:02,884 INFO [train.py:1198] (0/2) Epoch 22, batch 1000, loss[loss=0.232, ctc_loss=0.1564, cr_loss=0.3777, over 20880.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1584, cr_loss=0.3798, over 4077898.72 frames. ], batch size: 57, lr: 3.82e-03, grad_scale: 32.0 2024-09-16 03:10:04,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=383089.3333333333, ans=0.0 2024-09-16 03:10:04,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=383089.3333333333, ans=0.125 2024-09-16 03:10:07,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=383089.3333333333, ans=0.0 2024-09-16 03:10:21,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=383117.6666666667, ans=0.0 2024-09-16 03:10:28,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383117.6666666667, ans=0.1 2024-09-16 03:10:30,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, 
batch_count=383117.6666666667, ans=0.0 2024-09-16 03:10:51,296 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.084e+02 2.204e+02 2.351e+02 3.756e+02, threshold=4.408e+02, percent-clipped=0.0 2024-09-16 03:11:00,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=383174.3333333333, ans=0.125 2024-09-16 03:11:06,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=383202.6666666667, ans=0.125 2024-09-16 03:11:18,593 INFO [train.py:1198] (0/2) Epoch 22, batch 1050, loss[loss=0.2046, ctc_loss=0.1383, cr_loss=0.3313, over 20954.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1589, cr_loss=0.3805, over 4095762.70 frames. ], batch size: 51, lr: 3.82e-03, grad_scale: 32.0 2024-09-16 03:11:57,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=383287.6666666667, ans=0.125 2024-09-16 03:12:00,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=383287.6666666667, ans=22.5 2024-09-16 03:12:03,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=383316.0, ans=0.2 2024-09-16 03:12:28,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=383344.3333333333, ans=0.025 2024-09-16 03:12:33,205 INFO [train.py:1198] (0/2) Epoch 22, batch 1100, loss[loss=0.2282, ctc_loss=0.1523, cr_loss=0.3795, over 21062.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1589, cr_loss=0.3803, over 4092668.14 frames. 
], batch size: 62, lr: 3.82e-03, grad_scale: 32.0 2024-09-16 03:12:33,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=383372.6666666667, ans=0.125 2024-09-16 03:12:44,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=383372.6666666667, ans=0.125 2024-09-16 03:13:24,401 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.108e+02 2.238e+02 2.414e+02 3.006e+02, threshold=4.477e+02, percent-clipped=0.0 2024-09-16 03:13:30,764 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 03:13:48,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=383486.0, ans=0.125 2024-09-16 03:13:51,150 INFO [train.py:1198] (0/2) Epoch 22, batch 1150, loss[loss=0.3139, ctc_loss=0.2256, cr_loss=0.4413, over 14196.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1589, cr_loss=0.3806, over 4097270.13 frames. 
], batch size: 149, lr: 3.81e-03, grad_scale: 32.0 2024-09-16 03:14:35,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=383599.3333333333, ans=0.125 2024-09-16 03:14:37,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=383599.3333333333, ans=0.125 2024-09-16 03:15:02,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=383627.6666666667, ans=0.125 2024-09-16 03:15:03,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=383627.6666666667, ans=0.125 2024-09-16 03:15:11,174 INFO [train.py:1198] (0/2) Epoch 22, batch 1200, loss[loss=0.1969, ctc_loss=0.1289, cr_loss=0.3399, over 20979.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1589, cr_loss=0.3808, over 4097805.75 frames. ], batch size: 50, lr: 3.81e-03, grad_scale: 32.0 2024-09-16 03:15:21,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=383656.0, ans=0.125 2024-09-16 03:15:27,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=383684.3333333333, ans=0.2 2024-09-16 03:15:33,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=383684.3333333333, ans=0.0 2024-09-16 03:15:58,879 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.103e+02 2.212e+02 2.374e+02 2.854e+02, threshold=4.425e+02, percent-clipped=0.0 2024-09-16 03:16:05,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=383741.0, ans=0.125 2024-09-16 03:16:11,572 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=383769.3333333333, ans=0.1 2024-09-16 03:16:26,111 INFO [train.py:1198] (0/2) Epoch 22, batch 1250, loss[loss=0.2635, ctc_loss=0.181, cr_loss=0.4123, over 19474.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1592, cr_loss=0.3804, over 4088446.81 frames. ], batch size: 90, lr: 3.81e-03, grad_scale: 32.0 2024-09-16 03:16:27,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=383797.6666666667, ans=10.0 2024-09-16 03:16:35,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=383797.6666666667, ans=0.0 2024-09-16 03:16:37,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=12.0 2024-09-16 03:17:00,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383854.3333333333, ans=0.1 2024-09-16 03:17:26,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=383911.0, ans=0.04949747468305833 2024-09-16 03:17:41,419 INFO [train.py:1198] (0/2) Epoch 22, batch 1300, loss[loss=0.2431, ctc_loss=0.1613, cr_loss=0.4087, over 20959.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1602, cr_loss=0.3821, over 4076741.90 frames. 
], batch size: 58, lr: 3.81e-03, grad_scale: 16.0 2024-09-16 03:17:43,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=383939.3333333333, ans=0.0 2024-09-16 03:18:31,503 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.059e+02 2.183e+02 2.355e+02 2.900e+02, threshold=4.366e+02, percent-clipped=0.0 2024-09-16 03:18:50,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=384052.6666666667, ans=0.025 2024-09-16 03:18:50,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=384052.6666666667, ans=0.0 2024-09-16 03:19:00,535 INFO [train.py:1198] (0/2) Epoch 22, batch 1350, loss[loss=0.2403, ctc_loss=0.1645, cr_loss=0.3792, over 21072.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1601, cr_loss=0.3823, over 4087182.65 frames. ], batch size: 59, lr: 3.81e-03, grad_scale: 16.0 2024-09-16 03:19:09,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=384081.0, ans=0.2 2024-09-16 03:19:56,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=384166.0, ans=0.125 2024-09-16 03:20:04,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=384194.3333333333, ans=0.0 2024-09-16 03:20:07,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=384194.3333333333, ans=0.0 2024-09-16 03:20:13,398 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 03:20:16,094 INFO [train.py:1198] (0/2) Epoch 22, batch 1400, loss[loss=0.2098, ctc_loss=0.1393, cr_loss=0.3529, over 20950.00 frames. 
], tot_loss[loss=0.2366, ctc_loss=0.1602, cr_loss=0.3824, over 4087471.11 frames. ], batch size: 50, lr: 3.81e-03, grad_scale: 16.0
2024-09-16 03:20:58,675 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 03:21:09,190 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.042e+02 2.214e+02 2.345e+02 3.120e+02, threshold=4.427e+02, percent-clipped=0.0
2024-09-16 03:21:34,889 INFO [train.py:1198] (0/2) Epoch 22, batch 1450, loss[loss=0.2675, ctc_loss=0.1845, cr_loss=0.4152, over 19959.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1594, cr_loss=0.3805, over 4082983.27 frames. ], batch size: 80, lr: 3.81e-03, grad_scale: 16.0
2024-09-16 03:21:41,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=22.5
2024-09-16 03:22:29,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=384449.3333333333, ans=0.2
2024-09-16 03:22:50,645 INFO [train.py:1198] (0/2) Epoch 22, batch 1500, loss[loss=0.2377, ctc_loss=0.1601, cr_loss=0.3879, over 20982.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1588, cr_loss=0.3794, over 4089293.74 frames. ], batch size: 55, lr: 3.81e-03, grad_scale: 16.0
2024-09-16 03:23:04,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=384534.3333333333, ans=0.1
2024-09-16 03:23:07,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=384534.3333333333, ans=0.0
2024-09-16 03:23:30,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0
2024-09-16 03:23:39,876 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.187e+02 2.319e+02 2.511e+02 4.105e+02, threshold=4.639e+02, percent-clipped=0.0
2024-09-16 03:24:03,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=384619.3333333333, ans=0.2
2024-09-16 03:24:03,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=384619.3333333333, ans=0.125
2024-09-16 03:24:05,657 INFO [train.py:1198] (0/2) Epoch 22, batch 1550, loss[loss=0.2315, ctc_loss=0.155, cr_loss=0.3825, over 20004.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1576, cr_loss=0.3779, over 4097949.93 frames. ], batch size: 44, lr: 3.81e-03, grad_scale: 16.0
2024-09-16 03:24:53,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=384732.6666666667, ans=0.1
2024-09-16 03:25:25,144 INFO [train.py:1198] (0/2) Epoch 22, batch 1600, loss[loss=0.2251, ctc_loss=0.151, cr_loss=0.3703, over 20772.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1576, cr_loss=0.3777, over 4102780.92 frames. ], batch size: 56, lr: 3.81e-03, grad_scale: 32.0
2024-09-16 03:25:37,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=384789.3333333333, ans=0.0
2024-09-16 03:25:49,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=384817.6666666667, ans=0.125
2024-09-16 03:25:50,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=384817.6666666667, ans=0.125
2024-09-16 03:26:04,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=384846.0, ans=0.0
2024-09-16 03:26:14,498 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.090e+02 2.207e+02 2.363e+02 3.056e+02, threshold=4.414e+02, percent-clipped=0.0
2024-09-16 03:26:37,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=384902.6666666667, ans=0.125
2024-09-16 03:26:43,496 INFO [train.py:1198] (0/2) Epoch 22, batch 1650, loss[loss=0.1974, ctc_loss=0.1296, cr_loss=0.3388, over 20924.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1582, cr_loss=0.3787, over 4105423.52 frames. ], batch size: 49, lr: 3.81e-03, grad_scale: 32.0
2024-09-16 03:27:16,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=384987.6666666667, ans=0.1
2024-09-16 03:27:51,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=22.5
2024-09-16 03:27:59,605 INFO [train.py:1198] (0/2) Epoch 22, batch 1700, loss[loss=0.2007, ctc_loss=0.132, cr_loss=0.3434, over 20934.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.159, cr_loss=0.3804, over 4096440.39 frames. ], batch size: 49, lr: 3.81e-03, grad_scale: 32.0
2024-09-16 03:28:01,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=385072.6666666667, ans=0.035
2024-09-16 03:28:22,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=385101.0, ans=0.125
2024-09-16 03:28:40,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=385129.3333333333, ans=0.2
2024-09-16 03:28:42,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=385129.3333333333, ans=0.0
2024-09-16 03:28:49,255 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.062e+02 2.242e+02 2.396e+02 5.065e+02, threshold=4.484e+02, percent-clipped=1.0
2024-09-16 03:29:14,541 INFO [train.py:1198] (0/2) Epoch 22, batch 1750, loss[loss=0.2055, ctc_loss=0.1361, cr_loss=0.3471, over 20958.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1586, cr_loss=0.38, over 4105741.95 frames. ], batch size: 48, lr: 3.81e-03, grad_scale: 32.0
2024-09-16 03:29:31,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=385242.6666666667, ans=0.04949747468305833
2024-09-16 03:29:45,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=385271.0, ans=0.125
2024-09-16 03:30:16,629 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-136000.pt
2024-09-16 03:30:20,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=385327.6666666667, ans=0.1
2024-09-16 03:30:25,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=385327.6666666667, ans=0.04949747468305833
2024-09-16 03:30:30,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=385356.0, ans=0.1
2024-09-16 03:30:31,087 INFO [train.py:1198] (0/2) Epoch 22, batch 1800, loss[loss=0.2375, ctc_loss=0.1612, cr_loss=0.3819, over 20876.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1596, cr_loss=0.3815, over 4097244.96 frames. ], batch size: 57, lr: 3.81e-03, grad_scale: 32.0
2024-09-16 03:30:52,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=385384.3333333333, ans=0.125
2024-09-16 03:31:05,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.88 vs. limit=10.0
2024-09-16 03:31:20,105 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.07 vs. limit=22.5
2024-09-16 03:31:25,426 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.081e+02 2.211e+02 2.365e+02 3.519e+02, threshold=4.421e+02, percent-clipped=0.0
2024-09-16 03:31:49,731 INFO [train.py:1198] (0/2) Epoch 22, batch 1850, loss[loss=0.255, ctc_loss=0.1763, cr_loss=0.3933, over 20125.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1585, cr_loss=0.3802, over 4107916.80 frames. ], batch size: 80, lr: 3.80e-03, grad_scale: 16.0
2024-09-16 03:32:24,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=385554.3333333333, ans=0.04949747468305833
2024-09-16 03:33:08,518 INFO [train.py:1198] (0/2) Epoch 22, batch 1900, loss[loss=0.2621, ctc_loss=0.1799, cr_loss=0.4109, over 20870.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1582, cr_loss=0.3797, over 4103501.03 frames. ], batch size: 65, lr: 3.80e-03, grad_scale: 16.0
2024-09-16 03:33:08,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=385639.3333333333, ans=0.125
2024-09-16 03:33:16,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=385639.3333333333, ans=0.5
2024-09-16 03:33:31,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=385667.6666666667, ans=0.1
2024-09-16 03:33:59,619 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.069e+02 2.212e+02 2.397e+02 5.471e+02, threshold=4.425e+02, percent-clipped=1.0
2024-09-16 03:34:09,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=385752.6666666667, ans=0.07
2024-09-16 03:34:23,639 INFO [train.py:1198] (0/2) Epoch 22, batch 1950, loss[loss=0.2543, ctc_loss=0.174, cr_loss=0.4015, over 19656.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1572, cr_loss=0.3785, over 4110970.70 frames. ], batch size: 90, lr: 3.80e-03, grad_scale: 16.0
2024-09-16 03:34:48,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=385809.3333333333, ans=0.1
2024-09-16 03:34:51,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=385809.3333333333, ans=0.125
2024-09-16 03:34:55,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=385837.6666666667, ans=0.0
2024-09-16 03:35:30,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=385894.3333333333, ans=0.2
2024-09-16 03:35:35,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=385894.3333333333, ans=0.125
2024-09-16 03:35:39,604 INFO [train.py:1198] (0/2) Epoch 22, batch 2000, loss[loss=0.2354, ctc_loss=0.1586, cr_loss=0.3839, over 21018.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1588, cr_loss=0.3816, over 4100019.25 frames. ], batch size: 62, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:36:17,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=385979.3333333333, ans=0.025
2024-09-16 03:36:33,994 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.060e+02 2.177e+02 2.279e+02 3.176e+02, threshold=4.353e+02, percent-clipped=0.0
2024-09-16 03:36:51,373 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0
2024-09-16 03:36:52,683 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=15.0
2024-09-16 03:36:58,266 INFO [train.py:1198] (0/2) Epoch 22, batch 2050, loss[loss=0.2019, ctc_loss=0.134, cr_loss=0.3396, over 20882.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1583, cr_loss=0.3803, over 4092429.56 frames. ], batch size: 57, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:37:06,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=386064.3333333333, ans=0.0
2024-09-16 03:37:11,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=386092.6666666667, ans=0.125
2024-09-16 03:37:22,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=386092.6666666667, ans=0.0
2024-09-16 03:38:09,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386177.6666666667, ans=0.1
2024-09-16 03:38:16,605 INFO [train.py:1198] (0/2) Epoch 22, batch 2100, loss[loss=0.246, ctc_loss=0.1703, cr_loss=0.3785, over 19398.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1587, cr_loss=0.381, over 4080401.19 frames. ], batch size: 90, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:38:28,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0
2024-09-16 03:38:36,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=386234.3333333333, ans=15.0
2024-09-16 03:38:56,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386262.6666666667, ans=0.1
2024-09-16 03:38:59,605 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=15.0
2024-09-16 03:39:08,049 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.117e+02 2.300e+02 2.525e+02 5.578e+02, threshold=4.599e+02, percent-clipped=2.0
2024-09-16 03:39:32,018 INFO [train.py:1198] (0/2) Epoch 22, batch 2150, loss[loss=0.3068, ctc_loss=0.2245, cr_loss=0.4118, over 13994.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1603, cr_loss=0.3823, over 4063749.26 frames. ], batch size: 151, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:39:32,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=386347.6666666667, ans=0.2
2024-09-16 03:40:03,193 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0
2024-09-16 03:40:10,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=386404.3333333333, ans=0.0
2024-09-16 03:40:17,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=386432.6666666667, ans=0.125
2024-09-16 03:40:45,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=386461.0, ans=0.125
2024-09-16 03:40:47,740 INFO [train.py:1198] (0/2) Epoch 22, batch 2200, loss[loss=0.2196, ctc_loss=0.15, cr_loss=0.3478, over 21072.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1607, cr_loss=0.3829, over 4067194.78 frames. ], batch size: 59, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:41:00,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=386489.3333333333, ans=0.2
2024-09-16 03:41:03,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=386517.6666666667, ans=0.0
2024-09-16 03:41:24,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=386546.0, ans=0.125
2024-09-16 03:41:39,669 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.125e+02 2.244e+02 2.457e+02 3.951e+02, threshold=4.488e+02, percent-clipped=0.0
2024-09-16 03:42:04,001 INFO [train.py:1198] (0/2) Epoch 22, batch 2250, loss[loss=0.2547, ctc_loss=0.1717, cr_loss=0.415, over 20779.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1607, cr_loss=0.3832, over 4072384.31 frames. ], batch size: 71, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:42:05,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=386631.0, ans=0.125
2024-09-16 03:42:16,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0
2024-09-16 03:42:28,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=386659.3333333333, ans=0.125
2024-09-16 03:42:46,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386687.6666666667, ans=0.1
2024-09-16 03:42:52,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0
2024-09-16 03:43:17,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0
2024-09-16 03:43:22,201 INFO [train.py:1198] (0/2) Epoch 22, batch 2300, loss[loss=0.2803, ctc_loss=0.2003, cr_loss=0.4002, over 14419.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1608, cr_loss=0.3829, over 4066244.27 frames. ], batch size: 150, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:43:40,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=386801.0, ans=0.125
2024-09-16 03:43:52,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=386801.0, ans=0.125
2024-09-16 03:44:09,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=386857.6666666667, ans=0.0
2024-09-16 03:44:15,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=386857.6666666667, ans=0.5
2024-09-16 03:44:16,755 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.076e+02 2.209e+02 2.397e+02 3.724e+02, threshold=4.418e+02, percent-clipped=0.0
2024-09-16 03:44:23,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=386857.6666666667, ans=15.0
2024-09-16 03:44:24,069 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0
2024-09-16 03:44:32,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=386886.0, ans=0.0
2024-09-16 03:44:41,198 INFO [train.py:1198] (0/2) Epoch 22, batch 2350, loss[loss=0.2417, ctc_loss=0.1635, cr_loss=0.3908, over 21010.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1611, cr_loss=0.383, over 4070352.09 frames. ], batch size: 61, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:44:57,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=386942.6666666667, ans=0.125
2024-09-16 03:45:55,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=387056.0, ans=0.125
2024-09-16 03:45:55,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=22.5
2024-09-16 03:45:56,634 INFO [train.py:1198] (0/2) Epoch 22, batch 2400, loss[loss=0.2343, ctc_loss=0.1596, cr_loss=0.3737, over 20951.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1603, cr_loss=0.3818, over 4075431.64 frames. ], batch size: 60, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:45:58,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=387056.0, ans=0.0
2024-09-16 03:46:03,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=387056.0, ans=0.125
2024-09-16 03:46:13,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=387084.3333333333, ans=0.1
2024-09-16 03:46:14,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=387084.3333333333, ans=0.04949747468305833
2024-09-16 03:46:16,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0
2024-09-16 03:46:47,255 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.146e+02 2.286e+02 2.479e+02 3.833e+02, threshold=4.572e+02, percent-clipped=0.0
2024-09-16 03:46:49,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=387141.0, ans=0.2
2024-09-16 03:47:11,659 INFO [train.py:1198] (0/2) Epoch 22, batch 2450, loss[loss=0.2052, ctc_loss=0.1393, cr_loss=0.3295, over 19895.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1601, cr_loss=0.3806, over 4078846.01 frames. ], batch size: 44, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:47:21,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=387197.6666666667, ans=0.125
2024-09-16 03:47:44,086 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 03:48:01,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=387282.6666666667, ans=0.025
2024-09-16 03:48:03,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=387282.6666666667, ans=0.2
2024-09-16 03:48:09,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=387282.6666666667, ans=0.0
2024-09-16 03:48:09,684 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=12.0
2024-09-16 03:48:29,992 INFO [train.py:1198] (0/2) Epoch 22, batch 2500, loss[loss=0.2295, ctc_loss=0.1548, cr_loss=0.373, over 21050.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.16, cr_loss=0.3806, over 4073326.29 frames. ], batch size: 56, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:48:36,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=387339.3333333333, ans=0.2
2024-09-16 03:48:54,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=387367.6666666667, ans=0.025
2024-09-16 03:49:24,630 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.111e+02 2.264e+02 2.466e+02 3.770e+02, threshold=4.528e+02, percent-clipped=0.0
2024-09-16 03:49:39,965 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 03:49:46,510 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0
2024-09-16 03:49:48,867 INFO [train.py:1198] (0/2) Epoch 22, batch 2550, loss[loss=0.2541, ctc_loss=0.1734, cr_loss=0.4035, over 21046.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1593, cr_loss=0.38, over 4087293.53 frames. ], batch size: 56, lr: 3.80e-03, grad_scale: 32.0
2024-09-16 03:49:51,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=22.5
2024-09-16 03:50:40,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=387566.0, ans=0.025
2024-09-16 03:50:46,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=387566.0, ans=0.0
2024-09-16 03:50:49,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=387594.3333333333, ans=0.025
2024-09-16 03:51:03,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=387622.6666666667, ans=0.125
2024-09-16 03:51:04,677 INFO [train.py:1198] (0/2) Epoch 22, batch 2600, loss[loss=0.2283, ctc_loss=0.1541, cr_loss=0.3708, over 20784.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1599, cr_loss=0.3818, over 4090317.28 frames. ], batch size: 56, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 03:51:16,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=387622.6666666667, ans=0.125
2024-09-16 03:51:40,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=387679.3333333333, ans=0.0
2024-09-16 03:51:55,745 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.061e+02 2.189e+02 2.377e+02 4.956e+02, threshold=4.378e+02, percent-clipped=1.0
2024-09-16 03:52:18,027 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=22.5
2024-09-16 03:52:20,066 INFO [train.py:1198] (0/2) Epoch 22, batch 2650, loss[loss=0.2773, ctc_loss=0.19, cr_loss=0.4365, over 20970.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1607, cr_loss=0.3824, over 4082070.26 frames. ], batch size: 67, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 03:52:22,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=12.0
2024-09-16 03:52:41,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=387792.6666666667, ans=0.125
2024-09-16 03:53:25,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0
2024-09-16 03:53:35,156 INFO [train.py:1198] (0/2) Epoch 22, batch 2700, loss[loss=0.2328, ctc_loss=0.1539, cr_loss=0.3949, over 20885.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1603, cr_loss=0.3819, over 4084157.64 frames. ], batch size: 57, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 03:53:55,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=387934.3333333333, ans=0.2
2024-09-16 03:54:09,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=387962.6666666667, ans=15.0
2024-09-16 03:54:29,575 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.062e+02 2.198e+02 2.349e+02 3.495e+02, threshold=4.395e+02, percent-clipped=0.0
2024-09-16 03:54:53,747 INFO [train.py:1198] (0/2) Epoch 22, batch 2750, loss[loss=0.2349, ctc_loss=0.1567, cr_loss=0.3911, over 20786.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1596, cr_loss=0.3817, over 4090495.57 frames. ], batch size: 56, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 03:55:28,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388104.3333333333, ans=0.1
2024-09-16 03:56:11,439 INFO [train.py:1198] (0/2) Epoch 22, batch 2800, loss[loss=0.1838, ctc_loss=0.1217, cr_loss=0.3105, over 20958.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1588, cr_loss=0.3801, over 4091956.90 frames. ], batch size: 49, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 03:56:20,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=388189.3333333333, ans=0.1
2024-09-16 03:56:40,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=388246.0, ans=0.95
2024-09-16 03:56:40,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=388246.0, ans=0.0
2024-09-16 03:56:47,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=388246.0, ans=0.125
2024-09-16 03:57:02,304 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.043e+02 2.192e+02 2.324e+02 3.361e+02, threshold=4.383e+02, percent-clipped=0.0
2024-09-16 03:57:08,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=388274.3333333333, ans=0.5
2024-09-16 03:57:10,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388302.6666666667, ans=0.1
2024-09-16 03:57:17,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=388302.6666666667, ans=0.025
2024-09-16 03:57:26,413 INFO [train.py:1198] (0/2) Epoch 22, batch 2850, loss[loss=0.2368, ctc_loss=0.1613, cr_loss=0.3776, over 21056.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1583, cr_loss=0.3797, over 4102278.37 frames. ], batch size: 56, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 03:57:28,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=388331.0, ans=0.025
2024-09-16 03:57:29,777 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 03:57:55,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=388387.6666666667, ans=0.125
2024-09-16 03:58:42,376 INFO [train.py:1198] (0/2) Epoch 22, batch 2900, loss[loss=0.2447, ctc_loss=0.1657, cr_loss=0.395, over 20841.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1579, cr_loss=0.3794, over 4112943.43 frames. ], batch size: 65, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 03:59:11,751 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0
2024-09-16 03:59:21,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=388529.3333333333, ans=0.2
2024-09-16 03:59:36,870 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.053e+02 2.203e+02 2.348e+02 3.053e+02, threshold=4.406e+02, percent-clipped=0.0
2024-09-16 03:59:47,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=388586.0, ans=0.0
2024-09-16 03:59:52,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=22.5
2024-09-16 04:00:01,270 INFO [train.py:1198] (0/2) Epoch 22, batch 2950, loss[loss=0.2016, ctc_loss=0.1327, cr_loss=0.3445, over 21032.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1578, cr_loss=0.3787, over 4113546.79 frames. ], batch size: 53, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 04:00:15,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=388642.6666666667, ans=0.0
2024-09-16 04:00:18,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=388642.6666666667, ans=0.1
2024-09-16 04:01:19,429 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.33 vs. limit=15.0
2024-09-16 04:01:19,523 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.99 vs. limit=15.0
2024-09-16 04:01:19,947 INFO [train.py:1198] (0/2) Epoch 22, batch 3000, loss[loss=0.2465, ctc_loss=0.1669, cr_loss=0.3976, over 20356.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.158, cr_loss=0.3794, over 4106627.74 frames. ], batch size: 74, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 04:01:19,948 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 04:01:45,008 INFO [train.py:1230] (0/2) Epoch 22, validation: loss=0.04351, ctc_loss=0.04351, cr_loss=1.114e-14, over 944034.00 frames.
2024-09-16 04:01:45,009 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-16 04:01:48,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=388756.0, ans=0.025
2024-09-16 04:01:48,464 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 04:02:23,746 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=22.5
2024-09-16 04:02:24,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=388812.6666666667, ans=0.035
2024-09-16 04:02:36,671 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.099e+02 2.248e+02 2.396e+02 4.773e+02, threshold=4.495e+02, percent-clipped=1.0
2024-09-16 04:03:01,136 INFO [train.py:1198] (0/2) Epoch 22, batch 3050, loss[loss=0.2325, ctc_loss=0.158, cr_loss=0.3724, over 20672.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1579, cr_loss=0.3796, over 4112759.20 frames. ], batch size: 68, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 04:03:01,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=388897.6666666667, ans=0.025
2024-09-16 04:03:04,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=388897.6666666667, ans=0.2
2024-09-16 04:03:17,026 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=15.0
2024-09-16 04:03:18,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=388926.0, ans=0.0
2024-09-16 04:03:36,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0
2024-09-16 04:04:06,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=389011.0, ans=0.1
2024-09-16 04:04:17,352 INFO [train.py:1198] (0/2) Epoch 22, batch 3100, loss[loss=0.2266, ctc_loss=0.1515, cr_loss=0.3753, over 21048.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1578, cr_loss=0.3794, over 4106354.09 frames. ], batch size: 62, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 04:04:17,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389039.3333333333, ans=0.1
2024-09-16 04:04:23,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=389039.3333333333, ans=0.2
2024-09-16 04:04:29,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=389039.3333333333, ans=0.125
2024-09-16 04:04:31,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.30 vs. limit=10.0
2024-09-16 04:04:32,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=389067.6666666667, ans=0.125
2024-09-16 04:04:52,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=389096.0, ans=0.125
2024-09-16 04:05:04,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=389124.3333333333, ans=0.04949747468305833
2024-09-16 04:05:08,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=389124.3333333333, ans=0.0
2024-09-16 04:05:11,804 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.102e+02 2.307e+02 2.423e+02 3.397e+02, threshold=4.613e+02, percent-clipped=0.0
2024-09-16 04:05:33,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=389152.6666666667, ans=0.1
2024-09-16 04:05:36,209 INFO [train.py:1198] (0/2) Epoch 22, batch 3150, loss[loss=0.2483, ctc_loss=0.1695, cr_loss=0.394, over 20969.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1585, cr_loss=0.381, over 4103953.58 frames. ], batch size: 64, lr: 3.79e-03, grad_scale: 32.0
2024-09-16 04:05:41,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=389181.0, ans=0.125
2024-09-16 04:05:45,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=389181.0, ans=0.125
2024-09-16 04:05:50,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=389209.3333333333, ans=0.0
2024-09-16 04:05:59,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=389209.3333333333, ans=0.1
2024-09-16 04:06:03,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=389209.3333333333, ans=0.2
2024-09-16 04:06:05,730 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0
2024-09-16 04:06:14,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=389237.6666666667, ans=0.0
2024-09-16 04:06:32,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=389266.0, ans=0.2
2024-09-16 04:06:54,389 INFO [train.py:1198] (0/2) Epoch 22, batch 3200, loss[loss=0.2509, ctc_loss=0.1708, cr_loss=0.4003, over 20749.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1589, cr_loss=0.3814, over 4102740.72 frames.
], batch size: 68, lr: 3.79e-03, grad_scale: 32.0 2024-09-16 04:07:00,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=389322.6666666667, ans=0.0 2024-09-16 04:07:25,450 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.06 vs. limit=10.0 2024-09-16 04:07:45,575 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.107e+02 2.270e+02 2.473e+02 4.706e+02, threshold=4.539e+02, percent-clipped=1.0 2024-09-16 04:07:45,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=389407.6666666667, ans=0.0 2024-09-16 04:08:09,612 INFO [train.py:1198] (0/2) Epoch 22, batch 3250, loss[loss=0.2345, ctc_loss=0.1579, cr_loss=0.3833, over 20829.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1593, cr_loss=0.3816, over 4099383.48 frames. ], batch size: 59, lr: 3.79e-03, grad_scale: 32.0 2024-09-16 04:08:11,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389464.3333333333, ans=0.1 2024-09-16 04:08:23,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=389492.6666666667, ans=0.0 2024-09-16 04:08:43,024 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 04:08:43,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=389521.0, ans=0.2 2024-09-16 04:09:14,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=389577.6666666667, ans=0.0 2024-09-16 04:09:25,388 INFO [train.py:1198] (0/2) Epoch 22, batch 3300, loss[loss=0.2513, ctc_loss=0.1707, cr_loss=0.4033, over 20630.00 frames. 
], tot_loss[loss=0.2351, ctc_loss=0.1589, cr_loss=0.3813, over 4101620.92 frames. ], batch size: 66, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:09:37,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=389606.0, ans=0.0 2024-09-16 04:09:59,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=389662.6666666667, ans=0.025 2024-09-16 04:10:01,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=389662.6666666667, ans=0.125 2024-09-16 04:10:06,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=389662.6666666667, ans=0.07 2024-09-16 04:10:17,119 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.038e+02 2.206e+02 2.374e+02 3.484e+02, threshold=4.412e+02, percent-clipped=0.0 2024-09-16 04:10:25,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389719.3333333333, ans=0.1 2024-09-16 04:10:32,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=389719.3333333333, ans=0.025 2024-09-16 04:10:41,564 INFO [train.py:1198] (0/2) Epoch 22, batch 3350, loss[loss=0.241, ctc_loss=0.1623, cr_loss=0.3935, over 20877.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1578, cr_loss=0.3796, over 4100627.40 frames. 
], batch size: 54, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:10:47,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=389747.6666666667, ans=0.5 2024-09-16 04:10:50,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=389747.6666666667, ans=0.125 2024-09-16 04:10:59,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=389776.0, ans=0.0 2024-09-16 04:11:18,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=389804.3333333333, ans=0.0 2024-09-16 04:11:21,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=389804.3333333333, ans=0.2 2024-09-16 04:11:59,835 INFO [train.py:1198] (0/2) Epoch 22, batch 3400, loss[loss=0.1863, ctc_loss=0.124, cr_loss=0.3114, over 21005.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1582, cr_loss=0.3806, over 4103337.84 frames. ], batch size: 52, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:12:30,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=389917.6666666667, ans=0.0 2024-09-16 04:12:36,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=389946.0, ans=0.0 2024-09-16 04:12:54,190 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.077e+02 2.179e+02 2.335e+02 3.315e+02, threshold=4.357e+02, percent-clipped=0.0 2024-09-16 04:13:08,934 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.68 vs. 
limit=10.0 2024-09-16 04:13:13,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=390002.6666666667, ans=0.05 2024-09-16 04:13:17,972 INFO [train.py:1198] (0/2) Epoch 22, batch 3450, loss[loss=0.2228, ctc_loss=0.1499, cr_loss=0.3645, over 20783.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1584, cr_loss=0.3807, over 4095905.96 frames. ], batch size: 53, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:13:30,949 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=12.0 2024-09-16 04:14:06,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=390116.0, ans=22.5 2024-09-16 04:14:25,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=390144.3333333333, ans=0.025 2024-09-16 04:14:33,093 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=12.0 2024-09-16 04:14:33,745 INFO [train.py:1198] (0/2) Epoch 22, batch 3500, loss[loss=0.2112, ctc_loss=0.1412, cr_loss=0.3502, over 20969.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1575, cr_loss=0.3791, over 4100423.84 frames. 
], batch size: 50, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:14:40,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=390172.6666666667, ans=0.1 2024-09-16 04:15:14,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=390229.3333333333, ans=0.025 2024-09-16 04:15:24,898 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.119e+02 2.223e+02 2.406e+02 4.097e+02, threshold=4.445e+02, percent-clipped=0.0 2024-09-16 04:15:29,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=390257.6666666667, ans=0.125 2024-09-16 04:15:34,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=390286.0, ans=0.125 2024-09-16 04:15:49,161 INFO [train.py:1198] (0/2) Epoch 22, batch 3550, loss[loss=0.2293, ctc_loss=0.1539, cr_loss=0.3769, over 21050.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1578, cr_loss=0.3795, over 4106515.30 frames. ], batch size: 62, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:15:52,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=390314.3333333333, ans=0.5 2024-09-16 04:16:14,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=390342.6666666667, ans=0.025 2024-09-16 04:16:30,802 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.24 vs. 
limit=22.5 2024-09-16 04:16:33,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=390371.0, ans=0.1 2024-09-16 04:17:08,500 INFO [train.py:1198] (0/2) Epoch 22, batch 3600, loss[loss=0.2176, ctc_loss=0.1473, cr_loss=0.3517, over 21063.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1578, cr_loss=0.3791, over 4092160.63 frames. ], batch size: 53, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:17:14,034 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0 2024-09-16 04:17:18,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=390456.0, ans=0.0 2024-09-16 04:18:03,034 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.081e+02 2.244e+02 2.465e+02 3.596e+02, threshold=4.487e+02, percent-clipped=0.0 2024-09-16 04:18:21,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=390569.3333333333, ans=0.0 2024-09-16 04:18:23,249 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.98 vs. limit=10.0 2024-09-16 04:18:27,098 INFO [train.py:1198] (0/2) Epoch 22, batch 3650, loss[loss=0.2303, ctc_loss=0.1555, cr_loss=0.3738, over 20786.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1591, cr_loss=0.3799, over 4082504.53 frames. 
], batch size: 56, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:18:33,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=390597.6666666667, ans=0.125 2024-09-16 04:19:29,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=390711.0, ans=0.125 2024-09-16 04:19:31,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=390711.0, ans=0.0 2024-09-16 04:19:31,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=390711.0, ans=0.125 2024-09-16 04:19:42,990 INFO [train.py:1198] (0/2) Epoch 22, batch 3700, loss[loss=0.265, ctc_loss=0.1802, cr_loss=0.4242, over 21044.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1587, cr_loss=0.379, over 4088151.59 frames. ], batch size: 63, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:19:58,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=390767.6666666667, ans=0.125 2024-09-16 04:19:58,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=390767.6666666667, ans=0.125 2024-09-16 04:20:15,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=390796.0, ans=0.025 2024-09-16 04:20:34,602 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.742e+02 2.089e+02 2.219e+02 2.413e+02 3.077e+02, threshold=4.439e+02, percent-clipped=0.0 2024-09-16 04:20:41,769 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.52 vs. 
limit=15.0 2024-09-16 04:20:59,027 INFO [train.py:1198] (0/2) Epoch 22, batch 3750, loss[loss=0.2171, ctc_loss=0.1454, cr_loss=0.3586, over 21084.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1585, cr_loss=0.3786, over 4078232.03 frames. ], batch size: 59, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:21:25,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=390909.3333333333, ans=10.0 2024-09-16 04:21:50,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=390966.0, ans=0.0 2024-09-16 04:22:05,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=390994.3333333333, ans=0.05 2024-09-16 04:22:12,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=390994.3333333333, ans=0.025 2024-09-16 04:22:12,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0 2024-09-16 04:22:18,450 INFO [train.py:1198] (0/2) Epoch 22, batch 3800, loss[loss=0.255, ctc_loss=0.1761, cr_loss=0.3944, over 20046.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1582, cr_loss=0.3779, over 4084562.13 frames. ], batch size: 80, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:22:25,245 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.21 vs. limit=15.0 2024-09-16 04:22:32,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=391051.0, ans=0.0 2024-09-16 04:22:37,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.98 vs. 
limit=15.0 2024-09-16 04:23:04,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=391107.6666666667, ans=0.2 2024-09-16 04:23:11,205 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.063e+02 2.175e+02 2.345e+02 2.892e+02, threshold=4.350e+02, percent-clipped=0.0 2024-09-16 04:23:21,217 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-09-16 04:23:34,013 INFO [train.py:1198] (0/2) Epoch 22, batch 3850, loss[loss=0.1974, ctc_loss=0.1322, cr_loss=0.3259, over 19507.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1582, cr_loss=0.3788, over 4093459.77 frames. ], batch size: 43, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:23:52,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2024-09-16 04:24:31,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=391249.3333333333, ans=0.125 2024-09-16 04:24:46,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=391277.6666666667, ans=0.1 2024-09-16 04:24:52,223 INFO [train.py:1198] (0/2) Epoch 22, batch 3900, loss[loss=0.2699, ctc_loss=0.194, cr_loss=0.3794, over 14183.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1588, cr_loss=0.3795, over 4083111.54 frames. 
], batch size: 149, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:25:45,027 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.101e+02 2.235e+02 2.416e+02 4.416e+02, threshold=4.469e+02, percent-clipped=0.0 2024-09-16 04:25:58,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.98 vs. limit=15.0 2024-09-16 04:26:07,855 INFO [train.py:1198] (0/2) Epoch 22, batch 3950, loss[loss=0.2246, ctc_loss=0.1526, cr_loss=0.3597, over 20868.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1593, cr_loss=0.3803, over 4081314.23 frames. ], batch size: 57, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:26:31,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391476.0, ans=0.1 2024-09-16 04:26:36,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=391504.3333333333, ans=0.125 2024-09-16 04:26:59,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=391532.6666666667, ans=0.1 2024-09-16 04:27:02,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=391532.6666666667, ans=0.2 2024-09-16 04:27:23,320 INFO [train.py:1198] (0/2) Epoch 22, batch 4000, loss[loss=0.2382, ctc_loss=0.1642, cr_loss=0.3697, over 19525.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1606, cr_loss=0.382, over 4062937.18 frames. 
], batch size: 90, lr: 3.78e-03, grad_scale: 32.0 2024-09-16 04:27:44,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=391617.6666666667, ans=0.125 2024-09-16 04:27:47,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=391617.6666666667, ans=0.04949747468305833 2024-09-16 04:28:01,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=391646.0, ans=0.125 2024-09-16 04:28:11,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. limit=10.0 2024-09-16 04:28:19,222 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.124e+02 2.337e+02 2.533e+02 3.708e+02, threshold=4.673e+02, percent-clipped=0.0 2024-09-16 04:28:24,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=391674.3333333333, ans=0.0 2024-09-16 04:28:42,047 INFO [train.py:1198] (0/2) Epoch 22, batch 4050, loss[loss=0.2013, ctc_loss=0.134, cr_loss=0.3366, over 20977.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1593, cr_loss=0.3804, over 4075950.14 frames. 
], batch size: 51, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:29:12,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=391759.3333333333, ans=0.025 2024-09-16 04:29:26,296 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 04:29:32,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=391816.0, ans=0.125 2024-09-16 04:29:53,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=391844.3333333333, ans=0.0 2024-09-16 04:30:00,565 INFO [train.py:1198] (0/2) Epoch 22, batch 4100, loss[loss=0.2328, ctc_loss=0.1574, cr_loss=0.377, over 21072.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1583, cr_loss=0.3796, over 4083994.91 frames. ], batch size: 59, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:30:19,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=391901.0, ans=0.125 2024-09-16 04:30:23,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=391901.0, ans=0.125 2024-09-16 04:30:25,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=391901.0, ans=0.2 2024-09-16 04:30:25,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=391901.0, ans=0.0 2024-09-16 04:30:29,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=391929.3333333333, ans=0.0 2024-09-16 04:30:31,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=391929.3333333333, ans=0.0 2024-09-16 04:30:41,515 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=391929.3333333333, ans=0.125 2024-09-16 04:30:49,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=15.0 2024-09-16 04:30:53,212 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.155e+02 2.278e+02 2.484e+02 4.525e+02, threshold=4.556e+02, percent-clipped=0.0 2024-09-16 04:31:02,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=391986.0, ans=0.0 2024-09-16 04:31:07,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=391986.0, ans=0.1 2024-09-16 04:31:15,714 INFO [train.py:1198] (0/2) Epoch 22, batch 4150, loss[loss=0.206, ctc_loss=0.1376, cr_loss=0.3419, over 19985.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.158, cr_loss=0.3789, over 4092144.21 frames. ], batch size: 44, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:31:16,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=392014.3333333333, ans=0.125 2024-09-16 04:31:24,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=392014.3333333333, ans=0.125 2024-09-16 04:31:37,361 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.60 vs. 
limit=15.0 2024-09-16 04:31:55,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=392071.0, ans=0.125 2024-09-16 04:32:16,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=392127.6666666667, ans=0.0 2024-09-16 04:32:31,137 INFO [train.py:1198] (0/2) Epoch 22, batch 4200, loss[loss=0.1794, ctc_loss=0.1168, cr_loss=0.3133, over 21012.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1575, cr_loss=0.378, over 4092081.61 frames. ], batch size: 51, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:32:46,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=392184.3333333333, ans=0.2 2024-09-16 04:32:57,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=392184.3333333333, ans=0.0 2024-09-16 04:33:06,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=392212.6666666667, ans=0.07 2024-09-16 04:33:17,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=392212.6666666667, ans=0.125 2024-09-16 04:33:22,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.74 vs. 
limit=15.0 2024-09-16 04:33:27,429 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.081e+02 2.204e+02 2.322e+02 3.465e+02, threshold=4.407e+02, percent-clipped=0.0 2024-09-16 04:33:45,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=392269.3333333333, ans=0.2 2024-09-16 04:33:45,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=392269.3333333333, ans=0.125 2024-09-16 04:33:45,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=392269.3333333333, ans=0.0 2024-09-16 04:33:50,093 INFO [train.py:1198] (0/2) Epoch 22, batch 4250, loss[loss=0.2269, ctc_loss=0.1551, cr_loss=0.3587, over 20995.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1575, cr_loss=0.3781, over 4095732.99 frames. ], batch size: 55, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:33:51,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=392297.6666666667, ans=0.0 2024-09-16 04:34:00,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=392297.6666666667, ans=0.0 2024-09-16 04:34:05,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=392326.0, ans=0.5 2024-09-16 04:34:05,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=392326.0, ans=0.2 2024-09-16 04:34:19,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=392354.3333333333, ans=0.125 2024-09-16 04:35:09,027 INFO [train.py:1198] (0/2) Epoch 22, batch 4300, loss[loss=0.185, ctc_loss=0.1221, cr_loss=0.3144, over 20937.00 frames. 
], tot_loss[loss=0.2334, ctc_loss=0.1578, cr_loss=0.3782, over 4096067.50 frames. ], batch size: 51, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:35:38,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=392496.0, ans=0.1 2024-09-16 04:35:56,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=392524.3333333333, ans=0.0 2024-09-16 04:36:01,779 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.147e+02 2.313e+02 2.532e+02 3.312e+02, threshold=4.626e+02, percent-clipped=0.0 2024-09-16 04:36:19,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=392552.6666666667, ans=15.0 2024-09-16 04:36:24,249 INFO [train.py:1198] (0/2) Epoch 22, batch 4350, loss[loss=0.2222, ctc_loss=0.1457, cr_loss=0.3823, over 21040.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1575, cr_loss=0.3777, over 4092422.69 frames. ], batch size: 56, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:36:24,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2024-09-16 04:36:32,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.40 vs. limit=6.0 2024-09-16 04:37:23,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=392694.3333333333, ans=0.05 2024-09-16 04:37:39,396 INFO [train.py:1198] (0/2) Epoch 22, batch 4400, loss[loss=0.2198, ctc_loss=0.1462, cr_loss=0.368, over 20968.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1583, cr_loss=0.3791, over 4091390.68 frames. 
], batch size: 58, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:37:39,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=392722.6666666667, ans=0.035 2024-09-16 04:37:41,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=392722.6666666667, ans=0.0 2024-09-16 04:37:44,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=392722.6666666667, ans=0.125 2024-09-16 04:38:16,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=392779.3333333333, ans=0.125 2024-09-16 04:38:33,252 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.079e+02 2.214e+02 2.396e+02 2.802e+02, threshold=4.427e+02, percent-clipped=0.0 2024-09-16 04:38:36,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=392807.6666666667, ans=0.025 2024-09-16 04:38:56,684 INFO [train.py:1198] (0/2) Epoch 22, batch 4450, loss[loss=0.2299, ctc_loss=0.1545, cr_loss=0.3771, over 19918.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.159, cr_loss=0.38, over 4087138.47 frames. 
], batch size: 44, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:39:04,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=392864.3333333333, ans=0.125 2024-09-16 04:39:27,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=392921.0, ans=0.125 2024-09-16 04:40:00,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=392977.6666666667, ans=0.0 2024-09-16 04:40:12,768 INFO [train.py:1198] (0/2) Epoch 22, batch 4500, loss[loss=0.2277, ctc_loss=0.155, cr_loss=0.3635, over 21084.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1583, cr_loss=0.379, over 4092474.17 frames. ], batch size: 59, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:40:29,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=393034.3333333333, ans=10.0 2024-09-16 04:40:41,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=393034.3333333333, ans=0.125 2024-09-16 04:40:52,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=393062.6666666667, ans=0.125 2024-09-16 04:41:05,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=15.0 2024-09-16 04:41:09,810 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.066e+02 2.202e+02 2.310e+02 7.366e+02, threshold=4.404e+02, percent-clipped=1.0 2024-09-16 04:41:31,055 INFO [train.py:1198] (0/2) Epoch 22, batch 4550, loss[loss=0.2176, ctc_loss=0.1466, cr_loss=0.3547, over 21060.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1574, cr_loss=0.3776, over 4094131.80 frames. 
], batch size: 56, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:41:43,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=393147.6666666667, ans=0.1 2024-09-16 04:42:46,292 INFO [train.py:1198] (0/2) Epoch 22, batch 4600, loss[loss=0.2353, ctc_loss=0.1593, cr_loss=0.3799, over 20654.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1576, cr_loss=0.3779, over 4094565.53 frames. ], batch size: 68, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:43:00,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=393317.6666666667, ans=0.0 2024-09-16 04:43:18,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=393346.0, ans=0.2 2024-09-16 04:43:18,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.49 vs. limit=22.5 2024-09-16 04:43:36,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=393374.3333333333, ans=0.125 2024-09-16 04:43:40,394 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.069e+02 2.229e+02 2.420e+02 3.166e+02, threshold=4.458e+02, percent-clipped=0.0 2024-09-16 04:43:43,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=393374.3333333333, ans=0.125 2024-09-16 04:43:48,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=393402.6666666667, ans=0.05 2024-09-16 04:43:52,897 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 04:43:54,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, 
batch_count=393402.6666666667, ans=0.0 2024-09-16 04:44:01,857 INFO [train.py:1198] (0/2) Epoch 22, batch 4650, loss[loss=0.2068, ctc_loss=0.1368, cr_loss=0.3501, over 20422.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1574, cr_loss=0.3783, over 4101009.41 frames. ], batch size: 45, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:44:02,893 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=8.0 2024-09-16 04:44:20,658 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.87 vs. limit=10.0 2024-09-16 04:44:28,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=393459.3333333333, ans=0.125 2024-09-16 04:45:20,458 INFO [train.py:1198] (0/2) Epoch 22, batch 4700, loss[loss=0.2464, ctc_loss=0.1666, cr_loss=0.399, over 20852.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1571, cr_loss=0.3782, over 4098337.09 frames. ], batch size: 59, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:45:41,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=393601.0, ans=0.2 2024-09-16 04:45:46,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=393601.0, ans=0.025 2024-09-16 04:45:53,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=393629.3333333333, ans=0.0 2024-09-16 04:46:14,726 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.669e+02 2.113e+02 2.231e+02 2.424e+02 3.101e+02, threshold=4.462e+02, percent-clipped=0.0 2024-09-16 04:46:38,926 INFO [train.py:1198] (0/2) Epoch 22, batch 4750, loss[loss=0.2653, ctc_loss=0.1866, cr_loss=0.3937, over 13944.00 frames. 
], tot_loss[loss=0.2323, ctc_loss=0.1569, cr_loss=0.3774, over 4093913.39 frames. ], batch size: 149, lr: 3.77e-03, grad_scale: 32.0 2024-09-16 04:46:46,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=393714.3333333333, ans=0.0 2024-09-16 04:46:51,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=393714.3333333333, ans=0.125 2024-09-16 04:46:58,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=393742.6666666667, ans=0.2 2024-09-16 04:47:21,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=393771.0, ans=0.125 2024-09-16 04:47:54,971 INFO [train.py:1198] (0/2) Epoch 22, batch 4800, loss[loss=0.2504, ctc_loss=0.1686, cr_loss=0.4094, over 19518.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.156, cr_loss=0.3762, over 4103128.60 frames. ], batch size: 90, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 04:48:49,130 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.072e+02 2.229e+02 2.410e+02 4.609e+02, threshold=4.458e+02, percent-clipped=1.0 2024-09-16 04:49:10,480 INFO [train.py:1198] (0/2) Epoch 22, batch 4850, loss[loss=0.2573, ctc_loss=0.1737, cr_loss=0.4182, over 20967.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1565, cr_loss=0.377, over 4100423.63 frames. 
], batch size: 58, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 04:49:25,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=394026.0, ans=0.125 2024-09-16 04:49:36,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=394026.0, ans=0.0 2024-09-16 04:49:51,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=394054.3333333333, ans=0.0 2024-09-16 04:49:52,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=394054.3333333333, ans=0.125 2024-09-16 04:50:00,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=394082.6666666667, ans=15.0 2024-09-16 04:50:28,136 INFO [train.py:1198] (0/2) Epoch 22, batch 4900, loss[loss=0.2434, ctc_loss=0.1626, cr_loss=0.4041, over 20698.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1569, cr_loss=0.3778, over 4096993.83 frames. ], batch size: 71, lr: 3.76e-03, grad_scale: 16.0 2024-09-16 04:50:46,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=394167.6666666667, ans=0.0 2024-09-16 04:51:23,768 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.117e+02 2.269e+02 2.499e+02 3.241e+02, threshold=4.537e+02, percent-clipped=0.0 2024-09-16 04:51:43,333 INFO [train.py:1198] (0/2) Epoch 22, batch 4950, loss[loss=0.2057, ctc_loss=0.1363, cr_loss=0.347, over 21057.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1567, cr_loss=0.3769, over 4106956.96 frames. ], batch size: 56, lr: 3.76e-03, grad_scale: 16.0 2024-09-16 04:51:46,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.16 vs. 
limit=22.5 2024-09-16 04:52:07,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=394309.3333333333, ans=0.0 2024-09-16 04:52:12,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394337.6666666667, ans=0.1 2024-09-16 04:52:57,344 INFO [train.py:1198] (0/2) Epoch 22, batch 5000, loss[loss=0.1975, ctc_loss=0.1299, cr_loss=0.3378, over 20981.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1573, cr_loss=0.3787, over 4105805.01 frames. ], batch size: 51, lr: 3.76e-03, grad_scale: 16.0 2024-09-16 04:53:08,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2024-09-16 04:53:31,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=394479.3333333333, ans=0.1 2024-09-16 04:53:35,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.28 vs. limit=12.0 2024-09-16 04:53:55,274 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.090e+02 2.194e+02 2.309e+02 3.699e+02, threshold=4.388e+02, percent-clipped=0.0 2024-09-16 04:54:12,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=394536.0, ans=0.025 2024-09-16 04:54:13,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394564.3333333333, ans=0.1 2024-09-16 04:54:14,747 INFO [train.py:1198] (0/2) Epoch 22, batch 5050, loss[loss=0.2235, ctc_loss=0.1489, cr_loss=0.3732, over 20819.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1576, cr_loss=0.3796, over 4108773.00 frames. 
], batch size: 59, lr: 3.76e-03, grad_scale: 16.0 2024-09-16 04:54:15,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=394564.3333333333, ans=0.025 2024-09-16 04:54:15,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394564.3333333333, ans=0.1 2024-09-16 04:54:16,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=394564.3333333333, ans=10.0 2024-09-16 04:54:46,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=394621.0, ans=0.0 2024-09-16 04:54:46,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.77 vs. limit=6.0 2024-09-16 04:54:49,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2024-09-16 04:54:52,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=394621.0, ans=0.125 2024-09-16 04:54:59,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=394649.3333333333, ans=0.125 2024-09-16 04:55:29,223 INFO [train.py:1198] (0/2) Epoch 22, batch 5100, loss[loss=0.2426, ctc_loss=0.1616, cr_loss=0.4052, over 20970.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1578, cr_loss=0.3794, over 4094618.20 frames. 
], batch size: 58, lr: 3.76e-03, grad_scale: 16.0 2024-09-16 04:55:39,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=394706.0, ans=0.0 2024-09-16 04:55:59,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=394762.6666666667, ans=0.125 2024-09-16 04:56:09,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=394762.6666666667, ans=0.0 2024-09-16 04:56:15,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=394791.0, ans=0.125 2024-09-16 04:56:24,315 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.134e+02 2.275e+02 2.438e+02 4.042e+02, threshold=4.549e+02, percent-clipped=0.0 2024-09-16 04:56:30,613 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 04:56:43,706 INFO [train.py:1198] (0/2) Epoch 22, batch 5150, loss[loss=0.2597, ctc_loss=0.1771, cr_loss=0.4126, over 20683.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1578, cr_loss=0.3788, over 4098757.84 frames. ], batch size: 66, lr: 3.76e-03, grad_scale: 16.0 2024-09-16 04:56:54,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=394847.6666666667, ans=0.125 2024-09-16 04:57:13,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=394904.3333333333, ans=0.125 2024-09-16 04:57:22,467 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.47 vs. 
limit=15.0 2024-09-16 04:57:57,831 INFO [train.py:1198] (0/2) Epoch 22, batch 5200, loss[loss=0.2253, ctc_loss=0.1478, cr_loss=0.3874, over 20776.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1575, cr_loss=0.3785, over 4104629.83 frames. ], batch size: 53, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 04:57:58,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=22.5 2024-09-16 04:58:07,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=394989.3333333333, ans=0.125 2024-09-16 04:58:10,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=394989.3333333333, ans=0.09899494936611666 2024-09-16 04:58:53,077 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.099e+02 2.191e+02 2.359e+02 3.179e+02, threshold=4.381e+02, percent-clipped=0.0 2024-09-16 04:59:11,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=395131.0, ans=0.2 2024-09-16 04:59:12,571 INFO [train.py:1198] (0/2) Epoch 22, batch 5250, loss[loss=0.249, ctc_loss=0.1704, cr_loss=0.3935, over 20969.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1576, cr_loss=0.3786, over 4085100.36 frames. ], batch size: 58, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 04:59:25,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0 2024-09-16 05:00:29,389 INFO [train.py:1198] (0/2) Epoch 22, batch 5300, loss[loss=0.2571, ctc_loss=0.1762, cr_loss=0.4048, over 20999.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1577, cr_loss=0.3797, over 4089289.32 frames. 
], batch size: 61, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 05:00:35,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=395272.6666666667, ans=0.0 2024-09-16 05:00:43,611 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0 2024-09-16 05:00:49,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=395301.0, ans=0.07 2024-09-16 05:00:59,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0 2024-09-16 05:01:06,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=395329.3333333333, ans=0.125 2024-09-16 05:01:12,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=395357.6666666667, ans=0.1 2024-09-16 05:01:23,928 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.048e+02 2.194e+02 2.320e+02 3.593e+02, threshold=4.387e+02, percent-clipped=0.0 2024-09-16 05:01:43,190 INFO [train.py:1198] (0/2) Epoch 22, batch 5350, loss[loss=0.2173, ctc_loss=0.1449, cr_loss=0.3621, over 21070.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1581, cr_loss=0.3799, over 4095569.29 frames. ], batch size: 53, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 05:01:46,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=395414.3333333333, ans=0.125 2024-09-16 05:01:54,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.26 vs. 
limit=15.0 2024-09-16 05:02:33,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=395499.3333333333, ans=0.0 2024-09-16 05:02:39,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395499.3333333333, ans=0.1 2024-09-16 05:02:41,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=395527.6666666667, ans=0.125 2024-09-16 05:02:56,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=395556.0, ans=0.0 2024-09-16 05:02:57,269 INFO [train.py:1198] (0/2) Epoch 22, batch 5400, loss[loss=0.2123, ctc_loss=0.1402, cr_loss=0.3601, over 20951.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1582, cr_loss=0.3799, over 4102771.36 frames. ], batch size: 49, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 05:03:11,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=395584.3333333333, ans=0.125 2024-09-16 05:03:32,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=395612.6666666667, ans=0.0 2024-09-16 05:03:33,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=395612.6666666667, ans=0.035 2024-09-16 05:03:35,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.49 vs. 
limit=15.0 2024-09-16 05:03:41,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=395612.6666666667, ans=0.0 2024-09-16 05:03:50,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=395641.0, ans=0.2 2024-09-16 05:03:54,462 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.069e+02 2.240e+02 2.415e+02 3.523e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-16 05:04:13,814 INFO [train.py:1198] (0/2) Epoch 22, batch 5450, loss[loss=0.2069, ctc_loss=0.1398, cr_loss=0.3359, over 20798.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1582, cr_loss=0.379, over 4090146.74 frames. ], batch size: 53, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 05:04:24,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395697.6666666667, ans=0.1 2024-09-16 05:04:36,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=395726.0, ans=0.5 2024-09-16 05:05:28,128 INFO [train.py:1198] (0/2) Epoch 22, batch 5500, loss[loss=0.2722, ctc_loss=0.1943, cr_loss=0.3894, over 13959.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1577, cr_loss=0.3784, over 4091421.07 frames. ], batch size: 150, lr: 3.76e-03, grad_scale: 32.0 2024-09-16 05:05:56,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=395896.0, ans=0.2 2024-09-16 05:06:23,294 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.061e+02 2.214e+02 2.357e+02 4.725e+02, threshold=4.428e+02, percent-clipped=1.0 2024-09-16 05:06:28,653 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. 
limit=12.0 2024-09-16 05:06:38,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=395952.6666666667, ans=0.0 2024-09-16 05:06:39,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=395952.6666666667, ans=0.035 2024-09-16 05:06:42,624 INFO [train.py:1198] (0/2) Epoch 22, batch 5550, loss[loss=0.1909, ctc_loss=0.1256, cr_loss=0.3267, over 20942.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1575, cr_loss=0.3788, over 4108595.55 frames. ], batch size: 50, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:07:17,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=396037.6666666667, ans=0.0 2024-09-16 05:07:29,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=396066.0, ans=0.125 2024-09-16 05:07:56,463 INFO [train.py:1198] (0/2) Epoch 22, batch 5600, loss[loss=0.2399, ctc_loss=0.1618, cr_loss=0.3907, over 20661.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1569, cr_loss=0.3783, over 4118452.25 frames. ], batch size: 66, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:08:01,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=396122.6666666667, ans=0.2 2024-09-16 05:08:43,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396207.6666666667, ans=0.1 2024-09-16 05:08:51,038 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.075e+02 2.188e+02 2.369e+02 3.808e+02, threshold=4.376e+02, percent-clipped=0.0 2024-09-16 05:09:10,506 INFO [train.py:1198] (0/2) Epoch 22, batch 5650, loss[loss=0.2182, ctc_loss=0.1464, cr_loss=0.359, over 20943.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1576, cr_loss=0.3787, over 4109736.46 frames. 
], batch size: 49, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:09:12,518 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=15.0 2024-09-16 05:09:27,261 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-09-16 05:09:45,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=396321.0, ans=0.0 2024-09-16 05:10:18,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=396377.6666666667, ans=0.07 2024-09-16 05:10:26,697 INFO [train.py:1198] (0/2) Epoch 22, batch 5700, loss[loss=0.2048, ctc_loss=0.1341, cr_loss=0.3533, over 20961.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1581, cr_loss=0.3796, over 4097901.83 frames. ], batch size: 51, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:11:21,706 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.065e+02 2.212e+02 2.371e+02 4.969e+02, threshold=4.425e+02, percent-clipped=0.0 2024-09-16 05:11:40,879 INFO [train.py:1198] (0/2) Epoch 22, batch 5750, loss[loss=0.2976, ctc_loss=0.2069, cr_loss=0.4535, over 20648.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1589, cr_loss=0.3807, over 4102970.40 frames. 
], batch size: 68, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:11:49,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=396547.6666666667, ans=0.125 2024-09-16 05:12:28,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=396632.6666666667, ans=0.125 2024-09-16 05:12:36,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=396632.6666666667, ans=0.125 2024-09-16 05:12:38,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=396632.6666666667, ans=0.0 2024-09-16 05:12:43,816 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-140000.pt 2024-09-16 05:12:44,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=396661.0, ans=0.125 2024-09-16 05:12:54,270 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0 2024-09-16 05:12:58,150 INFO [train.py:1198] (0/2) Epoch 22, batch 5800, loss[loss=0.2687, ctc_loss=0.1843, cr_loss=0.4219, over 20936.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1585, cr_loss=0.3798, over 4098369.28 frames. 
], batch size: 64, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:13:02,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=396689.3333333333, ans=0.0 2024-09-16 05:13:18,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=396717.6666666667, ans=0.0 2024-09-16 05:13:39,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=396746.0, ans=0.125 2024-09-16 05:13:53,847 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.119e+02 2.223e+02 2.388e+02 4.323e+02, threshold=4.445e+02, percent-clipped=1.0 2024-09-16 05:13:55,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=396774.3333333333, ans=0.2 2024-09-16 05:13:57,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=396802.6666666667, ans=0.0 2024-09-16 05:14:13,396 INFO [train.py:1198] (0/2) Epoch 22, batch 5850, loss[loss=0.2565, ctc_loss=0.1764, cr_loss=0.4007, over 20928.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1584, cr_loss=0.3791, over 4098054.27 frames. 
], batch size: 60, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:14:18,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=396831.0, ans=0.0 2024-09-16 05:14:28,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=396859.3333333333, ans=0.125 2024-09-16 05:15:04,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=396916.0, ans=0.125 2024-09-16 05:15:28,112 INFO [train.py:1198] (0/2) Epoch 22, batch 5900, loss[loss=0.2672, ctc_loss=0.1863, cr_loss=0.4045, over 21017.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1584, cr_loss=0.3793, over 4106840.79 frames. ], batch size: 63, lr: 3.75e-03, grad_scale: 32.0 2024-09-16 05:15:36,601 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.67 vs. limit=12.0 2024-09-16 05:16:08,797 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 05:16:13,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=397057.6666666667, ans=0.015 2024-09-16 05:16:23,260 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.101e+02 2.202e+02 2.308e+02 3.334e+02, threshold=4.403e+02, percent-clipped=0.0 2024-09-16 05:16:36,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=397086.0, ans=0.0 2024-09-16 05:16:42,600 INFO [train.py:1198] (0/2) Epoch 22, batch 5950, loss[loss=0.2682, ctc_loss=0.1886, cr_loss=0.3982, over 14559.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1583, cr_loss=0.3795, over 4103159.34 frames. 
], batch size: 149, lr: 3.75e-03, grad_scale: 32.0
2024-09-16 05:16:50,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0
2024-09-16 05:17:02,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=397142.6666666667, ans=0.07
2024-09-16 05:17:21,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=397171.0, ans=0.125
2024-09-16 05:17:24,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=397171.0, ans=0.125
2024-09-16 05:17:25,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0
2024-09-16 05:17:57,330 INFO [train.py:1198] (0/2) Epoch 22, batch 6000, loss[loss=0.2641, ctc_loss=0.1801, cr_loss=0.4198, over 20985.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1578, cr_loss=0.3783, over 4087864.95 frames. ], batch size: 64, lr: 3.75e-03, grad_scale: 32.0
2024-09-16 05:17:57,331 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 05:18:19,330 INFO [train.py:1230] (0/2) Epoch 22, validation: loss=0.04273, ctc_loss=0.04273, cr_loss=1.154e-14, over 944034.00 frames.
2024-09-16 05:18:19,331 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-16 05:18:22,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=397256.0, ans=0.125
2024-09-16 05:18:22,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=397256.0, ans=0.0
2024-09-16 05:18:36,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.23 vs. limit=10.0
2024-09-16 05:18:40,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397284.3333333333, ans=0.1
2024-09-16 05:19:02,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=397341.0, ans=0.125
2024-09-16 05:19:02,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.44 vs. limit=15.0
2024-09-16 05:19:13,519 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.118e+02 2.278e+02 2.424e+02 4.575e+02, threshold=4.557e+02, percent-clipped=1.0
2024-09-16 05:19:20,459 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.44 vs. limit=6.0
2024-09-16 05:19:28,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0
2024-09-16 05:19:29,653 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0
2024-09-16 05:19:33,234 INFO [train.py:1198] (0/2) Epoch 22, batch 6050, loss[loss=0.2279, ctc_loss=0.1517, cr_loss=0.3808, over 20953.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1586, cr_loss=0.379, over 4082723.33 frames. ], batch size: 58, lr: 3.75e-03, grad_scale: 32.0
2024-09-16 05:19:48,057 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0
2024-09-16 05:20:07,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=397454.3333333333, ans=0.1
2024-09-16 05:20:15,444 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 05:20:44,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=397511.0, ans=0.125
2024-09-16 05:20:47,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=397539.3333333333, ans=0.0
2024-09-16 05:20:47,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=397539.3333333333, ans=0.2
2024-09-16 05:20:49,180 INFO [train.py:1198] (0/2) Epoch 22, batch 6100, loss[loss=0.2444, ctc_loss=0.1631, cr_loss=0.4063, over 20865.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1588, cr_loss=0.3795, over 4078342.98 frames. ], batch size: 57, lr: 3.75e-03, grad_scale: 32.0
2024-09-16 05:20:52,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=397539.3333333333, ans=0.125
2024-09-16 05:21:06,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=397567.6666666667, ans=0.125
2024-09-16 05:21:44,520 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.733e+02 2.115e+02 2.288e+02 2.485e+02 5.440e+02, threshold=4.576e+02, percent-clipped=1.0
2024-09-16 05:21:49,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=397652.6666666667, ans=0.125
2024-09-16 05:22:03,537 INFO [train.py:1198] (0/2) Epoch 22, batch 6150, loss[loss=0.2339, ctc_loss=0.1577, cr_loss=0.3809, over 21043.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1596, cr_loss=0.3802, over 4058543.31 frames. ], batch size: 62, lr: 3.75e-03, grad_scale: 32.0
2024-09-16 05:22:06,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=397681.0, ans=0.125
2024-09-16 05:22:09,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=397681.0, ans=0.0
2024-09-16 05:22:27,657 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 05:22:30,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0
2024-09-16 05:22:39,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=397737.6666666667, ans=0.2
2024-09-16 05:22:49,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=397766.0, ans=0.025
2024-09-16 05:23:12,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=397794.3333333333, ans=0.125
2024-09-16 05:23:17,978 INFO [train.py:1198] (0/2) Epoch 22, batch 6200, loss[loss=0.1912, ctc_loss=0.1263, cr_loss=0.3245, over 20988.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1587, cr_loss=0.3795, over 4062104.89 frames. ], batch size: 52, lr: 3.75e-03, grad_scale: 32.0
2024-09-16 05:24:12,410 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.126e+02 2.251e+02 2.431e+02 4.639e+02, threshold=4.502e+02, percent-clipped=1.0
2024-09-16 05:24:17,528 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.25 vs. limit=10.0
2024-09-16 05:24:18,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0
2024-09-16 05:24:28,865 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0
2024-09-16 05:24:31,327 INFO [train.py:1198] (0/2) Epoch 22, batch 6250, loss[loss=0.2107, ctc_loss=0.1399, cr_loss=0.3541, over 20785.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1601, cr_loss=0.3813, over 4053714.00 frames. ], batch size: 53, lr: 3.75e-03, grad_scale: 32.0
2024-09-16 05:24:31,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397964.3333333333, ans=0.1
2024-09-16 05:25:28,580 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.52 vs. limit=12.0
2024-09-16 05:25:44,904 INFO [train.py:1198] (0/2) Epoch 22, batch 6300, loss[loss=0.2515, ctc_loss=0.1755, cr_loss=0.38, over 20098.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1624, cr_loss=0.384, over 4018876.49 frames. ], batch size: 80, lr: 3.74e-03, grad_scale: 32.0
2024-09-16 05:26:06,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.30 vs. limit=15.0
2024-09-16 05:26:09,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=398134.3333333333, ans=0.125
2024-09-16 05:26:28,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=398191.0, ans=0.0
2024-09-16 05:26:39,201 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.118e+02 2.336e+02 2.606e+02 6.355e+02, threshold=4.673e+02, percent-clipped=1.0
2024-09-16 05:26:58,207 INFO [train.py:1198] (0/2) Epoch 22, batch 6350, loss[loss=0.2937, ctc_loss=0.2126, cr_loss=0.4053, over 13783.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1673, cr_loss=0.3872, over 3858167.02 frames. ], batch size: 149, lr: 3.74e-03, grad_scale: 32.0
2024-09-16 05:27:01,814 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.89 vs. limit=6.0
2024-09-16 05:27:04,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=398247.6666666667, ans=0.125
2024-09-16 05:27:22,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=398276.0, ans=0.125
2024-09-16 05:27:44,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=398332.6666666667, ans=0.1
2024-09-16 05:27:56,740 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-22.pt
2024-09-16 05:28:45,917 INFO [train.py:1198] (0/2) Epoch 23, batch 0, loss[loss=0.1965, ctc_loss=0.1313, cr_loss=0.3261, over 20250.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1313, cr_loss=0.3261, over 20250.00 frames. ], batch size: 45, lr: 3.66e-03, grad_scale: 32.0
2024-09-16 05:28:45,918 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 05:29:06,942 INFO [train.py:1230] (0/2) Epoch 23, validation: loss=0.04295, ctc_loss=0.04295, cr_loss=1.117e-14, over 944034.00 frames.
2024-09-16 05:29:06,943 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-16 05:29:52,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=398448.8333333333, ans=0.125
2024-09-16 05:30:07,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.70 vs. limit=15.0
2024-09-16 05:30:16,558 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.227e+02 2.595e+02 2.855e+02 4.695e+02, threshold=5.191e+02, percent-clipped=1.0
2024-09-16 05:30:22,604 INFO [train.py:1198] (0/2) Epoch 23, batch 50, loss[loss=0.1768, ctc_loss=0.1178, cr_loss=0.2953, over 20999.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1559, cr_loss=0.3776, over 929321.67 frames. ], batch size: 52, lr: 3.66e-03, grad_scale: 32.0
2024-09-16 05:30:30,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=398505.5, ans=0.04949747468305833
2024-09-16 05:30:39,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=398533.8333333333, ans=0.125
2024-09-16 05:30:48,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=398533.8333333333, ans=0.125
2024-09-16 05:30:51,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=398562.1666666667, ans=0.2
2024-09-16 05:30:59,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=398562.1666666667, ans=0.125
2024-09-16 05:31:00,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=398562.1666666667, ans=0.125
2024-09-16 05:31:11,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.65 vs. limit=6.0
2024-09-16 05:31:28,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=398618.8333333333, ans=0.125
2024-09-16 05:31:38,502 INFO [train.py:1198] (0/2) Epoch 23, batch 100, loss[loss=0.2474, ctc_loss=0.1701, cr_loss=0.3862, over 18223.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1606, cr_loss=0.3843, over 1619436.72 frames. ], batch size: 108, lr: 3.66e-03, grad_scale: 32.0
2024-09-16 05:31:42,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0
2024-09-16 05:31:52,457 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0
2024-09-16 05:31:55,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=398675.5, ans=0.2
2024-09-16 05:31:56,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=398675.5, ans=0.125
2024-09-16 05:31:58,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=398675.5, ans=0.2
2024-09-16 05:32:47,653 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.069e+02 2.165e+02 2.353e+02 4.020e+02, threshold=4.329e+02, percent-clipped=0.0
2024-09-16 05:32:53,786 INFO [train.py:1198] (0/2) Epoch 23, batch 150, loss[loss=0.2164, ctc_loss=0.1451, cr_loss=0.3564, over 21057.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.16, cr_loss=0.3828, over 2172653.41 frames. ], batch size: 56, lr: 3.66e-03, grad_scale: 32.0
2024-09-16 05:32:55,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=398788.8333333333, ans=0.0
2024-09-16 05:33:09,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=398817.1666666667, ans=0.125
2024-09-16 05:33:22,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=398817.1666666667, ans=0.2
2024-09-16 05:34:13,853 INFO [train.py:1198] (0/2) Epoch 23, batch 200, loss[loss=0.2757, ctc_loss=0.1899, cr_loss=0.4289, over 20661.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1587, cr_loss=0.3811, over 2598551.03 frames. ], batch size: 66, lr: 3.66e-03, grad_scale: 32.0
2024-09-16 05:35:06,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=399015.5, ans=0.125
2024-09-16 05:35:15,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=399043.8333333333, ans=0.0
2024-09-16 05:35:26,222 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.121e+02 2.238e+02 2.415e+02 3.835e+02, threshold=4.476e+02, percent-clipped=0.0
2024-09-16 05:35:32,222 INFO [train.py:1198] (0/2) Epoch 23, batch 250, loss[loss=0.2189, ctc_loss=0.1486, cr_loss=0.3511, over 19943.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1585, cr_loss=0.3811, over 2933481.97 frames. ], batch size: 44, lr: 3.66e-03, grad_scale: 32.0
2024-09-16 05:35:38,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=399072.1666666667, ans=0.2
2024-09-16 05:35:51,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.68 vs. limit=22.5
2024-09-16 05:36:10,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=399128.8333333333, ans=0.125
2024-09-16 05:36:28,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=399157.1666666667, ans=0.125
2024-09-16 05:36:30,638 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=15.0
2024-09-16 05:36:34,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=399185.5, ans=0.125
2024-09-16 05:36:43,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=399185.5, ans=0.2
2024-09-16 05:36:47,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399213.8333333333, ans=0.1
2024-09-16 05:36:48,154 INFO [train.py:1198] (0/2) Epoch 23, batch 300, loss[loss=0.2404, ctc_loss=0.1656, cr_loss=0.374, over 21038.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1579, cr_loss=0.3802, over 3200689.77 frames. ], batch size: 62, lr: 3.66e-03, grad_scale: 32.0
2024-09-16 05:37:26,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.29 vs. limit=15.0
2024-09-16 05:37:58,393 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.116e+02 2.237e+02 2.381e+02 3.373e+02, threshold=4.474e+02, percent-clipped=0.0
2024-09-16 05:38:04,330 INFO [train.py:1198] (0/2) Epoch 23, batch 350, loss[loss=0.3074, ctc_loss=0.2218, cr_loss=0.4281, over 13960.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1581, cr_loss=0.3802, over 3389527.81 frames. ], batch size: 149, lr: 3.65e-03, grad_scale: 32.0
2024-09-16 05:38:06,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=399355.5, ans=0.125
2024-09-16 05:38:16,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=399355.5, ans=0.2
2024-09-16 05:39:23,450 INFO [train.py:1198] (0/2) Epoch 23, batch 400, loss[loss=0.2055, ctc_loss=0.1391, cr_loss=0.3323, over 20870.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1581, cr_loss=0.3799, over 3540268.90 frames. ], batch size: 54, lr: 3.65e-03, grad_scale: 32.0
2024-09-16 05:39:23,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399497.1666666667, ans=0.1
2024-09-16 05:39:43,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=399525.5, ans=0.125
2024-09-16 05:40:05,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=399553.8333333333, ans=0.0
2024-09-16 05:40:31,964 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.137e+02 2.279e+02 2.469e+02 3.568e+02, threshold=4.559e+02, percent-clipped=0.0
2024-09-16 05:40:41,294 INFO [train.py:1198] (0/2) Epoch 23, batch 450, loss[loss=0.246, ctc_loss=0.1655, cr_loss=0.4026, over 21044.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.158, cr_loss=0.3797, over 3666622.67 frames. ], batch size: 56, lr: 3.65e-03, grad_scale: 32.0
2024-09-16 05:41:28,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=399723.8333333333, ans=0.025
2024-09-16 05:41:44,367 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. limit=6.0
2024-09-16 05:41:48,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=399752.1666666667, ans=0.125
2024-09-16 05:41:48,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=399752.1666666667, ans=0.0
2024-09-16 05:41:57,299 INFO [train.py:1198] (0/2) Epoch 23, batch 500, loss[loss=0.2042, ctc_loss=0.1349, cr_loss=0.3462, over 20927.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1578, cr_loss=0.3793, over 3766750.69 frames. ], batch size: 54, lr: 3.65e-03, grad_scale: 64.0
2024-09-16 05:41:59,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=399780.5, ans=0.125
2024-09-16 05:42:41,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=399865.5, ans=0.125
2024-09-16 05:43:03,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0
2024-09-16 05:43:06,088 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.053e+02 2.177e+02 2.332e+02 4.710e+02, threshold=4.355e+02, percent-clipped=1.0
2024-09-16 05:43:12,208 INFO [train.py:1198] (0/2) Epoch 23, batch 550, loss[loss=0.1972, ctc_loss=0.1321, cr_loss=0.3258, over 20970.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1574, cr_loss=0.3786, over 3835929.31 frames. ], batch size: 49, lr: 3.65e-03, grad_scale: 64.0
2024-09-16 05:43:18,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=399922.1666666667, ans=0.025
2024-09-16 05:43:23,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=399922.1666666667, ans=0.2
2024-09-16 05:43:26,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=399950.5, ans=0.125
2024-09-16 05:43:42,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=399978.8333333333, ans=0.125
2024-09-16 05:44:14,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400035.5, ans=0.1
2024-09-16 05:44:18,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=400035.5, ans=0.125
2024-09-16 05:44:25,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=400063.8333333333, ans=0.025
2024-09-16 05:44:26,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=400063.8333333333, ans=0.2
2024-09-16 05:44:27,287 INFO [train.py:1198] (0/2) Epoch 23, batch 600, loss[loss=0.2059, ctc_loss=0.137, cr_loss=0.3444, over 20995.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1577, cr_loss=0.3789, over 3895022.61 frames. ], batch size: 52, lr: 3.65e-03, grad_scale: 64.0
2024-09-16 05:45:11,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400120.5, ans=0.1
2024-09-16 05:45:24,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=400148.8333333333, ans=0.0
2024-09-16 05:45:29,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0
2024-09-16 05:45:40,769 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.087e+02 2.248e+02 2.403e+02 3.401e+02, threshold=4.496e+02, percent-clipped=0.0
2024-09-16 05:45:45,428 INFO [train.py:1198] (0/2) Epoch 23, batch 650, loss[loss=0.2651, ctc_loss=0.1777, cr_loss=0.4366, over 20701.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1566, cr_loss=0.3776, over 3949939.16 frames. ], batch size: 66, lr: 3.65e-03, grad_scale: 32.0
2024-09-16 05:45:45,961 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0
2024-09-16 05:45:59,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=400233.8333333333, ans=10.0
2024-09-16 05:47:04,635 INFO [train.py:1198] (0/2) Epoch 23, batch 700, loss[loss=0.2228, ctc_loss=0.1499, cr_loss=0.3642, over 21032.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.156, cr_loss=0.3768, over 3993932.40 frames. ], batch size: 62, lr: 3.65e-03, grad_scale: 32.0
2024-09-16 05:47:14,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.29 vs. limit=10.0
2024-09-16 05:47:15,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=400347.1666666667, ans=0.125
2024-09-16 05:47:39,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=400403.8333333333, ans=0.025
2024-09-16 05:47:45,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=400403.8333333333, ans=0.0
2024-09-16 05:48:15,936 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.106e+02 2.327e+02 2.598e+02 4.216e+02, threshold=4.654e+02, percent-clipped=0.0
2024-09-16 05:48:20,512 INFO [train.py:1198] (0/2) Epoch 23, batch 750, loss[loss=0.2539, ctc_loss=0.1692, cr_loss=0.4236, over 19998.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1564, cr_loss=0.377, over 4007731.81 frames. ], batch size: 80, lr: 3.65e-03, grad_scale: 32.0
2024-09-16 05:48:37,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=22.5
2024-09-16 05:48:49,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=400545.5, ans=0.125
2024-09-16 05:49:07,746 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=15.0
2024-09-16 05:49:25,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=400602.1666666667, ans=0.125
2024-09-16 05:49:36,141 INFO [train.py:1198] (0/2) Epoch 23, batch 800, loss[loss=0.2469, ctc_loss=0.1691, cr_loss=0.3893, over 21015.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1569, cr_loss=0.3772, over 4005368.67 frames. ], batch size: 61, lr: 3.65e-03, grad_scale: 32.0
2024-09-16 05:49:39,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=400630.5, ans=0.1
2024-09-16 05:50:05,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=400687.1666666667, ans=0.07
2024-09-16 05:50:50,140 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.716e+02 2.111e+02 2.250e+02 2.471e+02 5.388e+02, threshold=4.500e+02, percent-clipped=1.0
2024-09-16 05:50:54,370 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.40 vs. limit=12.0
2024-09-16 05:50:54,913 INFO [train.py:1198] (0/2) Epoch 23, batch 850, loss[loss=0.2039, ctc_loss=0.1358, cr_loss=0.3408, over 20975.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1572, cr_loss=0.378, over 4023094.38 frames. ], batch size: 51, lr: 3.65e-03, grad_scale: 32.0
2024-09-16 05:50:56,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=400772.1666666667, ans=0.125
2024-09-16 05:51:17,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=400800.5, ans=0.95
2024-09-16 05:51:19,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=12.0
2024-09-16 05:51:26,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400828.8333333333, ans=0.1
2024-09-16 05:51:35,107 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0
2024-09-16 05:51:41,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=400857.1666666667, ans=0.0
2024-09-16 05:51:47,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=400857.1666666667, ans=0.1
2024-09-16 05:52:13,023 INFO [train.py:1198] (0/2) Epoch 23, batch 900, loss[loss=0.296, ctc_loss=0.2133, cr_loss=0.4138, over 14381.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1573, cr_loss=0.3779, over 4023866.93 frames. ], batch size: 149, lr: 3.65e-03, grad_scale: 32.0
2024-09-16 05:52:26,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=400942.1666666667, ans=0.0
2024-09-16 05:52:58,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=400998.8333333333, ans=0.0
2024-09-16 05:53:23,695 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.095e+02 2.213e+02 2.436e+02 3.425e+02, threshold=4.425e+02, percent-clipped=0.0
2024-09-16 05:53:28,452 INFO [train.py:1198] (0/2) Epoch 23, batch 950, loss[loss=0.2362, ctc_loss=0.1595, cr_loss=0.3834, over 21094.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1579, cr_loss=0.3789, over 4042613.48 frames. ], batch size: 59, lr: 3.65e-03, grad_scale: 32.0
2024-09-16 05:53:40,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=401055.5, ans=0.125
2024-09-16 05:53:47,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=401083.8333333333, ans=0.125
2024-09-16 05:53:49,899 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.43 vs. limit=22.5
2024-09-16 05:53:52,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=401083.8333333333, ans=0.025
2024-09-16 05:54:43,318 INFO [train.py:1198] (0/2) Epoch 23, batch 1000, loss[loss=0.2181, ctc_loss=0.1462, cr_loss=0.3598, over 20978.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1588, cr_loss=0.3805, over 4045953.68 frames. ], batch size: 55, lr: 3.65e-03, grad_scale: 32.0
2024-09-16 05:54:50,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0
2024-09-16 05:55:12,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=401253.8333333333, ans=0.125
2024-09-16 05:55:20,046 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0
2024-09-16 05:55:33,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=401282.1666666667, ans=0.1
2024-09-16 05:55:53,264 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=12.0
2024-09-16 05:55:53,876 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.067e+02 2.185e+02 2.324e+02 4.884e+02, threshold=4.370e+02, percent-clipped=2.0
2024-09-16 05:55:58,442 INFO [train.py:1198] (0/2) Epoch 23, batch 1050, loss[loss=0.2402, ctc_loss=0.1608, cr_loss=0.397, over 20658.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1584, cr_loss=0.3806, over 4053201.15 frames. ], batch size: 66, lr: 3.65e-03, grad_scale: 32.0
2024-09-16 05:56:12,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=401367.1666666667, ans=0.1
2024-09-16 05:56:19,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=401367.1666666667, ans=0.0
2024-09-16 05:56:27,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=401367.1666666667, ans=0.0
2024-09-16 05:56:56,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=401423.8333333333, ans=0.1
2024-09-16 05:57:17,087 INFO [train.py:1198] (0/2) Epoch 23, batch 1100, loss[loss=0.2233, ctc_loss=0.1495, cr_loss=0.3688, over 21085.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1584, cr_loss=0.3804, over 4062911.07 frames. ], batch size: 59, lr: 3.65e-03, grad_scale: 16.0
2024-09-16 05:57:20,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=401480.5, ans=0.2
2024-09-16 05:58:16,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=401565.5, ans=0.2
2024-09-16 05:58:19,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=401593.8333333333, ans=0.125
2024-09-16 05:58:30,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=401593.8333333333, ans=0.035
2024-09-16 05:58:30,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=401593.8333333333, ans=0.07
2024-09-16 05:58:32,943 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.065e+02 2.227e+02 2.358e+02 2.746e+02, threshold=4.454e+02, percent-clipped=0.0
2024-09-16 05:58:33,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=401593.8333333333, ans=0.125
2024-09-16 05:58:35,991 INFO [train.py:1198] (0/2) Epoch 23, batch 1150, loss[loss=0.2333, ctc_loss=0.1561, cr_loss=0.3858, over 20944.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1577, cr_loss=0.3792, over 4071834.97 frames. ], batch size: 60, lr: 3.64e-03, grad_scale: 16.0
2024-09-16 05:58:45,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=401622.1666666667, ans=0.125
2024-09-16 05:58:50,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=401650.5, ans=0.0
2024-09-16 05:59:05,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=401678.8333333333, ans=0.0
2024-09-16 05:59:06,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=401678.8333333333, ans=0.125
2024-09-16 05:59:31,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=401707.1666666667, ans=0.0
2024-09-16 05:59:51,565 INFO [train.py:1198] (0/2) Epoch 23, batch 1200, loss[loss=0.2777, ctc_loss=0.1903, cr_loss=0.4372, over 20658.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.159, cr_loss=0.3809, over 4076491.12 frames. ], batch size: 68, lr: 3.64e-03, grad_scale: 32.0
2024-09-16 06:00:04,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=401792.1666666667, ans=0.125
2024-09-16 06:00:39,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=401848.8333333333, ans=0.125
2024-09-16 06:00:49,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=401877.1666666667, ans=0.125
2024-09-16 06:01:03,029 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.076e+02 2.193e+02 2.323e+02 8.340e+02, threshold=4.385e+02, percent-clipped=1.0
2024-09-16 06:01:06,097 INFO [train.py:1198] (0/2) Epoch 23, batch 1250, loss[loss=0.2771, ctc_loss=0.1932, cr_loss=0.4194, over 18150.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1593, cr_loss=0.3814, over 4082097.30 frames. ], batch size: 108, lr: 3.64e-03, grad_scale: 32.0
2024-09-16 06:01:47,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=401962.1666666667, ans=0.125
2024-09-16 06:02:14,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=402018.8333333333, ans=0.0
2024-09-16 06:02:24,195 INFO [train.py:1198] (0/2) Epoch 23, batch 1300, loss[loss=0.2414, ctc_loss=0.1618, cr_loss=0.3982, over 21037.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.159, cr_loss=0.3808, over 4089222.08 frames.
], batch size: 63, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:02:27,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402047.1666666667, ans=0.1 2024-09-16 06:02:30,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=402047.1666666667, ans=0.125 2024-09-16 06:02:42,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.32 vs. limit=15.0 2024-09-16 06:02:56,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=402103.8333333333, ans=0.0 2024-09-16 06:03:40,194 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.101e+02 2.238e+02 2.376e+02 3.312e+02, threshold=4.475e+02, percent-clipped=0.0 2024-09-16 06:03:43,363 INFO [train.py:1198] (0/2) Epoch 23, batch 1350, loss[loss=0.2007, ctc_loss=0.1335, cr_loss=0.3362, over 21060.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1585, cr_loss=0.3793, over 4080974.90 frames. ], batch size: 56, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:04:59,057 INFO [train.py:1198] (0/2) Epoch 23, batch 1400, loss[loss=0.2111, ctc_loss=0.141, cr_loss=0.3501, over 21062.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1579, cr_loss=0.3778, over 4078569.54 frames. 
], batch size: 53, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:04:59,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=402330.5, ans=0.0 2024-09-16 06:06:11,137 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 2.121e+02 2.285e+02 2.456e+02 3.388e+02, threshold=4.571e+02, percent-clipped=0.0 2024-09-16 06:06:14,083 INFO [train.py:1198] (0/2) Epoch 23, batch 1450, loss[loss=0.2619, ctc_loss=0.1835, cr_loss=0.3918, over 20696.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1577, cr_loss=0.3772, over 4083715.95 frames. ], batch size: 71, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:06:40,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2024-09-16 06:06:41,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=402500.5, ans=0.125 2024-09-16 06:07:29,952 INFO [train.py:1198] (0/2) Epoch 23, batch 1500, loss[loss=0.1872, ctc_loss=0.1244, cr_loss=0.3143, over 20967.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1572, cr_loss=0.3768, over 4090001.95 frames. ], batch size: 48, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:07:41,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=402613.8333333333, ans=0.0 2024-09-16 06:07:43,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.99 vs. 
limit=15.0 2024-09-16 06:07:52,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=402642.1666666667, ans=0.025 2024-09-16 06:08:01,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=402642.1666666667, ans=0.0 2024-09-16 06:08:10,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=402670.5, ans=0.125 2024-09-16 06:08:21,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=22.5 2024-09-16 06:08:25,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=402698.8333333333, ans=0.125 2024-09-16 06:08:41,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=402727.1666666667, ans=0.0 2024-09-16 06:08:47,096 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.099e+02 2.250e+02 2.440e+02 4.560e+02, threshold=4.499e+02, percent-clipped=0.0 2024-09-16 06:08:50,187 INFO [train.py:1198] (0/2) Epoch 23, batch 1550, loss[loss=0.2544, ctc_loss=0.1748, cr_loss=0.3982, over 20974.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1569, cr_loss=0.377, over 4097267.30 frames. 
], batch size: 58, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:09:02,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=402755.5, ans=0.0 2024-09-16 06:09:38,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=402840.5, ans=0.125 2024-09-16 06:10:02,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=402868.8333333333, ans=0.0 2024-09-16 06:10:09,717 INFO [train.py:1198] (0/2) Epoch 23, batch 1600, loss[loss=0.2396, ctc_loss=0.1606, cr_loss=0.3948, over 21041.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1569, cr_loss=0.3775, over 4107374.47 frames. ], batch size: 62, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:10:13,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=402897.1666666667, ans=0.0 2024-09-16 06:10:17,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2024-09-16 06:10:40,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=402953.8333333333, ans=0.0 2024-09-16 06:11:09,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=403010.5, ans=0.0 2024-09-16 06:11:13,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403010.5, ans=0.1 2024-09-16 06:11:18,447 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.75 vs. 
limit=15.0 2024-09-16 06:11:21,936 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.113e+02 2.293e+02 2.561e+02 4.068e+02, threshold=4.585e+02, percent-clipped=0.0 2024-09-16 06:11:25,073 INFO [train.py:1198] (0/2) Epoch 23, batch 1650, loss[loss=0.2466, ctc_loss=0.1671, cr_loss=0.3976, over 20655.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.158, cr_loss=0.3794, over 4089775.93 frames. ], batch size: 68, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:11:29,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=403038.8333333333, ans=0.025 2024-09-16 06:12:06,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=403095.5, ans=0.2 2024-09-16 06:12:27,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=403152.1666666667, ans=10.0 2024-09-16 06:12:34,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-16 06:12:34,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=403152.1666666667, ans=0.0 2024-09-16 06:12:40,660 INFO [train.py:1198] (0/2) Epoch 23, batch 1700, loss[loss=0.1808, ctc_loss=0.1207, cr_loss=0.3005, over 21035.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.157, cr_loss=0.3781, over 4095790.13 frames. ], batch size: 56, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:12:53,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=403180.5, ans=0.125 2024-09-16 06:13:07,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.78 vs. 
limit=12.0 2024-09-16 06:13:43,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=403293.8333333333, ans=0.125 2024-09-16 06:13:46,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=403293.8333333333, ans=0.0 2024-09-16 06:13:56,317 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.040e+02 2.240e+02 2.452e+02 4.086e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-16 06:13:59,470 INFO [train.py:1198] (0/2) Epoch 23, batch 1750, loss[loss=0.2059, ctc_loss=0.1358, cr_loss=0.3502, over 20282.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1561, cr_loss=0.3767, over 4106563.42 frames. ], batch size: 45, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:14:05,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=403322.1666666667, ans=0.125 2024-09-16 06:14:17,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=403350.5, ans=0.2 2024-09-16 06:14:30,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.86 vs. limit=15.0 2024-09-16 06:14:39,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=403378.8333333333, ans=0.125 2024-09-16 06:14:39,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=403378.8333333333, ans=0.125 2024-09-16 06:14:51,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.07 vs. 
limit=10.0 2024-09-16 06:14:57,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=15.0 2024-09-16 06:15:03,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=403435.5, ans=0.0 2024-09-16 06:15:18,181 INFO [train.py:1198] (0/2) Epoch 23, batch 1800, loss[loss=0.2068, ctc_loss=0.1359, cr_loss=0.3546, over 20959.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1559, cr_loss=0.3764, over 4110688.00 frames. ], batch size: 49, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:15:47,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=403520.5, ans=0.125 2024-09-16 06:16:00,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=403520.5, ans=0.025 2024-09-16 06:16:02,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. 
limit=10.0 2024-09-16 06:16:12,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=403548.8333333333, ans=0.125 2024-09-16 06:16:24,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=403577.1666666667, ans=0.125 2024-09-16 06:16:28,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403577.1666666667, ans=0.1 2024-09-16 06:16:30,788 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.070e+02 2.202e+02 2.361e+02 2.764e+02, threshold=4.403e+02, percent-clipped=0.0 2024-09-16 06:16:33,898 INFO [train.py:1198] (0/2) Epoch 23, batch 1850, loss[loss=0.2494, ctc_loss=0.1687, cr_loss=0.4035, over 21056.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1562, cr_loss=0.3765, over 4104801.10 frames. ], batch size: 56, lr: 3.64e-03, grad_scale: 32.0 2024-09-16 06:17:40,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2024-09-16 06:17:48,773 INFO [train.py:1198] (0/2) Epoch 23, batch 1900, loss[loss=0.2616, ctc_loss=0.1774, cr_loss=0.421, over 20828.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1566, cr_loss=0.3768, over 4106737.74 frames. 
], batch size: 65, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:17:57,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=403747.1666666667, ans=0.2 2024-09-16 06:18:10,326 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 06:18:10,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=403775.5, ans=0.0 2024-09-16 06:18:13,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=403775.5, ans=0.125 2024-09-16 06:18:19,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=403803.8333333333, ans=0.2 2024-09-16 06:18:56,288 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.45 vs. limit=15.0 2024-09-16 06:19:01,249 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.067e+02 2.183e+02 2.353e+02 2.811e+02, threshold=4.366e+02, percent-clipped=0.0 2024-09-16 06:19:04,383 INFO [train.py:1198] (0/2) Epoch 23, batch 1950, loss[loss=0.215, ctc_loss=0.1428, cr_loss=0.3609, over 21047.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.156, cr_loss=0.377, over 4108844.48 frames. ], batch size: 62, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:19:27,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.89 vs. 
limit=12.0 2024-09-16 06:19:58,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=403973.8333333333, ans=0.125 2024-09-16 06:20:03,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=403973.8333333333, ans=0.125 2024-09-16 06:20:14,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=404002.1666666667, ans=0.125 2024-09-16 06:20:23,106 INFO [train.py:1198] (0/2) Epoch 23, batch 2000, loss[loss=0.2044, ctc_loss=0.1373, cr_loss=0.3355, over 20954.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.157, cr_loss=0.3787, over 4105615.66 frames. ], batch size: 49, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:20:45,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=404058.8333333333, ans=0.125 2024-09-16 06:20:57,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=404087.1666666667, ans=0.125 2024-09-16 06:21:37,731 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.132e+02 2.284e+02 2.501e+02 4.801e+02, threshold=4.568e+02, percent-clipped=2.0 2024-09-16 06:21:40,762 INFO [train.py:1198] (0/2) Epoch 23, batch 2050, loss[loss=0.253, ctc_loss=0.172, cr_loss=0.4051, over 21073.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1573, cr_loss=0.3787, over 4093787.95 frames. ], batch size: 59, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:22:13,226 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0 2024-09-16 06:22:56,183 INFO [train.py:1198] (0/2) Epoch 23, batch 2100, loss[loss=0.2363, ctc_loss=0.1586, cr_loss=0.3883, over 20783.00 frames. 
], tot_loss[loss=0.2314, ctc_loss=0.1561, cr_loss=0.3768, over 4106668.04 frames. ], batch size: 56, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:23:02,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=404313.8333333333, ans=0.125 2024-09-16 06:23:04,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=404313.8333333333, ans=0.0 2024-09-16 06:23:14,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=404342.1666666667, ans=0.05 2024-09-16 06:23:47,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=404398.8333333333, ans=0.125 2024-09-16 06:23:47,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=404398.8333333333, ans=0.0 2024-09-16 06:24:08,577 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.052e+02 2.203e+02 2.372e+02 2.715e+02, threshold=4.406e+02, percent-clipped=0.0 2024-09-16 06:24:11,601 INFO [train.py:1198] (0/2) Epoch 23, batch 2150, loss[loss=0.2596, ctc_loss=0.1782, cr_loss=0.4068, over 19432.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1558, cr_loss=0.3768, over 4116146.83 frames. ], batch size: 90, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:24:51,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=404512.1666666667, ans=0.07 2024-09-16 06:25:30,538 INFO [train.py:1198] (0/2) Epoch 23, batch 2200, loss[loss=0.2245, ctc_loss=0.1507, cr_loss=0.3689, over 20875.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1568, cr_loss=0.3783, over 4110876.75 frames. 
], batch size: 57, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:25:44,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=404625.5, ans=0.0 2024-09-16 06:25:52,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=404625.5, ans=0.1 2024-09-16 06:25:59,891 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=22.5 2024-09-16 06:26:05,917 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=15.0 2024-09-16 06:26:23,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404682.1666666667, ans=0.1 2024-09-16 06:26:45,831 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.071e+02 2.210e+02 2.331e+02 3.694e+02, threshold=4.420e+02, percent-clipped=0.0 2024-09-16 06:26:48,918 INFO [train.py:1198] (0/2) Epoch 23, batch 2250, loss[loss=0.2333, ctc_loss=0.1573, cr_loss=0.3799, over 20802.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1567, cr_loss=0.378, over 4104801.30 frames. ], batch size: 53, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:27:12,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.40 vs. limit=15.0 2024-09-16 06:27:49,465 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 06:28:04,467 INFO [train.py:1198] (0/2) Epoch 23, batch 2300, loss[loss=0.2608, ctc_loss=0.1753, cr_loss=0.4272, over 21035.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1571, cr_loss=0.3786, over 4100140.52 frames. 
], batch size: 62, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:28:39,061 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.39 vs. limit=15.0 2024-09-16 06:28:47,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=404937.1666666667, ans=0.125 2024-09-16 06:28:51,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=404965.5, ans=0.2 2024-09-16 06:28:55,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-09-16 06:29:03,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2024-09-16 06:29:17,623 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.057e+02 2.172e+02 2.331e+02 3.388e+02, threshold=4.345e+02, percent-clipped=0.0 2024-09-16 06:29:20,661 INFO [train.py:1198] (0/2) Epoch 23, batch 2350, loss[loss=0.2854, ctc_loss=0.1996, cr_loss=0.4291, over 13989.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1566, cr_loss=0.3781, over 4106035.92 frames. ], batch size: 149, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:30:15,517 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 06:30:30,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=405135.5, ans=0.125 2024-09-16 06:30:36,068 INFO [train.py:1198] (0/2) Epoch 23, batch 2400, loss[loss=0.2493, ctc_loss=0.1674, cr_loss=0.4096, over 20956.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1583, cr_loss=0.3806, over 4081564.30 frames. 
], batch size: 58, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:31:01,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.86 vs. limit=15.0 2024-09-16 06:31:04,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=405192.1666666667, ans=0.1 2024-09-16 06:31:21,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=405220.5, ans=0.125 2024-09-16 06:31:26,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2024-09-16 06:31:51,078 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.111e+02 2.261e+02 2.435e+02 4.919e+02, threshold=4.522e+02, percent-clipped=1.0 2024-09-16 06:31:54,232 INFO [train.py:1198] (0/2) Epoch 23, batch 2450, loss[loss=0.2115, ctc_loss=0.142, cr_loss=0.3477, over 20977.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1576, cr_loss=0.3797, over 4094796.77 frames. ], batch size: 49, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:32:08,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=405333.8333333333, ans=0.09899494936611666 2024-09-16 06:32:14,911 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.26 vs. limit=22.5 2024-09-16 06:32:38,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.16 vs. limit=10.0 2024-09-16 06:32:44,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.56 vs. 
limit=10.0 2024-09-16 06:33:02,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=405418.8333333333, ans=0.1 2024-09-16 06:33:12,703 INFO [train.py:1198] (0/2) Epoch 23, batch 2500, loss[loss=0.2704, ctc_loss=0.1828, cr_loss=0.4379, over 20690.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1579, cr_loss=0.3794, over 4087442.05 frames. ], batch size: 68, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:33:28,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=405475.5, ans=0.0 2024-09-16 06:33:38,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=405475.5, ans=0.125 2024-09-16 06:34:14,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=405560.5, ans=0.0 2024-09-16 06:34:15,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=405560.5, ans=0.035 2024-09-16 06:34:20,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=405560.5, ans=0.125 2024-09-16 06:34:25,899 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.075e+02 2.182e+02 2.343e+02 3.194e+02, threshold=4.364e+02, percent-clipped=0.0 2024-09-16 06:34:28,847 INFO [train.py:1198] (0/2) Epoch 23, batch 2550, loss[loss=0.2277, ctc_loss=0.1522, cr_loss=0.3774, over 20872.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1569, cr_loss=0.3775, over 4094860.20 frames. 
], batch size: 57, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:34:33,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=405588.8333333333, ans=0.2 2024-09-16 06:35:06,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405645.5, ans=0.1 2024-09-16 06:35:09,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=405645.5, ans=0.0 2024-09-16 06:35:45,471 INFO [train.py:1198] (0/2) Epoch 23, batch 2600, loss[loss=0.2231, ctc_loss=0.1494, cr_loss=0.3685, over 20997.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1562, cr_loss=0.3775, over 4104550.82 frames. ], batch size: 63, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:36:02,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=405758.8333333333, ans=0.0 2024-09-16 06:36:14,365 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.466e-02 2024-09-16 06:36:56,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=405843.8333333333, ans=0.0 2024-09-16 06:37:00,994 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.069e+02 2.208e+02 2.353e+02 6.566e+02, threshold=4.416e+02, percent-clipped=1.0 2024-09-16 06:37:02,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=405872.1666666667, ans=0.025 2024-09-16 06:37:03,896 INFO [train.py:1198] (0/2) Epoch 23, batch 2650, loss[loss=0.2068, ctc_loss=0.1349, cr_loss=0.3597, over 20975.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1566, cr_loss=0.3782, over 4102782.82 frames. 
], batch size: 48, lr: 3.63e-03, grad_scale: 32.0 2024-09-16 06:37:13,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=405872.1666666667, ans=0.125 2024-09-16 06:37:17,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405900.5, ans=0.1 2024-09-16 06:37:37,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=405928.8333333333, ans=0.125 2024-09-16 06:37:39,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=405928.8333333333, ans=0.2 2024-09-16 06:38:11,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=405985.5, ans=0.125 2024-09-16 06:38:15,765 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=15.0 2024-09-16 06:38:22,595 INFO [train.py:1198] (0/2) Epoch 23, batch 2700, loss[loss=0.2497, ctc_loss=0.1668, cr_loss=0.4145, over 20973.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1561, cr_loss=0.3773, over 4110550.62 frames. ], batch size: 58, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:38:31,275 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.32 vs. limit=12.0 2024-09-16 06:39:29,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=406127.1666666667, ans=0.125 2024-09-16 06:39:33,444 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.58 vs. 
limit=15.0 2024-09-16 06:39:37,160 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.121e+02 2.268e+02 2.452e+02 3.905e+02, threshold=4.536e+02, percent-clipped=0.0 2024-09-16 06:39:38,760 INFO [train.py:1198] (0/2) Epoch 23, batch 2750, loss[loss=0.2431, ctc_loss=0.1603, cr_loss=0.4138, over 21023.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1574, cr_loss=0.3797, over 4104671.83 frames. ], batch size: 61, lr: 3.62e-03, grad_scale: 16.0 2024-09-16 06:39:49,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=406155.5, ans=0.0 2024-09-16 06:39:52,961 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2024-09-16 06:40:31,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=406240.5, ans=0.05 2024-09-16 06:40:52,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=406297.1666666667, ans=0.125 2024-09-16 06:40:53,927 INFO [train.py:1198] (0/2) Epoch 23, batch 2800, loss[loss=0.2362, ctc_loss=0.1593, cr_loss=0.3847, over 20873.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1584, cr_loss=0.3813, over 4099274.71 frames. ], batch size: 65, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:41:53,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=406410.5, ans=0.07 2024-09-16 06:42:06,073 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.38 vs. 
limit=15.0 2024-09-16 06:42:07,862 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.113e+02 2.299e+02 2.488e+02 7.893e+02, threshold=4.598e+02, percent-clipped=1.0 2024-09-16 06:42:09,418 INFO [train.py:1198] (0/2) Epoch 23, batch 2850, loss[loss=0.202, ctc_loss=0.132, cr_loss=0.3502, over 20921.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1576, cr_loss=0.3795, over 4089739.16 frames. ], batch size: 48, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:42:18,820 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 06:42:26,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2024-09-16 06:42:46,153 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=12.0 2024-09-16 06:43:16,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=12.0 2024-09-16 06:43:28,010 INFO [train.py:1198] (0/2) Epoch 23, batch 2900, loss[loss=0.2051, ctc_loss=0.1353, cr_loss=0.3491, over 20969.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1565, cr_loss=0.3785, over 4098312.07 frames. 
], batch size: 48, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:43:43,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=406608.8333333333, ans=0.025 2024-09-16 06:44:03,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=406637.1666666667, ans=0.0 2024-09-16 06:44:28,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=406665.5, ans=0.125 2024-09-16 06:44:44,379 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.062e+02 2.219e+02 2.383e+02 3.445e+02, threshold=4.438e+02, percent-clipped=0.0 2024-09-16 06:44:45,966 INFO [train.py:1198] (0/2) Epoch 23, batch 2950, loss[loss=0.2158, ctc_loss=0.1444, cr_loss=0.357, over 21010.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1575, cr_loss=0.3796, over 4092048.47 frames. ], batch size: 52, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:45:25,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=406778.8333333333, ans=0.025 2024-09-16 06:45:33,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=406807.1666666667, ans=0.125 2024-09-16 06:45:39,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=406807.1666666667, ans=0.125 2024-09-16 06:45:48,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=406835.5, ans=0.025 2024-09-16 06:46:01,701 INFO [train.py:1198] (0/2) Epoch 23, batch 3000, loss[loss=0.2662, ctc_loss=0.1795, cr_loss=0.4336, over 20648.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1582, cr_loss=0.381, over 4084062.85 frames. 
], batch size: 66, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:46:01,701 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 06:46:12,702 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.3254, 6.0181, 5.7361, 5.4818], device='cuda:0') 2024-09-16 06:46:24,117 INFO [train.py:1230] (0/2) Epoch 23, validation: loss=0.0429, ctc_loss=0.0429, cr_loss=1.141e-14, over 944034.00 frames. 2024-09-16 06:46:24,118 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 06:46:38,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=406892.1666666667, ans=0.125 2024-09-16 06:46:38,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=406892.1666666667, ans=0.125 2024-09-16 06:46:41,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=406892.1666666667, ans=0.0 2024-09-16 06:46:52,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=406892.1666666667, ans=0.2 2024-09-16 06:47:01,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=406920.5, ans=0.2 2024-09-16 06:47:39,362 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.725e+02 2.070e+02 2.200e+02 2.343e+02 3.689e+02, threshold=4.399e+02, percent-clipped=0.0 2024-09-16 06:47:40,875 INFO [train.py:1198] (0/2) Epoch 23, batch 3050, loss[loss=0.1874, ctc_loss=0.1231, cr_loss=0.3218, over 19854.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1579, cr_loss=0.3799, over 4094313.49 frames. 
], batch size: 44, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:47:42,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=407005.5, ans=0.125 2024-09-16 06:48:01,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=407033.8333333333, ans=0.2 2024-09-16 06:48:09,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=407033.8333333333, ans=0.125 2024-09-16 06:48:20,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=407062.1666666667, ans=0.125 2024-09-16 06:48:37,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=407090.5, ans=0.125 2024-09-16 06:48:44,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=407118.8333333333, ans=0.125 2024-09-16 06:48:59,742 INFO [train.py:1198] (0/2) Epoch 23, batch 3100, loss[loss=0.2395, ctc_loss=0.1629, cr_loss=0.3828, over 19454.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.158, cr_loss=0.3802, over 4100535.92 frames. ], batch size: 90, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:50:05,736 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.56 vs. 
limit=22.5 2024-09-16 06:50:06,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=407260.5, ans=0.1 2024-09-16 06:50:12,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=407260.5, ans=0.1 2024-09-16 06:50:14,776 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2024-09-16 06:50:17,062 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.128e+02 2.254e+02 2.369e+02 3.498e+02, threshold=4.508e+02, percent-clipped=0.0 2024-09-16 06:50:18,681 INFO [train.py:1198] (0/2) Epoch 23, batch 3150, loss[loss=0.2032, ctc_loss=0.1326, cr_loss=0.3532, over 20959.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1571, cr_loss=0.3789, over 4097610.06 frames. ], batch size: 49, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:51:06,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=407373.8333333333, ans=0.1 2024-09-16 06:51:27,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=407402.1666666667, ans=0.0 2024-09-16 06:51:34,789 INFO [train.py:1198] (0/2) Epoch 23, batch 3200, loss[loss=0.2562, ctc_loss=0.1784, cr_loss=0.3889, over 20886.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1577, cr_loss=0.3795, over 4081573.37 frames. ], batch size: 57, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:51:45,044 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.32 vs. 
limit=8.0 2024-09-16 06:51:48,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=407458.8333333333, ans=0.0 2024-09-16 06:51:59,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=407458.8333333333, ans=0.125 2024-09-16 06:52:02,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=22.5 2024-09-16 06:52:08,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-16 06:52:35,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=407543.8333333333, ans=0.0 2024-09-16 06:52:48,501 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.167e+02 2.292e+02 2.503e+02 3.773e+02, threshold=4.584e+02, percent-clipped=0.0 2024-09-16 06:52:50,102 INFO [train.py:1198] (0/2) Epoch 23, batch 3250, loss[loss=0.2529, ctc_loss=0.1697, cr_loss=0.4161, over 20729.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.158, cr_loss=0.3803, over 4092051.88 frames. 
], batch size: 71, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:53:00,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=407572.1666666667, ans=0.125 2024-09-16 06:53:25,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=407628.8333333333, ans=0.125 2024-09-16 06:53:46,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=407657.1666666667, ans=0.025 2024-09-16 06:53:54,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=407685.5, ans=0.125 2024-09-16 06:53:56,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.70 vs. limit=22.5 2024-09-16 06:54:08,822 INFO [train.py:1198] (0/2) Epoch 23, batch 3300, loss[loss=0.2398, ctc_loss=0.1634, cr_loss=0.3818, over 21036.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1574, cr_loss=0.3794, over 4102510.89 frames. ], batch size: 62, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:55:26,799 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.045e+02 2.180e+02 2.349e+02 7.274e+02, threshold=4.361e+02, percent-clipped=1.0 2024-09-16 06:55:28,283 INFO [train.py:1198] (0/2) Epoch 23, batch 3350, loss[loss=0.1976, ctc_loss=0.1286, cr_loss=0.3448, over 19836.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1569, cr_loss=0.3785, over 4104730.42 frames. 
], batch size: 44, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:56:26,062 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 06:56:26,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=407940.5, ans=0.2 2024-09-16 06:56:44,195 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-144000.pt 2024-09-16 06:56:45,076 INFO [train.py:1198] (0/2) Epoch 23, batch 3400, loss[loss=0.2632, ctc_loss=0.1793, cr_loss=0.4198, over 20953.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1576, cr_loss=0.3792, over 4102862.74 frames. ], batch size: 64, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:56:48,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=407997.1666666667, ans=0.025 2024-09-16 06:57:48,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408110.5, ans=0.1 2024-09-16 06:57:51,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=408110.5, ans=0.1 2024-09-16 06:57:59,000 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.130e+02 2.286e+02 2.441e+02 4.492e+02, threshold=4.572e+02, percent-clipped=1.0 2024-09-16 06:58:00,485 INFO [train.py:1198] (0/2) Epoch 23, batch 3450, loss[loss=0.2735, ctc_loss=0.189, cr_loss=0.4227, over 20853.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1574, cr_loss=0.3795, over 4101595.41 frames. ], batch size: 65, lr: 3.62e-03, grad_scale: 32.0 2024-09-16 06:58:11,925 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.73 vs. 
limit=15.0 2024-09-16 06:58:16,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=408167.1666666667, ans=0.2 2024-09-16 06:58:19,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=408167.1666666667, ans=0.125 2024-09-16 06:58:34,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408195.5, ans=0.1 2024-09-16 06:58:37,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=408195.5, ans=0.2 2024-09-16 06:58:54,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=408223.8333333333, ans=0.125 2024-09-16 06:59:06,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=408252.1666666667, ans=0.125 2024-09-16 06:59:16,282 INFO [train.py:1198] (0/2) Epoch 23, batch 3500, loss[loss=0.1899, ctc_loss=0.1243, cr_loss=0.3281, over 20959.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1568, cr_loss=0.3785, over 4101902.01 frames. ], batch size: 48, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 06:59:21,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.26 vs. limit=10.0 2024-09-16 07:00:33,382 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.097e+02 2.202e+02 2.396e+02 4.312e+02, threshold=4.404e+02, percent-clipped=0.0 2024-09-16 07:00:34,878 INFO [train.py:1198] (0/2) Epoch 23, batch 3550, loss[loss=0.2203, ctc_loss=0.1468, cr_loss=0.3672, over 20867.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1571, cr_loss=0.3787, over 4096908.87 frames. 
], batch size: 57, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:00:39,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=408422.1666666667, ans=0.0 2024-09-16 07:00:53,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=408450.5, ans=0.125 2024-09-16 07:01:03,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=408450.5, ans=0.125 2024-09-16 07:01:11,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=408478.8333333333, ans=0.125 2024-09-16 07:01:15,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=408478.8333333333, ans=0.0 2024-09-16 07:01:17,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=408478.8333333333, ans=0.125 2024-09-16 07:01:27,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=408507.1666666667, ans=0.2 2024-09-16 07:01:29,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0 2024-09-16 07:01:53,432 INFO [train.py:1198] (0/2) Epoch 23, batch 3600, loss[loss=0.2382, ctc_loss=0.1618, cr_loss=0.3819, over 20997.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1569, cr_loss=0.3777, over 4086373.03 frames. ], batch size: 67, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:01:55,834 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.76 vs. 
limit=15.0 2024-09-16 07:01:58,402 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 07:02:19,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=408592.1666666667, ans=0.07 2024-09-16 07:02:42,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=408648.8333333333, ans=0.125 2024-09-16 07:03:07,728 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.045e+02 2.183e+02 2.379e+02 3.734e+02, threshold=4.365e+02, percent-clipped=0.0 2024-09-16 07:03:09,146 INFO [train.py:1198] (0/2) Epoch 23, batch 3650, loss[loss=0.2371, ctc_loss=0.1612, cr_loss=0.3796, over 20682.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1567, cr_loss=0.3775, over 4094764.20 frames. ], batch size: 66, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:03:30,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=408733.8333333333, ans=0.2 2024-09-16 07:03:51,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=408762.1666666667, ans=0.0 2024-09-16 07:04:06,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2024-09-16 07:04:07,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=408790.5, ans=0.2 2024-09-16 07:04:24,716 INFO [train.py:1198] (0/2) Epoch 23, batch 3700, loss[loss=0.2073, ctc_loss=0.1409, cr_loss=0.3319, over 20946.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1568, cr_loss=0.3782, over 4098650.44 frames. 
], batch size: 50, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:05:05,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=408903.8333333333, ans=0.2 2024-09-16 07:05:07,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=408903.8333333333, ans=0.2 2024-09-16 07:05:28,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=408960.5, ans=0.2 2024-09-16 07:05:30,394 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.37 vs. limit=15.0 2024-09-16 07:05:41,004 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.125e+02 2.301e+02 2.549e+02 3.917e+02, threshold=4.602e+02, percent-clipped=0.0 2024-09-16 07:05:42,627 INFO [train.py:1198] (0/2) Epoch 23, batch 3750, loss[loss=0.238, ctc_loss=0.1594, cr_loss=0.393, over 20783.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1573, cr_loss=0.3792, over 4092656.75 frames. ], batch size: 53, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:05:50,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=408988.8333333333, ans=0.125 2024-09-16 07:05:59,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=409017.1666666667, ans=0.1 2024-09-16 07:06:08,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=409017.1666666667, ans=0.95 2024-09-16 07:06:24,263 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.71 vs. 
limit=10.0 2024-09-16 07:07:00,786 INFO [train.py:1198] (0/2) Epoch 23, batch 3800, loss[loss=0.2205, ctc_loss=0.1462, cr_loss=0.3713, over 20982.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1578, cr_loss=0.3791, over 4082054.31 frames. ], batch size: 51, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:07:05,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=409130.5, ans=0.0 2024-09-16 07:07:05,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=409130.5, ans=0.025 2024-09-16 07:07:37,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-09-16 07:08:06,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=409243.8333333333, ans=0.125 2024-09-16 07:08:14,936 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.136e+02 2.252e+02 2.439e+02 4.200e+02, threshold=4.504e+02, percent-clipped=0.0 2024-09-16 07:08:16,620 INFO [train.py:1198] (0/2) Epoch 23, batch 3850, loss[loss=0.2468, ctc_loss=0.1645, cr_loss=0.4115, over 21075.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1568, cr_loss=0.3774, over 4092841.79 frames. ], batch size: 59, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:08:29,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=12.0 2024-09-16 07:08:47,071 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. 
limit=15.0 2024-09-16 07:08:54,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=409328.8333333333, ans=0.125 2024-09-16 07:09:16,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=409385.5, ans=0.04949747468305833 2024-09-16 07:09:30,984 INFO [train.py:1198] (0/2) Epoch 23, batch 3900, loss[loss=0.2518, ctc_loss=0.1747, cr_loss=0.3853, over 20879.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1569, cr_loss=0.3773, over 4091075.75 frames. ], batch size: 54, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:09:41,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=409413.8333333333, ans=0.0 2024-09-16 07:09:53,245 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.19 vs. limit=22.5 2024-09-16 07:10:03,628 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2024-09-16 07:10:45,070 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.057e+02 2.198e+02 2.318e+02 3.832e+02, threshold=4.396e+02, percent-clipped=0.0 2024-09-16 07:10:45,822 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.19 vs. limit=15.0 2024-09-16 07:10:46,691 INFO [train.py:1198] (0/2) Epoch 23, batch 3950, loss[loss=0.2795, ctc_loss=0.1927, cr_loss=0.434, over 18163.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1564, cr_loss=0.3771, over 4089003.57 frames. 
], batch size: 108, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:11:29,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=409612.1666666667, ans=0.025 2024-09-16 07:11:46,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=409640.5, ans=0.2 2024-09-16 07:11:52,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=409668.8333333333, ans=0.025 2024-09-16 07:12:05,516 INFO [train.py:1198] (0/2) Epoch 23, batch 4000, loss[loss=0.2316, ctc_loss=0.1557, cr_loss=0.3795, over 20436.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1564, cr_loss=0.377, over 4094029.43 frames. ], batch size: 45, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:12:32,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=409725.5, ans=0.125 2024-09-16 07:12:36,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=409725.5, ans=0.125 2024-09-16 07:12:39,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=409753.8333333333, ans=0.0 2024-09-16 07:13:22,728 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.731e+02 2.055e+02 2.223e+02 2.385e+02 3.119e+02, threshold=4.446e+02, percent-clipped=0.0 2024-09-16 07:13:23,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.36 vs. limit=15.0 2024-09-16 07:13:24,261 INFO [train.py:1198] (0/2) Epoch 23, batch 4050, loss[loss=0.2286, ctc_loss=0.1525, cr_loss=0.3805, over 20830.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1575, cr_loss=0.3785, over 4077282.18 frames. 
], batch size: 59, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:13:40,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=409867.1666666667, ans=0.2 2024-09-16 07:13:47,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=409867.1666666667, ans=0.125 2024-09-16 07:13:55,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=409895.5, ans=0.2 2024-09-16 07:13:56,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=409895.5, ans=0.1 2024-09-16 07:13:56,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=409895.5, ans=0.125 2024-09-16 07:14:16,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=409923.8333333333, ans=0.125 2024-09-16 07:14:17,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=409923.8333333333, ans=0.125 2024-09-16 07:14:40,069 INFO [train.py:1198] (0/2) Epoch 23, batch 4100, loss[loss=0.2605, ctc_loss=0.1768, cr_loss=0.4187, over 20876.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.157, cr_loss=0.3774, over 4070492.17 frames. 
], batch size: 57, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:15:13,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=410037.1666666667, ans=0.07 2024-09-16 07:15:32,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=410065.5, ans=0.0 2024-09-16 07:15:53,403 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 07:15:54,455 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.086e+02 2.178e+02 2.319e+02 4.043e+02, threshold=4.355e+02, percent-clipped=0.0 2024-09-16 07:15:55,286 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=22.5 2024-09-16 07:15:55,978 INFO [train.py:1198] (0/2) Epoch 23, batch 4150, loss[loss=0.2139, ctc_loss=0.1422, cr_loss=0.3584, over 20774.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1569, cr_loss=0.3776, over 4076378.80 frames. ], batch size: 53, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:16:08,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=410122.1666666667, ans=0.2 2024-09-16 07:16:37,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=410178.8333333333, ans=0.0 2024-09-16 07:17:14,294 INFO [train.py:1198] (0/2) Epoch 23, batch 4200, loss[loss=0.2334, ctc_loss=0.1577, cr_loss=0.3785, over 20798.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1561, cr_loss=0.3759, over 4068550.36 frames. 
], batch size: 56, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:17:25,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=410263.8333333333, ans=0.0 2024-09-16 07:17:55,531 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0 2024-09-16 07:18:23,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=410377.1666666667, ans=0.0 2024-09-16 07:18:31,179 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.123e+02 2.246e+02 2.486e+02 3.752e+02, threshold=4.491e+02, percent-clipped=0.0 2024-09-16 07:18:32,827 INFO [train.py:1198] (0/2) Epoch 23, batch 4250, loss[loss=0.2785, ctc_loss=0.1951, cr_loss=0.4169, over 18409.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1562, cr_loss=0.3767, over 4073895.18 frames. ], batch size: 108, lr: 3.61e-03, grad_scale: 32.0 2024-09-16 07:18:39,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=12.0 2024-09-16 07:19:48,434 INFO [train.py:1198] (0/2) Epoch 23, batch 4300, loss[loss=0.2516, ctc_loss=0.1747, cr_loss=0.3841, over 20339.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.156, cr_loss=0.3769, over 4082214.97 frames. 
], batch size: 74, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:19:53,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410547.1666666667, ans=0.1 2024-09-16 07:20:35,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=410632.1666666667, ans=0.125 2024-09-16 07:20:39,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=410632.1666666667, ans=0.0 2024-09-16 07:20:47,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410660.5, ans=0.1 2024-09-16 07:20:50,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=410660.5, ans=0.2 2024-09-16 07:21:02,462 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.694e+02 2.085e+02 2.236e+02 2.414e+02 4.322e+02, threshold=4.472e+02, percent-clipped=0.0 2024-09-16 07:21:03,902 INFO [train.py:1198] (0/2) Epoch 23, batch 4350, loss[loss=0.2683, ctc_loss=0.1842, cr_loss=0.4206, over 20690.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1564, cr_loss=0.3772, over 4091215.16 frames. ], batch size: 66, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:21:07,795 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.84 vs. 
limit=15.0 2024-09-16 07:21:37,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=410745.5, ans=0.2 2024-09-16 07:21:40,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=410745.5, ans=0.125 2024-09-16 07:21:49,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=410773.8333333333, ans=0.0 2024-09-16 07:21:58,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=410773.8333333333, ans=0.125 2024-09-16 07:22:13,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=410802.1666666667, ans=0.025 2024-09-16 07:22:19,333 INFO [train.py:1198] (0/2) Epoch 23, batch 4400, loss[loss=0.2658, ctc_loss=0.1793, cr_loss=0.4325, over 20865.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1574, cr_loss=0.3788, over 4089190.89 frames. 
], batch size: 65, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:22:24,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410830.5, ans=0.1 2024-09-16 07:22:32,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=410830.5, ans=0.09899494936611666 2024-09-16 07:22:44,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=410858.8333333333, ans=0.125 2024-09-16 07:23:03,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=410887.1666666667, ans=0.0 2024-09-16 07:23:20,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=410915.5, ans=0.125 2024-09-16 07:23:26,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=410943.8333333333, ans=0.125 2024-09-16 07:23:39,285 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.181e+02 2.290e+02 2.501e+02 4.284e+02, threshold=4.579e+02, percent-clipped=0.0 2024-09-16 07:23:40,844 INFO [train.py:1198] (0/2) Epoch 23, batch 4450, loss[loss=0.2044, ctc_loss=0.1358, cr_loss=0.3427, over 21065.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1579, cr_loss=0.3802, over 4085281.94 frames. ], batch size: 62, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:24:25,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.81 vs. limit=6.0 2024-09-16 07:24:56,357 INFO [train.py:1198] (0/2) Epoch 23, batch 4500, loss[loss=0.2048, ctc_loss=0.1366, cr_loss=0.3409, over 20407.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1577, cr_loss=0.3797, over 4081206.09 frames. 
], batch size: 45, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:25:13,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=411142.1666666667, ans=0.125 2024-09-16 07:25:19,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=411142.1666666667, ans=0.07 2024-09-16 07:25:46,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=411198.8333333333, ans=0.5 2024-09-16 07:26:10,747 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.116e+02 2.247e+02 2.381e+02 3.714e+02, threshold=4.495e+02, percent-clipped=0.0 2024-09-16 07:26:12,125 INFO [train.py:1198] (0/2) Epoch 23, batch 4550, loss[loss=0.2272, ctc_loss=0.1536, cr_loss=0.3678, over 20930.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1577, cr_loss=0.3799, over 4094341.65 frames. ], batch size: 60, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:26:26,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411283.8333333333, ans=0.1 2024-09-16 07:26:29,211 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.78 vs. 
limit=22.5 2024-09-16 07:26:34,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=411283.8333333333, ans=0.0 2024-09-16 07:27:02,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=411340.5, ans=0.125 2024-09-16 07:27:13,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=411368.8333333333, ans=6.0 2024-09-16 07:27:28,334 INFO [train.py:1198] (0/2) Epoch 23, batch 4600, loss[loss=0.2138, ctc_loss=0.1433, cr_loss=0.3523, over 20642.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1575, cr_loss=0.3789, over 4080352.72 frames. ], batch size: 71, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:27:33,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=411397.1666666667, ans=0.0 2024-09-16 07:27:33,626 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=12.0 2024-09-16 07:27:39,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=411397.1666666667, ans=0.0 2024-09-16 07:28:09,742 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.99 vs. 
limit=15.0 2024-09-16 07:28:12,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=411482.1666666667, ans=0.2 2024-09-16 07:28:39,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=411510.5, ans=0.5 2024-09-16 07:28:45,161 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.068e+02 2.187e+02 2.459e+02 3.117e+02, threshold=4.373e+02, percent-clipped=0.0 2024-09-16 07:28:46,689 INFO [train.py:1198] (0/2) Epoch 23, batch 4650, loss[loss=0.2041, ctc_loss=0.1352, cr_loss=0.3446, over 20985.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1569, cr_loss=0.3778, over 4078920.28 frames. ], batch size: 51, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:28:53,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=411538.8333333333, ans=12.0 2024-09-16 07:29:21,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=411595.5, ans=0.5 2024-09-16 07:29:39,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=411623.8333333333, ans=0.125 2024-09-16 07:29:53,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411652.1666666667, ans=0.1 2024-09-16 07:29:54,964 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. 
limit=15.0 2024-09-16 07:30:00,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=411652.1666666667, ans=0.0 2024-09-16 07:30:04,590 INFO [train.py:1198] (0/2) Epoch 23, batch 4700, loss[loss=0.2186, ctc_loss=0.1424, cr_loss=0.381, over 20891.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1575, cr_loss=0.3787, over 4079593.99 frames. ], batch size: 54, lr: 3.60e-03, grad_scale: 16.0 2024-09-16 07:30:07,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=411680.5, ans=0.0 2024-09-16 07:30:27,867 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=15.0 2024-09-16 07:31:03,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=411793.8333333333, ans=0.2 2024-09-16 07:31:08,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=411793.8333333333, ans=0.125 2024-09-16 07:31:20,006 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.105e+02 2.220e+02 2.381e+02 4.274e+02, threshold=4.440e+02, percent-clipped=0.0 2024-09-16 07:31:20,027 INFO [train.py:1198] (0/2) Epoch 23, batch 4750, loss[loss=0.2067, ctc_loss=0.1373, cr_loss=0.3471, over 20318.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1579, cr_loss=0.3796, over 4074004.98 frames. ], batch size: 45, lr: 3.60e-03, grad_scale: 16.0 2024-09-16 07:31:46,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=411850.5, ans=0.0 2024-09-16 07:31:46,707 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.39 vs. 
limit=22.5 2024-09-16 07:31:49,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=411878.8333333333, ans=0.025 2024-09-16 07:32:07,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=411907.1666666667, ans=0.0 2024-09-16 07:32:36,070 INFO [train.py:1198] (0/2) Epoch 23, batch 4800, loss[loss=0.198, ctc_loss=0.1324, cr_loss=0.328, over 19564.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1574, cr_loss=0.3794, over 4076766.32 frames. ], batch size: 43, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:32:38,419 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2024-09-16 07:32:45,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=411963.8333333333, ans=0.125 2024-09-16 07:32:51,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=411992.1666666667, ans=0.0 2024-09-16 07:32:53,002 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 07:33:15,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=412020.5, ans=0.125 2024-09-16 07:33:35,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=412077.1666666667, ans=0.5 2024-09-16 07:33:48,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=412077.1666666667, ans=0.0 2024-09-16 07:33:51,433 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.107e+02 2.248e+02 2.391e+02 3.143e+02, threshold=4.495e+02, 
percent-clipped=0.0 2024-09-16 07:33:51,451 INFO [train.py:1198] (0/2) Epoch 23, batch 4850, loss[loss=0.2399, ctc_loss=0.1605, cr_loss=0.3974, over 20775.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1581, cr_loss=0.3812, over 4082763.15 frames. ], batch size: 56, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:33:56,570 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=15.0 2024-09-16 07:34:19,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=15.0 2024-09-16 07:34:24,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=412162.1666666667, ans=0.125 2024-09-16 07:34:43,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=412190.5, ans=0.025 2024-09-16 07:35:12,410 INFO [train.py:1198] (0/2) Epoch 23, batch 4900, loss[loss=0.227, ctc_loss=0.1496, cr_loss=0.3867, over 20860.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1581, cr_loss=0.382, over 4092197.50 frames. 
], batch size: 65, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:35:17,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=412247.1666666667, ans=0.1 2024-09-16 07:35:50,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=412303.8333333333, ans=0.125 2024-09-16 07:36:07,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=412332.1666666667, ans=0.125 2024-09-16 07:36:11,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=412360.5, ans=0.05 2024-09-16 07:36:25,956 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.150e+02 2.264e+02 2.435e+02 4.831e+02, threshold=4.527e+02, percent-clipped=1.0 2024-09-16 07:36:25,976 INFO [train.py:1198] (0/2) Epoch 23, batch 4950, loss[loss=0.2607, ctc_loss=0.1786, cr_loss=0.4109, over 18050.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1581, cr_loss=0.3819, over 4095149.59 frames. ], batch size: 108, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:36:34,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=412388.8333333333, ans=0.0 2024-09-16 07:36:45,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=412417.1666666667, ans=0.1 2024-09-16 07:37:40,687 INFO [train.py:1198] (0/2) Epoch 23, batch 5000, loss[loss=0.2237, ctc_loss=0.1498, cr_loss=0.3692, over 20041.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1581, cr_loss=0.3816, over 4099510.74 frames. 
], batch size: 44, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:37:44,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=12.0 2024-09-16 07:37:45,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-09-16 07:38:13,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=412587.1666666667, ans=0.125 2024-09-16 07:38:54,939 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.025e+02 2.166e+02 2.351e+02 2.819e+02, threshold=4.332e+02, percent-clipped=0.0 2024-09-16 07:38:54,956 INFO [train.py:1198] (0/2) Epoch 23, batch 5050, loss[loss=0.235, ctc_loss=0.1563, cr_loss=0.3935, over 20785.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.157, cr_loss=0.3791, over 4103542.58 frames. ], batch size: 56, lr: 3.60e-03, grad_scale: 32.0 2024-09-16 07:39:16,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=412700.5, ans=0.0 2024-09-16 07:39:49,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=412757.1666666667, ans=0.2 2024-09-16 07:40:09,156 INFO [train.py:1198] (0/2) Epoch 23, batch 5100, loss[loss=0.2539, ctc_loss=0.1717, cr_loss=0.4107, over 21083.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1578, cr_loss=0.3801, over 4097924.03 frames. ], batch size: 59, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:40:23,486 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.33 vs. 
limit=15.0 2024-09-16 07:40:49,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=412870.5, ans=0.0 2024-09-16 07:41:18,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=412927.1666666667, ans=0.0 2024-09-16 07:41:18,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=412927.1666666667, ans=0.125 2024-09-16 07:41:22,867 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.048e+02 2.221e+02 2.428e+02 4.386e+02, threshold=4.442e+02, percent-clipped=1.0 2024-09-16 07:41:22,885 INFO [train.py:1198] (0/2) Epoch 23, batch 5150, loss[loss=0.2479, ctc_loss=0.1697, cr_loss=0.3911, over 20821.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1586, cr_loss=0.3807, over 4081430.39 frames. ], batch size: 65, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:41:52,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=413012.1666666667, ans=0.1 2024-09-16 07:42:23,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.36 vs. limit=22.5 2024-09-16 07:42:37,387 INFO [train.py:1198] (0/2) Epoch 23, batch 5200, loss[loss=0.2529, ctc_loss=0.171, cr_loss=0.4093, over 20661.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1578, cr_loss=0.3794, over 4093200.21 frames. 
], batch size: 68, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:42:40,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=413097.1666666667, ans=0.125 2024-09-16 07:43:03,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=413125.5, ans=0.0 2024-09-16 07:43:38,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=413210.5, ans=0.1 2024-09-16 07:43:54,695 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.127e+02 2.264e+02 2.474e+02 4.390e+02, threshold=4.527e+02, percent-clipped=0.0 2024-09-16 07:43:54,730 INFO [train.py:1198] (0/2) Epoch 23, batch 5250, loss[loss=0.2714, ctc_loss=0.1869, cr_loss=0.4228, over 18073.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1583, cr_loss=0.3802, over 4076039.48 frames. ], batch size: 108, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:43:59,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=413238.8333333333, ans=0.05 2024-09-16 07:44:12,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=413267.1666666667, ans=0.125 2024-09-16 07:44:48,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=413323.8333333333, ans=0.02 2024-09-16 07:45:12,166 INFO [train.py:1198] (0/2) Epoch 23, batch 5300, loss[loss=0.2266, ctc_loss=0.1506, cr_loss=0.3799, over 20893.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1583, cr_loss=0.3806, over 4083577.56 frames. 
], batch size: 54, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:45:14,224 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 07:45:22,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=413380.5, ans=0.07 2024-09-16 07:45:34,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=413408.8333333333, ans=0.025 2024-09-16 07:46:03,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=413465.5, ans=0.125 2024-09-16 07:46:26,674 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.109e+02 2.290e+02 2.512e+02 6.419e+02, threshold=4.580e+02, percent-clipped=1.0 2024-09-16 07:46:26,699 INFO [train.py:1198] (0/2) Epoch 23, batch 5350, loss[loss=0.2337, ctc_loss=0.1561, cr_loss=0.388, over 20964.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1582, cr_loss=0.3808, over 4089728.49 frames. ], batch size: 58, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:47:41,239 INFO [train.py:1198] (0/2) Epoch 23, batch 5400, loss[loss=0.2285, ctc_loss=0.1557, cr_loss=0.3643, over 20873.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1574, cr_loss=0.3795, over 4103243.25 frames. ], batch size: 57, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:48:52,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=413777.1666666667, ans=0.125 2024-09-16 07:48:55,219 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.134e+02 2.266e+02 2.500e+02 4.304e+02, threshold=4.533e+02, percent-clipped=0.0 2024-09-16 07:48:55,237 INFO [train.py:1198] (0/2) Epoch 23, batch 5450, loss[loss=0.2336, ctc_loss=0.1558, cr_loss=0.3887, over 21049.00 frames. 
], tot_loss[loss=0.2331, ctc_loss=0.1573, cr_loss=0.3791, over 4090557.32 frames. ], batch size: 56, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:48:59,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=413805.5, ans=0.1 2024-09-16 07:49:34,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=413862.1666666667, ans=0.0 2024-09-16 07:49:39,074 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 07:50:09,821 INFO [train.py:1198] (0/2) Epoch 23, batch 5500, loss[loss=0.265, ctc_loss=0.1813, cr_loss=0.4182, over 20704.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.158, cr_loss=0.3804, over 4073921.68 frames. ], batch size: 71, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:50:14,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=413947.1666666667, ans=0.125 2024-09-16 07:50:34,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=413975.5, ans=0.0 2024-09-16 07:51:16,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=414060.5, ans=0.0 2024-09-16 07:51:17,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-09-16 07:51:24,021 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.114e+02 2.262e+02 2.494e+02 4.113e+02, threshold=4.525e+02, percent-clipped=0.0 2024-09-16 07:51:24,044 INFO [train.py:1198] (0/2) Epoch 23, batch 5550, loss[loss=0.2318, ctc_loss=0.1559, cr_loss=0.3798, over 21059.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1581, cr_loss=0.3803, over 4071581.91 frames. 
], batch size: 53, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:51:48,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414117.1666666667, ans=0.1 2024-09-16 07:52:27,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2024-09-16 07:52:32,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.72 vs. limit=15.0 2024-09-16 07:52:39,262 INFO [train.py:1198] (0/2) Epoch 23, batch 5600, loss[loss=0.2606, ctc_loss=0.1771, cr_loss=0.4177, over 20948.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1581, cr_loss=0.3799, over 4068872.24 frames. ], batch size: 60, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:52:44,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=414230.5, ans=0.1 2024-09-16 07:52:45,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=414230.5, ans=0.125 2024-09-16 07:53:12,099 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.36 vs. limit=15.0 2024-09-16 07:53:22,554 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.31 vs. 
limit=12.0 2024-09-16 07:53:54,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=414372.1666666667, ans=0.1 2024-09-16 07:53:56,156 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.115e+02 2.228e+02 2.446e+02 3.372e+02, threshold=4.456e+02, percent-clipped=0.0 2024-09-16 07:53:56,175 INFO [train.py:1198] (0/2) Epoch 23, batch 5650, loss[loss=0.2502, ctc_loss=0.1721, cr_loss=0.3905, over 20967.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1566, cr_loss=0.3775, over 4074129.89 frames. ], batch size: 58, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:54:08,652 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-09-16 07:54:12,549 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.725e-02 2024-09-16 07:54:27,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=414428.8333333333, ans=0.125 2024-09-16 07:54:43,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=414457.1666666667, ans=0.0 2024-09-16 07:54:47,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.07 vs. 
limit=15.0 2024-09-16 07:54:54,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=414457.1666666667, ans=0.2 2024-09-16 07:55:00,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=414485.5, ans=0.2 2024-09-16 07:55:05,685 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0 2024-09-16 07:55:08,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=414485.5, ans=0.1 2024-09-16 07:55:13,724 INFO [train.py:1198] (0/2) Epoch 23, batch 5700, loss[loss=0.2384, ctc_loss=0.1607, cr_loss=0.3885, over 20949.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1573, cr_loss=0.3783, over 4072950.62 frames. ], batch size: 60, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:56:06,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=414598.8333333333, ans=0.0 2024-09-16 07:56:13,925 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-09-16 07:56:28,290 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.107e+02 2.261e+02 2.440e+02 3.847e+02, threshold=4.522e+02, percent-clipped=0.0 2024-09-16 07:56:28,311 INFO [train.py:1198] (0/2) Epoch 23, batch 5750, loss[loss=0.1948, ctc_loss=0.1298, cr_loss=0.3249, over 21076.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1568, cr_loss=0.3768, over 4078404.60 frames. 
], batch size: 53, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:56:33,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=414655.5, ans=0.0 2024-09-16 07:56:37,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=414655.5, ans=0.0 2024-09-16 07:57:28,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=414768.8333333333, ans=0.0 2024-09-16 07:57:42,584 INFO [train.py:1198] (0/2) Epoch 23, batch 5800, loss[loss=0.2546, ctc_loss=0.1724, cr_loss=0.4112, over 20030.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1568, cr_loss=0.3769, over 4087394.34 frames. ], batch size: 80, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:58:56,702 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.105e+02 2.233e+02 2.415e+02 7.942e+02, threshold=4.466e+02, percent-clipped=1.0 2024-09-16 07:58:56,721 INFO [train.py:1198] (0/2) Epoch 23, batch 5850, loss[loss=0.2507, ctc_loss=0.1696, cr_loss=0.4056, over 20836.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1573, cr_loss=0.3782, over 4090113.16 frames. ], batch size: 65, lr: 3.59e-03, grad_scale: 32.0 2024-09-16 07:59:05,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=414938.8333333333, ans=0.125 2024-09-16 07:59:37,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=414995.5, ans=0.2 2024-09-16 08:00:02,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=415052.1666666667, ans=0.125 2024-09-16 08:00:10,866 INFO [train.py:1198] (0/2) Epoch 23, batch 5900, loss[loss=0.2019, ctc_loss=0.1325, cr_loss=0.3471, over 20969.00 frames. 
], tot_loss[loss=0.2319, ctc_loss=0.1565, cr_loss=0.3769, over 4099389.61 frames. ], batch size: 49, lr: 3.59e-03, grad_scale: 16.0 2024-09-16 08:00:11,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=415080.5, ans=0.02 2024-09-16 08:00:14,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=415080.5, ans=0.0 2024-09-16 08:00:42,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=415137.1666666667, ans=0.035 2024-09-16 08:00:42,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=415137.1666666667, ans=0.125 2024-09-16 08:01:01,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=415165.5, ans=0.0 2024-09-16 08:01:12,032 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=15.0 2024-09-16 08:01:13,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=415193.8333333333, ans=0.1 2024-09-16 08:01:24,691 INFO [train.py:1198] (0/2) Epoch 23, batch 5950, loss[loss=0.2493, ctc_loss=0.1688, cr_loss=0.4027, over 21016.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1573, cr_loss=0.3783, over 4088614.97 frames. 
], batch size: 63, lr: 3.58e-03, grad_scale: 16.0 2024-09-16 08:01:26,223 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.148e+02 2.249e+02 2.375e+02 6.008e+02, threshold=4.497e+02, percent-clipped=1.0 2024-09-16 08:02:11,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=415307.1666666667, ans=0.0 2024-09-16 08:02:21,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=415307.1666666667, ans=0.125 2024-09-16 08:02:26,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=415335.5, ans=0.125 2024-09-16 08:02:41,062 INFO [train.py:1198] (0/2) Epoch 23, batch 6000, loss[loss=0.2515, ctc_loss=0.17, cr_loss=0.4074, over 20408.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1568, cr_loss=0.378, over 4094689.46 frames. ], batch size: 74, lr: 3.58e-03, grad_scale: 32.0 2024-09-16 08:02:41,063 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 08:03:01,828 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.0276, 3.5845, 3.7369, 3.5582], device='cuda:0') 2024-09-16 08:03:04,417 INFO [train.py:1230] (0/2) Epoch 23, validation: loss=0.04285, ctc_loss=0.04285, cr_loss=1.126e-14, over 944034.00 frames. 2024-09-16 08:03:04,417 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 08:03:53,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=415448.8333333333, ans=0.025 2024-09-16 08:04:19,350 INFO [train.py:1198] (0/2) Epoch 23, batch 6050, loss[loss=0.193, ctc_loss=0.1267, cr_loss=0.3316, over 20937.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1568, cr_loss=0.3786, over 4105649.88 frames. 
], batch size: 49, lr: 3.58e-03, grad_scale: 32.0 2024-09-16 08:04:20,817 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.070e+02 2.252e+02 2.432e+02 2.990e+02, threshold=4.503e+02, percent-clipped=0.0 2024-09-16 08:04:24,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=415505.5, ans=0.0 2024-09-16 08:04:34,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=415533.8333333333, ans=0.125 2024-09-16 08:05:13,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=415590.5, ans=0.04949747468305833 2024-09-16 08:05:15,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=12.0 2024-09-16 08:05:33,192 INFO [train.py:1198] (0/2) Epoch 23, batch 6100, loss[loss=0.2383, ctc_loss=0.1614, cr_loss=0.3849, over 20676.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1573, cr_loss=0.3799, over 4100374.38 frames. ], batch size: 71, lr: 3.58e-03, grad_scale: 32.0 2024-09-16 08:05:47,397 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-09-16 08:06:01,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=415703.8333333333, ans=0.025 2024-09-16 08:06:18,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=415732.1666666667, ans=0.125 2024-09-16 08:06:46,678 INFO [train.py:1198] (0/2) Epoch 23, batch 6150, loss[loss=0.2287, ctc_loss=0.1552, cr_loss=0.3675, over 21033.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1592, cr_loss=0.3822, over 4073002.71 frames. 
], batch size: 62, lr: 3.58e-03, grad_scale: 32.0 2024-09-16 08:06:48,233 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.135e+02 2.285e+02 2.429e+02 3.479e+02, threshold=4.570e+02, percent-clipped=0.0 2024-09-16 08:07:00,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=415817.1666666667, ans=0.125 2024-09-16 08:08:00,917 INFO [train.py:1198] (0/2) Epoch 23, batch 6200, loss[loss=0.1913, ctc_loss=0.1279, cr_loss=0.317, over 19925.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1576, cr_loss=0.3801, over 4074610.80 frames. ], batch size: 44, lr: 3.58e-03, grad_scale: 16.0 2024-09-16 08:08:18,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=415958.8333333333, ans=0.0 2024-09-16 08:08:51,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=416015.5, ans=0.125 2024-09-16 08:08:54,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=416015.5, ans=0.0 2024-09-16 08:09:09,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.24 vs. limit=22.5 2024-09-16 08:09:14,950 INFO [train.py:1198] (0/2) Epoch 23, batch 6250, loss[loss=0.1986, ctc_loss=0.1324, cr_loss=0.3313, over 19113.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1582, cr_loss=0.3807, over 4056308.45 frames. 
], batch size: 42, lr: 3.58e-03, grad_scale: 16.0 2024-09-16 08:09:17,822 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.140e+02 2.232e+02 2.381e+02 4.416e+02, threshold=4.463e+02, percent-clipped=0.0 2024-09-16 08:09:19,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=416072.1666666667, ans=0.0 2024-09-16 08:10:29,906 INFO [train.py:1198] (0/2) Epoch 23, batch 6300, loss[loss=0.2432, ctc_loss=0.1656, cr_loss=0.3879, over 20364.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.159, cr_loss=0.3802, over 4019011.17 frames. ], batch size: 74, lr: 3.58e-03, grad_scale: 16.0 2024-09-16 08:10:47,694 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.15 vs. limit=10.0 2024-09-16 08:11:12,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416270.5, ans=0.1 2024-09-16 08:11:43,201 INFO [train.py:1198] (0/2) Epoch 23, batch 6350, loss[loss=0.2583, ctc_loss=0.1846, cr_loss=0.3683, over 14252.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1629, cr_loss=0.3829, over 3868495.95 frames. ], batch size: 150, lr: 3.58e-03, grad_scale: 16.0 2024-09-16 08:11:44,002 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.96 vs. 
limit=15.0 2024-09-16 08:11:45,959 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.199e+02 2.375e+02 2.594e+02 3.544e+02, threshold=4.749e+02, percent-clipped=0.0 2024-09-16 08:11:52,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416355.5, ans=0.1 2024-09-16 08:12:41,658 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-23.pt 2024-09-16 08:13:32,400 INFO [train.py:1198] (0/2) Epoch 24, batch 0, loss[loss=0.1885, ctc_loss=0.1223, cr_loss=0.3307, over 20316.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1223, cr_loss=0.3307, over 20316.00 frames. ], batch size: 45, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:13:32,401 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 08:13:50,631 INFO [train.py:1230] (0/2) Epoch 24, validation: loss=0.04276, ctc_loss=0.04276, cr_loss=1.112e-14, over 944034.00 frames. 2024-09-16 08:13:50,632 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 08:13:58,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=416471.6666666667, ans=0.0 2024-09-16 08:14:19,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=416528.3333333333, ans=0.2 2024-09-16 08:15:05,589 INFO [train.py:1198] (0/2) Epoch 24, batch 50, loss[loss=0.2195, ctc_loss=0.1469, cr_loss=0.363, over 20867.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1548, cr_loss=0.3745, over 928048.93 frames. 
], batch size: 57, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:15:11,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=416613.3333333333, ans=0.025 2024-09-16 08:15:21,742 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.109e+02 2.337e+02 2.584e+02 3.452e+02, threshold=4.674e+02, percent-clipped=0.0 2024-09-16 08:15:26,930 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=12.0 2024-09-16 08:16:20,120 INFO [train.py:1198] (0/2) Epoch 24, batch 100, loss[loss=0.253, ctc_loss=0.1698, cr_loss=0.4157, over 20964.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1543, cr_loss=0.3746, over 1637346.45 frames. ], batch size: 64, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:16:26,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.40 vs. limit=22.5 2024-09-16 08:16:26,964 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.26 vs. limit=15.0 2024-09-16 08:16:29,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.35 vs. 
limit=15.0 2024-09-16 08:16:38,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=416783.3333333333, ans=0.125 2024-09-16 08:16:44,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=416783.3333333333, ans=0.1 2024-09-16 08:17:01,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=416811.6666666667, ans=0.1 2024-09-16 08:17:12,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=416840.0, ans=22.5 2024-09-16 08:17:16,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=416840.0, ans=0.025 2024-09-16 08:17:33,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=416868.3333333333, ans=0.125 2024-09-16 08:17:35,982 INFO [train.py:1198] (0/2) Epoch 24, batch 150, loss[loss=0.2272, ctc_loss=0.1525, cr_loss=0.3739, over 21056.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1558, cr_loss=0.3776, over 2179597.18 frames. ], batch size: 59, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:17:46,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=22.5 2024-09-16 08:17:52,575 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.114e+02 2.240e+02 2.430e+02 5.162e+02, threshold=4.480e+02, percent-clipped=1.0 2024-09-16 08:18:53,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=417010.0, ans=0.0 2024-09-16 08:18:57,633 INFO [train.py:1198] (0/2) Epoch 24, batch 200, loss[loss=0.2555, ctc_loss=0.1754, cr_loss=0.4009, over 20690.00 frames. 
], tot_loss[loss=0.2329, ctc_loss=0.157, cr_loss=0.3794, over 2607850.83 frames. ], batch size: 68, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:18:59,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=417038.3333333333, ans=0.05 2024-09-16 08:19:02,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=417038.3333333333, ans=0.125 2024-09-16 08:19:11,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2024-09-16 08:19:23,051 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.45 vs. limit=22.5 2024-09-16 08:19:29,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=22.5 2024-09-16 08:19:39,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.42 vs. limit=15.0 2024-09-16 08:19:51,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=417123.3333333333, ans=0.1 2024-09-16 08:19:57,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=417151.6666666667, ans=0.1 2024-09-16 08:20:13,896 INFO [train.py:1198] (0/2) Epoch 24, batch 250, loss[loss=0.222, ctc_loss=0.1484, cr_loss=0.3679, over 20901.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1565, cr_loss=0.3788, over 2939427.17 frames. 
], batch size: 54, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:20:30,549 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.104e+02 2.210e+02 2.352e+02 3.316e+02, threshold=4.420e+02, percent-clipped=0.0 2024-09-16 08:20:39,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=417208.3333333333, ans=0.125 2024-09-16 08:20:53,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=417236.6666666667, ans=0.125 2024-09-16 08:21:29,662 INFO [train.py:1198] (0/2) Epoch 24, batch 300, loss[loss=0.2075, ctc_loss=0.1407, cr_loss=0.3341, over 20886.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1563, cr_loss=0.3781, over 3198036.70 frames. ], batch size: 54, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:21:46,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=417350.0, ans=0.125 2024-09-16 08:22:10,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=417378.3333333333, ans=0.0 2024-09-16 08:22:15,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=417406.6666666667, ans=0.125 2024-09-16 08:22:33,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=417435.0, ans=0.0 2024-09-16 08:22:45,289 INFO [train.py:1198] (0/2) Epoch 24, batch 350, loss[loss=0.2489, ctc_loss=0.168, cr_loss=0.4042, over 20350.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1556, cr_loss=0.3768, over 3402537.72 frames. 
], batch size: 74, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:23:01,783 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.065e+02 2.195e+02 2.372e+02 3.334e+02, threshold=4.391e+02, percent-clipped=0.0 2024-09-16 08:23:03,778 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 08:23:05,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=417491.6666666667, ans=0.0 2024-09-16 08:23:13,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.96 vs. limit=15.0 2024-09-16 08:23:33,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=417548.3333333333, ans=0.2 2024-09-16 08:23:43,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=417548.3333333333, ans=0.2 2024-09-16 08:23:45,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2024-09-16 08:24:00,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.66 vs. limit=12.0 2024-09-16 08:24:01,253 INFO [train.py:1198] (0/2) Epoch 24, batch 400, loss[loss=0.2316, ctc_loss=0.1566, cr_loss=0.3747, over 20990.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1557, cr_loss=0.3777, over 3553494.40 frames. ], batch size: 55, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:24:59,001 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. 
limit=15.0 2024-09-16 08:25:01,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=417690.0, ans=0.025 2024-09-16 08:25:09,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=417718.3333333333, ans=0.125 2024-09-16 08:25:22,740 INFO [train.py:1198] (0/2) Epoch 24, batch 450, loss[loss=0.2328, ctc_loss=0.1544, cr_loss=0.3922, over 20779.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1551, cr_loss=0.3768, over 3681785.22 frames. ], batch size: 56, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:25:39,166 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.072e+02 2.190e+02 2.334e+02 3.687e+02, threshold=4.380e+02, percent-clipped=0.0 2024-09-16 08:26:05,466 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-09-16 08:26:15,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=417831.6666666667, ans=0.04949747468305833 2024-09-16 08:26:38,515 INFO [train.py:1198] (0/2) Epoch 24, batch 500, loss[loss=0.2494, ctc_loss=0.172, cr_loss=0.3873, over 20942.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1557, cr_loss=0.3766, over 3778756.33 frames. ], batch size: 60, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:26:51,323 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2024-09-16 08:26:55,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=417916.6666666667, ans=0.0 2024-09-16 08:26:58,831 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.79 vs. 
limit=22.5 2024-09-16 08:27:50,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=418001.6666666667, ans=0.125 2024-09-16 08:27:53,398 INFO [train.py:1198] (0/2) Epoch 24, batch 550, loss[loss=0.2383, ctc_loss=0.1602, cr_loss=0.3906, over 21075.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1558, cr_loss=0.3768, over 3833423.43 frames. ], batch size: 59, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:28:02,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=418030.0, ans=0.125 2024-09-16 08:28:08,930 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=22.5 2024-09-16 08:28:09,621 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.091e+02 2.222e+02 2.388e+02 4.824e+02, threshold=4.444e+02, percent-clipped=1.0 2024-09-16 08:28:09,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=418058.3333333333, ans=0.2 2024-09-16 08:29:08,671 INFO [train.py:1198] (0/2) Epoch 24, batch 600, loss[loss=0.2004, ctc_loss=0.1339, cr_loss=0.3325, over 21039.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.155, cr_loss=0.3758, over 3899947.23 frames. 
], batch size: 53, lr: 3.50e-03, grad_scale: 32.0 2024-09-16 08:29:16,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=418171.6666666667, ans=0.125 2024-09-16 08:29:21,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=418171.6666666667, ans=0.07 2024-09-16 08:29:31,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418200.0, ans=0.1 2024-09-16 08:29:32,057 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0 2024-09-16 08:29:53,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=418256.6666666667, ans=0.5 2024-09-16 08:30:09,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=418256.6666666667, ans=0.125 2024-09-16 08:30:30,674 INFO [train.py:1198] (0/2) Epoch 24, batch 650, loss[loss=0.259, ctc_loss=0.1783, cr_loss=0.4037, over 19357.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1543, cr_loss=0.3744, over 3948062.68 frames. ], batch size: 90, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:30:47,079 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.070e+02 2.181e+02 2.393e+02 3.062e+02, threshold=4.363e+02, percent-clipped=0.0 2024-09-16 08:30:52,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=418341.6666666667, ans=0.1 2024-09-16 08:31:02,638 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.72 vs. 
limit=10.0 2024-09-16 08:31:46,669 INFO [train.py:1198] (0/2) Epoch 24, batch 700, loss[loss=0.2584, ctc_loss=0.1752, cr_loss=0.4163, over 20267.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1547, cr_loss=0.3757, over 3988555.10 frames. ], batch size: 74, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:32:14,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=418483.3333333333, ans=0.025 2024-09-16 08:32:30,296 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=22.5 2024-09-16 08:33:00,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=418568.3333333333, ans=0.125 2024-09-16 08:33:02,898 INFO [train.py:1198] (0/2) Epoch 24, batch 750, loss[loss=0.2481, ctc_loss=0.1649, cr_loss=0.4163, over 21064.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1551, cr_loss=0.3762, over 4012568.18 frames. ], batch size: 59, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:33:19,683 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.722e+02 2.137e+02 2.258e+02 2.452e+02 4.942e+02, threshold=4.516e+02, percent-clipped=2.0 2024-09-16 08:33:36,994 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.47 vs. 
limit=6.0 2024-09-16 08:33:52,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=418681.6666666667, ans=0.0 2024-09-16 08:34:03,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=418710.0, ans=0.0 2024-09-16 08:34:13,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=418710.0, ans=0.0 2024-09-16 08:34:18,175 INFO [train.py:1198] (0/2) Epoch 24, batch 800, loss[loss=0.2303, ctc_loss=0.155, cr_loss=0.3764, over 20881.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1562, cr_loss=0.3778, over 4031847.28 frames. ], batch size: 57, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:34:21,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=418738.3333333333, ans=0.125 2024-09-16 08:34:44,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=418766.6666666667, ans=0.0 2024-09-16 08:34:56,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=418795.0, ans=0.025 2024-09-16 08:35:09,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=418823.3333333333, ans=0.125 2024-09-16 08:35:21,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=418851.6666666667, ans=0.1 2024-09-16 08:35:33,529 INFO [train.py:1198] (0/2) Epoch 24, batch 850, loss[loss=0.2342, ctc_loss=0.1571, cr_loss=0.3858, over 20902.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1563, cr_loss=0.378, over 4039544.03 frames. 
], batch size: 60, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:35:35,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418880.0, ans=0.1 2024-09-16 08:35:37,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=418880.0, ans=0.125 2024-09-16 08:35:41,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=418880.0, ans=0.1 2024-09-16 08:35:45,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=418880.0, ans=0.125 2024-09-16 08:35:52,926 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.132e+02 2.261e+02 2.449e+02 6.815e+02, threshold=4.523e+02, percent-clipped=1.0 2024-09-16 08:35:57,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=418908.3333333333, ans=0.0 2024-09-16 08:35:58,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=418908.3333333333, ans=0.025 2024-09-16 08:36:06,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=418908.3333333333, ans=0.0 2024-09-16 08:36:12,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=418936.6666666667, ans=0.0 2024-09-16 08:36:14,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=418936.6666666667, ans=0.125 2024-09-16 08:36:32,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.51 vs. 
limit=22.5 2024-09-16 08:36:37,128 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=22.5 2024-09-16 08:36:54,413 INFO [train.py:1198] (0/2) Epoch 24, batch 900, loss[loss=0.2259, ctc_loss=0.1499, cr_loss=0.3802, over 21031.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1558, cr_loss=0.3777, over 4059281.90 frames. ], batch size: 63, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:36:56,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=419021.6666666667, ans=0.0 2024-09-16 08:37:00,887 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 08:37:19,320 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 08:38:10,543 INFO [train.py:1198] (0/2) Epoch 24, batch 950, loss[loss=0.2455, ctc_loss=0.1648, cr_loss=0.4035, over 20148.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.156, cr_loss=0.3783, over 4072882.78 frames. 
], batch size: 80, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:38:15,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=419163.3333333333, ans=0.025 2024-09-16 08:38:26,957 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.101e+02 2.246e+02 2.373e+02 3.641e+02, threshold=4.492e+02, percent-clipped=0.0 2024-09-16 08:38:59,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=419248.3333333333, ans=0.125 2024-09-16 08:39:07,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=419248.3333333333, ans=0.125 2024-09-16 08:39:11,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=419276.6666666667, ans=0.2 2024-09-16 08:39:26,827 INFO [train.py:1198] (0/2) Epoch 24, batch 1000, loss[loss=0.2644, ctc_loss=0.1915, cr_loss=0.3646, over 14623.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1569, cr_loss=0.3796, over 4073532.66 frames. ], batch size: 150, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:39:40,825 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-148000.pt 2024-09-16 08:39:43,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=419333.3333333333, ans=0.0 2024-09-16 08:39:53,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. limit=6.0 2024-09-16 08:40:43,889 INFO [train.py:1198] (0/2) Epoch 24, batch 1050, loss[loss=0.1751, ctc_loss=0.1145, cr_loss=0.3027, over 20950.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1564, cr_loss=0.3788, over 4084134.83 frames. 
], batch size: 48, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:40:55,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=419446.6666666667, ans=0.125 2024-09-16 08:41:00,077 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.125e+02 2.248e+02 2.425e+02 4.002e+02, threshold=4.497e+02, percent-clipped=0.0 2024-09-16 08:41:09,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=419475.0, ans=0.04949747468305833 2024-09-16 08:41:15,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=419503.3333333333, ans=0.125 2024-09-16 08:41:25,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=22.5 2024-09-16 08:41:32,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=419531.6666666667, ans=0.1 2024-09-16 08:41:34,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=419531.6666666667, ans=0.1 2024-09-16 08:42:05,359 INFO [train.py:1198] (0/2) Epoch 24, batch 1100, loss[loss=0.2022, ctc_loss=0.1341, cr_loss=0.3407, over 20963.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.156, cr_loss=0.3773, over 4070555.88 frames. ], batch size: 50, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:42:20,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.20 vs. 
limit=22.5 2024-09-16 08:42:20,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419616.6666666667, ans=0.1 2024-09-16 08:43:16,640 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 08:43:20,724 INFO [train.py:1198] (0/2) Epoch 24, batch 1150, loss[loss=0.2154, ctc_loss=0.1429, cr_loss=0.3628, over 20989.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1559, cr_loss=0.3769, over 4066723.55 frames. ], batch size: 49, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:43:22,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=419730.0, ans=0.04949747468305833 2024-09-16 08:43:37,474 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.103e+02 2.223e+02 2.465e+02 3.224e+02, threshold=4.446e+02, percent-clipped=0.0 2024-09-16 08:44:15,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=419815.0, ans=0.2 2024-09-16 08:44:36,149 INFO [train.py:1198] (0/2) Epoch 24, batch 1200, loss[loss=0.1846, ctc_loss=0.1219, cr_loss=0.3135, over 19790.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1554, cr_loss=0.3766, over 4076609.21 frames. ], batch size: 44, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:44:54,872 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 08:44:59,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=419900.0, ans=0.1 2024-09-16 08:45:11,904 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. 
limit=15.0 2024-09-16 08:45:51,961 INFO [train.py:1198] (0/2) Epoch 24, batch 1250, loss[loss=0.2413, ctc_loss=0.162, cr_loss=0.3963, over 20954.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1566, cr_loss=0.3778, over 4061974.85 frames. ], batch size: 60, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:46:08,347 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.143e+02 2.229e+02 2.408e+02 3.533e+02, threshold=4.457e+02, percent-clipped=0.0 2024-09-16 08:46:23,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=420070.0, ans=0.1 2024-09-16 08:46:32,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=420070.0, ans=0.125 2024-09-16 08:47:10,219 INFO [train.py:1198] (0/2) Epoch 24, batch 1300, loss[loss=0.1941, ctc_loss=0.1291, cr_loss=0.3246, over 20981.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1575, cr_loss=0.3791, over 4060623.56 frames. ], batch size: 52, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:47:25,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=420155.0, ans=0.0 2024-09-16 08:48:13,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2024-09-16 08:48:21,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=420268.3333333333, ans=0.125 2024-09-16 08:48:27,537 INFO [train.py:1198] (0/2) Epoch 24, batch 1350, loss[loss=0.258, ctc_loss=0.1719, cr_loss=0.4303, over 21017.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1575, cr_loss=0.3789, over 4066910.30 frames. 
], batch size: 63, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:48:44,077 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.131e+02 2.263e+02 2.450e+02 7.002e+02, threshold=4.527e+02, percent-clipped=1.0 2024-09-16 08:48:44,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=420325.0, ans=0.1 2024-09-16 08:49:05,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420353.3333333333, ans=0.1 2024-09-16 08:49:10,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=420353.3333333333, ans=0.2 2024-09-16 08:49:15,133 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.39 vs. limit=12.0 2024-09-16 08:49:23,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=420381.6666666667, ans=0.0 2024-09-16 08:49:34,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=420410.0, ans=0.125 2024-09-16 08:49:42,636 INFO [train.py:1198] (0/2) Epoch 24, batch 1400, loss[loss=0.2767, ctc_loss=0.1879, cr_loss=0.4438, over 17947.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1576, cr_loss=0.3797, over 4074260.42 frames. 
], batch size: 108, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:49:53,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=420438.3333333333, ans=0.0 2024-09-16 08:49:59,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=420466.6666666667, ans=0.125 2024-09-16 08:50:40,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=420523.3333333333, ans=0.125 2024-09-16 08:50:57,846 INFO [train.py:1198] (0/2) Epoch 24, batch 1450, loss[loss=0.1885, ctc_loss=0.1247, cr_loss=0.3193, over 20931.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1575, cr_loss=0.3798, over 4088893.67 frames. ], batch size: 48, lr: 3.49e-03, grad_scale: 32.0 2024-09-16 08:51:10,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=22.5 2024-09-16 08:51:10,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.24 vs. 
limit=10.0 2024-09-16 08:51:14,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.142e+02 2.300e+02 2.520e+02 4.396e+02, threshold=4.601e+02, percent-clipped=0.0 2024-09-16 08:51:27,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=420636.6666666667, ans=0.0 2024-09-16 08:51:30,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=420636.6666666667, ans=0.0 2024-09-16 08:51:31,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420636.6666666667, ans=0.1 2024-09-16 08:51:50,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=420665.0, ans=0.0 2024-09-16 08:52:13,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=420721.6666666667, ans=15.0 2024-09-16 08:52:13,890 INFO [train.py:1198] (0/2) Epoch 24, batch 1500, loss[loss=0.2415, ctc_loss=0.1633, cr_loss=0.3913, over 21078.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.157, cr_loss=0.3786, over 4086035.79 frames. 
], batch size: 59, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 08:52:18,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=420721.6666666667, ans=0.125 2024-09-16 08:52:50,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=420778.3333333333, ans=0.125 2024-09-16 08:53:23,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=420835.0, ans=0.125 2024-09-16 08:53:32,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=420835.0, ans=0.125 2024-09-16 08:53:32,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=420835.0, ans=0.125 2024-09-16 08:53:35,360 INFO [train.py:1198] (0/2) Epoch 24, batch 1550, loss[loss=0.2257, ctc_loss=0.1524, cr_loss=0.3665, over 20945.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1561, cr_loss=0.3773, over 4086286.34 frames. 
], batch size: 60, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 08:53:44,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=420863.3333333333, ans=0.125 2024-09-16 08:53:51,891 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.039e+02 2.175e+02 2.385e+02 3.423e+02, threshold=4.350e+02, percent-clipped=0.0 2024-09-16 08:53:52,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=420891.6666666667, ans=0.035 2024-09-16 08:53:58,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=420891.6666666667, ans=0.125 2024-09-16 08:53:58,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=420891.6666666667, ans=0.0 2024-09-16 08:54:35,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0 2024-09-16 08:54:51,259 INFO [train.py:1198] (0/2) Epoch 24, batch 1600, loss[loss=0.2428, ctc_loss=0.1654, cr_loss=0.3872, over 20966.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1556, cr_loss=0.3766, over 4093728.75 frames. ], batch size: 64, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 08:55:00,880 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=22.5 2024-09-16 08:55:52,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=421118.3333333333, ans=0.125 2024-09-16 08:56:07,512 INFO [train.py:1198] (0/2) Epoch 24, batch 1650, loss[loss=0.2086, ctc_loss=0.1367, cr_loss=0.3594, over 20946.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1551, cr_loss=0.3761, over 4106635.05 frames. 
], batch size: 49, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 08:56:15,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=421146.6666666667, ans=0.02 2024-09-16 08:56:24,169 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.102e+02 2.248e+02 2.461e+02 4.188e+02, threshold=4.496e+02, percent-clipped=0.0 2024-09-16 08:56:48,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=421203.3333333333, ans=0.025 2024-09-16 08:57:20,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=421260.0, ans=0.125 2024-09-16 08:57:23,080 INFO [train.py:1198] (0/2) Epoch 24, batch 1700, loss[loss=0.2424, ctc_loss=0.1632, cr_loss=0.3962, over 20932.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1555, cr_loss=0.3767, over 4113314.02 frames. ], batch size: 60, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 08:57:26,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=421288.3333333333, ans=0.125 2024-09-16 08:57:45,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=421316.6666666667, ans=0.125 2024-09-16 08:57:55,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=22.5 2024-09-16 08:58:28,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=421401.6666666667, ans=0.125 2024-09-16 08:58:41,697 INFO [train.py:1198] (0/2) Epoch 24, batch 1750, loss[loss=0.2347, ctc_loss=0.1599, cr_loss=0.374, over 20676.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1553, cr_loss=0.3759, over 4100775.71 frames. 
], batch size: 71, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 08:59:01,285 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.112e+02 2.261e+02 2.410e+02 3.889e+02, threshold=4.522e+02, percent-clipped=0.0 2024-09-16 08:59:11,266 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=12.0 2024-09-16 08:59:49,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=421543.3333333333, ans=0.2 2024-09-16 09:00:00,997 INFO [train.py:1198] (0/2) Epoch 24, batch 1800, loss[loss=0.2484, ctc_loss=0.1667, cr_loss=0.4087, over 20651.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1556, cr_loss=0.3764, over 4093379.76 frames. ], batch size: 68, lr: 3.48e-03, grad_scale: 64.0 2024-09-16 09:00:16,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=421600.0, ans=0.125 2024-09-16 09:00:19,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=421600.0, ans=0.125 2024-09-16 09:00:52,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=421656.6666666667, ans=0.0 2024-09-16 09:00:54,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2024-09-16 09:01:06,374 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2024-09-16 09:01:16,240 INFO [train.py:1198] (0/2) Epoch 24, batch 1850, loss[loss=0.2317, ctc_loss=0.1544, cr_loss=0.3868, over 20953.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1567, cr_loss=0.3775, over 4086381.19 frames. 
], batch size: 58, lr: 3.48e-03, grad_scale: 64.0 2024-09-16 09:01:19,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=421713.3333333333, ans=0.1 2024-09-16 09:01:32,831 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.705e+02 2.107e+02 2.233e+02 2.383e+02 3.396e+02, threshold=4.466e+02, percent-clipped=0.0 2024-09-16 09:01:42,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=421741.6666666667, ans=0.1 2024-09-16 09:01:43,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=421741.6666666667, ans=0.1 2024-09-16 09:02:20,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=421826.6666666667, ans=0.05 2024-09-16 09:02:20,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0 2024-09-16 09:02:32,118 INFO [train.py:1198] (0/2) Epoch 24, batch 1900, loss[loss=0.2281, ctc_loss=0.1541, cr_loss=0.3701, over 20980.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1568, cr_loss=0.3784, over 4082877.39 frames. ], batch size: 55, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 09:03:09,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.69 vs. limit=15.0 2024-09-16 09:03:20,834 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.77 vs. 
limit=12.0 2024-09-16 09:03:25,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=421940.0, ans=0.125 2024-09-16 09:03:47,242 INFO [train.py:1198] (0/2) Epoch 24, batch 1950, loss[loss=0.2769, ctc_loss=0.1939, cr_loss=0.4146, over 19418.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1568, cr_loss=0.3778, over 4066394.95 frames. ], batch size: 90, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 09:04:04,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=422025.0, ans=0.0 2024-09-16 09:04:05,196 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.091e+02 2.292e+02 2.472e+02 3.296e+02, threshold=4.585e+02, percent-clipped=0.0 2024-09-16 09:04:17,857 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=15.0 2024-09-16 09:04:34,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=422053.3333333333, ans=0.125 2024-09-16 09:04:58,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=422110.0, ans=0.0 2024-09-16 09:04:58,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=22.5 2024-09-16 09:05:05,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=422110.0, ans=0.125 2024-09-16 09:05:08,378 INFO [train.py:1198] (0/2) Epoch 24, batch 2000, loss[loss=0.2693, ctc_loss=0.1838, cr_loss=0.4272, over 17967.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1569, cr_loss=0.3782, over 4071473.59 frames. 
], batch size: 108, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 09:05:22,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422166.6666666667, ans=0.1 2024-09-16 09:05:38,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=422195.0, ans=0.125 2024-09-16 09:05:46,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-09-16 09:06:07,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=422251.6666666667, ans=0.0 2024-09-16 09:06:15,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=422251.6666666667, ans=0.125 2024-09-16 09:06:24,080 INFO [train.py:1198] (0/2) Epoch 24, batch 2050, loss[loss=0.2513, ctc_loss=0.1701, cr_loss=0.406, over 20660.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1571, cr_loss=0.3784, over 4074503.19 frames. 
], batch size: 66, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 09:06:42,265 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.169e+02 2.314e+02 2.452e+02 4.779e+02, threshold=4.628e+02, percent-clipped=1.0 2024-09-16 09:07:00,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422336.6666666667, ans=0.1 2024-09-16 09:07:05,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=422336.6666666667, ans=0.0 2024-09-16 09:07:11,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=422365.0, ans=0.025 2024-09-16 09:07:39,961 INFO [train.py:1198] (0/2) Epoch 24, batch 2100, loss[loss=0.1897, ctc_loss=0.1248, cr_loss=0.3242, over 20969.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1562, cr_loss=0.3774, over 4094547.77 frames. ], batch size: 50, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 09:07:40,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=422421.6666666667, ans=0.1 2024-09-16 09:07:48,260 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 09:08:00,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=422450.0, ans=0.125 2024-09-16 09:08:55,497 INFO [train.py:1198] (0/2) Epoch 24, batch 2150, loss[loss=0.2523, ctc_loss=0.1697, cr_loss=0.4132, over 20640.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1557, cr_loss=0.3767, over 4094223.72 frames. 
], batch size: 68, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 09:09:07,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=422563.3333333333, ans=0.125 2024-09-16 09:09:13,363 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.646e+02 2.065e+02 2.226e+02 2.404e+02 5.265e+02, threshold=4.452e+02, percent-clipped=1.0 2024-09-16 09:09:54,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=422676.6666666667, ans=0.2 2024-09-16 09:10:16,247 INFO [train.py:1198] (0/2) Epoch 24, batch 2200, loss[loss=0.2117, ctc_loss=0.1392, cr_loss=0.3626, over 20768.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1567, cr_loss=0.3784, over 4088717.54 frames. ], batch size: 56, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 09:10:32,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=422733.3333333333, ans=0.125 2024-09-16 09:11:01,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=422790.0, ans=0.125 2024-09-16 09:11:31,367 INFO [train.py:1198] (0/2) Epoch 24, batch 2250, loss[loss=0.244, ctc_loss=0.1674, cr_loss=0.383, over 21014.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1565, cr_loss=0.3783, over 4083238.22 frames. 
], batch size: 61, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 09:11:49,081 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.103e+02 2.247e+02 2.384e+02 2.915e+02, threshold=4.493e+02, percent-clipped=0.0 2024-09-16 09:11:52,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=422875.0, ans=0.2 2024-09-16 09:12:14,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=422903.3333333333, ans=0.125 2024-09-16 09:12:47,166 INFO [train.py:1198] (0/2) Epoch 24, batch 2300, loss[loss=0.2327, ctc_loss=0.1568, cr_loss=0.3796, over 20938.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1566, cr_loss=0.379, over 4091056.08 frames. ], batch size: 60, lr: 3.48e-03, grad_scale: 32.0 2024-09-16 09:12:50,974 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=22.5 2024-09-16 09:12:51,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422988.3333333333, ans=0.1 2024-09-16 09:13:16,736 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.02 vs. limit=10.0 2024-09-16 09:13:28,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=423045.0, ans=0.2 2024-09-16 09:13:33,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=423073.3333333333, ans=0.0 2024-09-16 09:14:03,024 INFO [train.py:1198] (0/2) Epoch 24, batch 2350, loss[loss=0.2131, ctc_loss=0.1433, cr_loss=0.3489, over 20947.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1571, cr_loss=0.3802, over 4082960.90 frames. 
], batch size: 51, lr: 3.47e-03, grad_scale: 32.0 2024-09-16 09:14:15,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=423130.0, ans=0.125 2024-09-16 09:14:21,180 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.098e+02 2.203e+02 2.397e+02 6.665e+02, threshold=4.406e+02, percent-clipped=1.0 2024-09-16 09:14:25,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-16 09:14:57,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=423215.0, ans=0.125 2024-09-16 09:15:18,444 INFO [train.py:1198] (0/2) Epoch 24, batch 2400, loss[loss=0.212, ctc_loss=0.1423, cr_loss=0.3486, over 20990.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1568, cr_loss=0.3792, over 4090013.71 frames. ], batch size: 48, lr: 3.47e-03, grad_scale: 32.0 2024-09-16 09:15:21,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=423271.6666666667, ans=0.0 2024-09-16 09:16:39,675 INFO [train.py:1198] (0/2) Epoch 24, batch 2450, loss[loss=0.2702, ctc_loss=0.1844, cr_loss=0.429, over 18016.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1568, cr_loss=0.3795, over 4094192.17 frames. ], batch size: 108, lr: 3.47e-03, grad_scale: 32.0 2024-09-16 09:16:48,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.62 vs. 
limit=5.0 2024-09-16 09:16:57,958 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.091e+02 2.189e+02 2.384e+02 3.010e+02, threshold=4.377e+02, percent-clipped=0.0 2024-09-16 09:17:20,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=423470.0, ans=0.0 2024-09-16 09:17:34,743 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.64 vs. limit=10.0 2024-09-16 09:17:45,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=423526.6666666667, ans=0.0 2024-09-16 09:17:55,383 INFO [train.py:1198] (0/2) Epoch 24, batch 2500, loss[loss=0.2054, ctc_loss=0.1343, cr_loss=0.3556, over 20779.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1562, cr_loss=0.3785, over 4091065.57 frames. ], batch size: 53, lr: 3.47e-03, grad_scale: 32.0 2024-09-16 09:17:57,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.66 vs. 
limit=22.5 2024-09-16 09:18:16,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=423583.3333333333, ans=0.0 2024-09-16 09:18:24,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=423611.6666666667, ans=0.125 2024-09-16 09:18:24,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=423611.6666666667, ans=0.04949747468305833 2024-09-16 09:19:02,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=423668.3333333333, ans=0.04949747468305833 2024-09-16 09:19:10,774 INFO [train.py:1198] (0/2) Epoch 24, batch 2550, loss[loss=0.2433, ctc_loss=0.1636, cr_loss=0.3985, over 21079.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1564, cr_loss=0.3788, over 4088694.69 frames. ], batch size: 59, lr: 3.47e-03, grad_scale: 32.0 2024-09-16 09:19:18,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=423696.6666666667, ans=0.0 2024-09-16 09:19:18,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=423696.6666666667, ans=0.2 2024-09-16 09:19:29,054 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.095e+02 2.206e+02 2.357e+02 3.619e+02, threshold=4.413e+02, percent-clipped=0.0 2024-09-16 09:19:45,091 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.06 vs. 
limit=15.0
2024-09-16 09:20:10,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=423810.0, ans=0.125
2024-09-16 09:20:13,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=423810.0, ans=0.2
2024-09-16 09:20:27,061 INFO [train.py:1198] (0/2) Epoch 24, batch 2600, loss[loss=0.2579, ctc_loss=0.1747, cr_loss=0.4162, over 18173.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1553, cr_loss=0.3777, over 4100923.06 frames. ], batch size: 108, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:20:33,996 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.56 vs. limit=6.0
2024-09-16 09:20:38,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=423838.3333333333, ans=0.02
2024-09-16 09:20:50,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=423866.6666666667, ans=0.0
2024-09-16 09:21:33,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=423951.6666666667, ans=0.125
2024-09-16 09:21:38,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=423951.6666666667, ans=0.0
2024-09-16 09:21:48,369 INFO [train.py:1198] (0/2) Epoch 24, batch 2650, loss[loss=0.2422, ctc_loss=0.1615, cr_loss=0.4034, over 20855.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.155, cr_loss=0.3769, over 4105122.97 frames. ], batch size: 65, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:22:06,649 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.080e+02 2.199e+02 2.345e+02 2.937e+02, threshold=4.399e+02, percent-clipped=0.0
2024-09-16 09:22:09,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=424008.3333333333, ans=0.125
2024-09-16 09:22:23,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=424036.6666666667, ans=0.0
2024-09-16 09:22:34,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=22.5
2024-09-16 09:22:40,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=424065.0, ans=0.125
2024-09-16 09:23:04,048 INFO [train.py:1198] (0/2) Epoch 24, batch 2700, loss[loss=0.1803, ctc_loss=0.1181, cr_loss=0.3112, over 20968.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1553, cr_loss=0.3767, over 4116460.52 frames. ], batch size: 51, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:23:06,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0
2024-09-16 09:23:12,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0
2024-09-16 09:23:26,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=424150.0, ans=0.125
2024-09-16 09:23:29,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=424150.0, ans=0.09899494936611666
2024-09-16 09:23:43,729 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=22.5
2024-09-16 09:24:19,237 INFO [train.py:1198] (0/2) Epoch 24, batch 2750, loss[loss=0.2126, ctc_loss=0.1418, cr_loss=0.3545, over 21016.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1553, cr_loss=0.3759, over 4101813.43 frames. ], batch size: 52, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:24:37,503 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.107e+02 2.260e+02 2.391e+02 3.699e+02, threshold=4.519e+02, percent-clipped=0.0
2024-09-16 09:24:47,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0
2024-09-16 09:24:56,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=424320.0, ans=0.0
2024-09-16 09:25:14,716 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 09:25:35,557 INFO [train.py:1198] (0/2) Epoch 24, batch 2800, loss[loss=0.2645, ctc_loss=0.1796, cr_loss=0.4246, over 20940.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1559, cr_loss=0.3771, over 4085302.99 frames. ], batch size: 64, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:26:04,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=424461.6666666667, ans=0.0
2024-09-16 09:26:24,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=424490.0, ans=0.125
2024-09-16 09:26:25,921 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 09:26:33,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=424490.0, ans=0.07
2024-09-16 09:26:34,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=424518.3333333333, ans=0.125
2024-09-16 09:26:51,164 INFO [train.py:1198] (0/2) Epoch 24, batch 2850, loss[loss=0.2347, ctc_loss=0.1583, cr_loss=0.3821, over 21065.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1574, cr_loss=0.38, over 4077174.27 frames. ], batch size: 59, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:27:09,488 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.139e+02 2.236e+02 2.347e+02 3.219e+02, threshold=4.471e+02, percent-clipped=0.0
2024-09-16 09:27:19,246 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 09:27:37,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=424603.3333333333, ans=0.1
2024-09-16 09:27:48,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=424631.6666666667, ans=0.125
2024-09-16 09:27:54,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.68 vs. limit=22.5
2024-09-16 09:28:01,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=424660.0, ans=0.125
2024-09-16 09:28:01,997 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.60 vs. limit=10.0
2024-09-16 09:28:03,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=424660.0, ans=0.125
2024-09-16 09:28:13,429 INFO [train.py:1198] (0/2) Epoch 24, batch 2900, loss[loss=0.202, ctc_loss=0.1342, cr_loss=0.3392, over 20969.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1569, cr_loss=0.379, over 4075075.54 frames. ], batch size: 51, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:28:16,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424688.3333333333, ans=0.1
2024-09-16 09:29:22,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=424801.6666666667, ans=0.125
2024-09-16 09:29:29,726 INFO [train.py:1198] (0/2) Epoch 24, batch 2950, loss[loss=0.2839, ctc_loss=0.2087, cr_loss=0.3757, over 13880.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1562, cr_loss=0.3775, over 4081305.12 frames. ], batch size: 149, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:29:30,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=424830.0, ans=0.0
2024-09-16 09:29:33,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=424830.0, ans=0.125
2024-09-16 09:29:47,777 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.087e+02 2.181e+02 2.372e+02 4.689e+02, threshold=4.361e+02, percent-clipped=1.0
2024-09-16 09:30:01,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=424886.6666666667, ans=0.0
2024-09-16 09:30:01,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=424886.6666666667, ans=0.0
2024-09-16 09:30:15,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=22.5
2024-09-16 09:30:24,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=424915.0, ans=0.2
2024-09-16 09:30:44,797 INFO [train.py:1198] (0/2) Epoch 24, batch 3000, loss[loss=0.2574, ctc_loss=0.1799, cr_loss=0.3874, over 19339.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1563, cr_loss=0.3774, over 4083087.98 frames. ], batch size: 90, lr: 3.47e-03, grad_scale: 32.0
2024-09-16 09:30:44,798 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 09:31:09,971 INFO [train.py:1230] (0/2) Epoch 24, validation: loss=0.04253, ctc_loss=0.04253, cr_loss=1.102e-14, over 944034.00 frames.
2024-09-16 09:31:09,972 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-16 09:31:22,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=424971.6666666667, ans=0.1
2024-09-16 09:31:22,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=424971.6666666667, ans=0.0
2024-09-16 09:31:50,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=425028.3333333333, ans=0.2
2024-09-16 09:32:26,332 INFO [train.py:1198] (0/2) Epoch 24, batch 3050, loss[loss=0.2105, ctc_loss=0.1407, cr_loss=0.3489, over 19885.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1548, cr_loss=0.3757, over 4090214.16 frames. ], batch size: 44, lr: 3.47e-03, grad_scale: 16.0
2024-09-16 09:32:34,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=425113.3333333333, ans=0.1
2024-09-16 09:32:40,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=425141.6666666667, ans=0.2
2024-09-16 09:32:41,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=425141.6666666667, ans=0.015
2024-09-16 09:32:46,180 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.090e+02 2.229e+02 2.364e+02 3.613e+02, threshold=4.458e+02, percent-clipped=0.0
2024-09-16 09:33:04,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=425170.0, ans=0.2
2024-09-16 09:33:12,618 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 09:33:45,899 INFO [train.py:1198] (0/2) Epoch 24, batch 3100, loss[loss=0.2141, ctc_loss=0.142, cr_loss=0.3608, over 21055.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1549, cr_loss=0.3765, over 4082870.55 frames. ], batch size: 56, lr: 3.47e-03, grad_scale: 16.0
2024-09-16 09:33:58,841 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=15.0
2024-09-16 09:34:29,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=425340.0, ans=0.125
2024-09-16 09:34:58,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=425368.3333333333, ans=0.09899494936611666
2024-09-16 09:35:00,953 INFO [train.py:1198] (0/2) Epoch 24, batch 3150, loss[loss=0.2554, ctc_loss=0.1733, cr_loss=0.4104, over 20978.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1561, cr_loss=0.378, over 4076002.04 frames. ], batch size: 64, lr: 3.47e-03, grad_scale: 16.0
2024-09-16 09:35:16,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=425425.0, ans=0.125
2024-09-16 09:35:19,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=425425.0, ans=0.0
2024-09-16 09:35:20,721 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.089e+02 2.222e+02 2.393e+02 3.442e+02, threshold=4.445e+02, percent-clipped=0.0
2024-09-16 09:35:21,573 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0
2024-09-16 09:35:24,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=425425.0, ans=0.1
2024-09-16 09:35:50,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=425481.6666666667, ans=0.0
2024-09-16 09:36:16,789 INFO [train.py:1198] (0/2) Epoch 24, batch 3200, loss[loss=0.2312, ctc_loss=0.1556, cr_loss=0.3777, over 21059.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1556, cr_loss=0.3773, over 4076216.05 frames. ], batch size: 62, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:36:20,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=425538.3333333333, ans=0.125
2024-09-16 09:36:42,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=425566.6666666667, ans=0.0
2024-09-16 09:36:54,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=425595.0, ans=0.125
2024-09-16 09:37:33,124 INFO [train.py:1198] (0/2) Epoch 24, batch 3250, loss[loss=0.2403, ctc_loss=0.1602, cr_loss=0.4007, over 20782.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1555, cr_loss=0.3776, over 4096797.03 frames. ], batch size: 56, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:37:36,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=425680.0, ans=0.0
2024-09-16 09:37:53,059 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.097e+02 2.289e+02 2.402e+02 4.297e+02, threshold=4.577e+02, percent-clipped=0.0
2024-09-16 09:38:28,277 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 09:38:47,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=425793.3333333333, ans=0.0
2024-09-16 09:38:49,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=425793.3333333333, ans=0.0
2024-09-16 09:38:49,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=425793.3333333333, ans=0.2
2024-09-16 09:38:55,169 INFO [train.py:1198] (0/2) Epoch 24, batch 3300, loss[loss=0.2292, ctc_loss=0.1534, cr_loss=0.3792, over 20892.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1556, cr_loss=0.3777, over 4102094.91 frames. ], batch size: 54, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:39:37,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=425878.3333333333, ans=0.2
2024-09-16 09:39:49,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=425906.6666666667, ans=0.125
2024-09-16 09:40:02,018 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=15.0
2024-09-16 09:40:12,100 INFO [train.py:1198] (0/2) Epoch 24, batch 3350, loss[loss=0.2415, ctc_loss=0.1651, cr_loss=0.3818, over 20984.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1553, cr_loss=0.3772, over 4104979.41 frames. ], batch size: 55, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:40:28,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.49 vs. limit=12.0
2024-09-16 09:40:31,547 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.153e+02 2.322e+02 2.452e+02 3.830e+02, threshold=4.644e+02, percent-clipped=0.0
2024-09-16 09:40:49,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=426020.0, ans=0.0
2024-09-16 09:40:54,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=426020.0, ans=0.035
2024-09-16 09:40:57,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0
2024-09-16 09:41:09,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=426048.3333333333, ans=0.125
2024-09-16 09:41:27,373 INFO [train.py:1198] (0/2) Epoch 24, batch 3400, loss[loss=0.1707, ctc_loss=0.1136, cr_loss=0.2851, over 20971.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1555, cr_loss=0.3767, over 4097005.84 frames. ], batch size: 49, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:41:44,766 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.38 vs. limit=15.0
2024-09-16 09:41:55,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=426133.3333333333, ans=0.1
2024-09-16 09:42:14,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=426190.0, ans=0.125
2024-09-16 09:42:43,528 INFO [train.py:1198] (0/2) Epoch 24, batch 3450, loss[loss=0.2591, ctc_loss=0.1777, cr_loss=0.4068, over 19550.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1566, cr_loss=0.3788, over 4100847.08 frames. ], batch size: 90, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:43:02,706 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.099e+02 2.187e+02 2.349e+02 8.336e+02, threshold=4.373e+02, percent-clipped=1.0
2024-09-16 09:43:45,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=426360.0, ans=0.125
2024-09-16 09:43:51,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=426360.0, ans=0.125
2024-09-16 09:43:58,924 INFO [train.py:1198] (0/2) Epoch 24, batch 3500, loss[loss=0.2201, ctc_loss=0.146, cr_loss=0.3707, over 21066.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.157, cr_loss=0.3794, over 4101173.50 frames. ], batch size: 56, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:44:37,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=426445.0, ans=0.125
2024-09-16 09:44:37,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=426445.0, ans=0.0
2024-09-16 09:45:21,122 INFO [train.py:1198] (0/2) Epoch 24, batch 3550, loss[loss=0.195, ctc_loss=0.1295, cr_loss=0.3274, over 20979.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1569, cr_loss=0.3786, over 4098247.20 frames. ], batch size: 50, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:45:41,387 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.095e+02 2.213e+02 2.381e+02 4.293e+02, threshold=4.427e+02, percent-clipped=0.0
2024-09-16 09:45:50,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=426586.6666666667, ans=0.125
2024-09-16 09:45:55,525 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 09:45:55,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=426586.6666666667, ans=0.125
2024-09-16 09:46:01,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=426586.6666666667, ans=0.025
2024-09-16 09:46:09,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=15.0
2024-09-16 09:46:23,496 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.25 vs. limit=8.0
2024-09-16 09:46:37,621 INFO [train.py:1198] (0/2) Epoch 24, batch 3600, loss[loss=0.2139, ctc_loss=0.1446, cr_loss=0.3462, over 20652.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1564, cr_loss=0.3779, over 4101556.98 frames. ], batch size: 68, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:46:51,107 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.67 vs. limit=15.0
2024-09-16 09:47:00,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=426700.0, ans=0.0
2024-09-16 09:47:07,278 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.92 vs. limit=15.0
2024-09-16 09:47:53,516 INFO [train.py:1198] (0/2) Epoch 24, batch 3650, loss[loss=0.2332, ctc_loss=0.157, cr_loss=0.3808, over 21036.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1564, cr_loss=0.3783, over 4107266.79 frames. ], batch size: 62, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:47:55,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=426813.3333333333, ans=0.125
2024-09-16 09:48:13,302 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.077e+02 2.192e+02 2.344e+02 2.962e+02, threshold=4.383e+02, percent-clipped=0.0
2024-09-16 09:48:27,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=426870.0, ans=0.125
2024-09-16 09:48:54,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=426926.6666666667, ans=0.125
2024-09-16 09:48:56,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=426926.6666666667, ans=0.125
2024-09-16 09:48:58,172 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=15.0
2024-09-16 09:49:09,493 INFO [train.py:1198] (0/2) Epoch 24, batch 3700, loss[loss=0.2829, ctc_loss=0.1962, cr_loss=0.4339, over 20015.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1561, cr_loss=0.3782, over 4113106.34 frames. ], batch size: 80, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:49:52,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=427011.6666666667, ans=0.125
2024-09-16 09:50:03,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=22.5
2024-09-16 09:50:11,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427068.3333333333, ans=0.1
2024-09-16 09:50:26,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=427068.3333333333, ans=0.125
2024-09-16 09:50:30,924 INFO [train.py:1198] (0/2) Epoch 24, batch 3750, loss[loss=0.1886, ctc_loss=0.1233, cr_loss=0.3266, over 20953.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.156, cr_loss=0.3781, over 4115360.92 frames. ], batch size: 49, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:50:39,466 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0
2024-09-16 09:50:40,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=427096.6666666667, ans=0.2
2024-09-16 09:50:50,706 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.094e+02 2.217e+02 2.403e+02 3.482e+02, threshold=4.435e+02, percent-clipped=0.0
2024-09-16 09:51:10,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=427153.3333333333, ans=0.125
2024-09-16 09:51:13,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=427153.3333333333, ans=0.1
2024-09-16 09:51:15,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=427181.6666666667, ans=0.07
2024-09-16 09:51:47,295 INFO [train.py:1198] (0/2) Epoch 24, batch 3800, loss[loss=0.2287, ctc_loss=0.1539, cr_loss=0.374, over 20833.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1556, cr_loss=0.378, over 4119716.67 frames. ], batch size: 59, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:51:50,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=427238.3333333333, ans=10.0
2024-09-16 09:51:58,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=427238.3333333333, ans=0.125
2024-09-16 09:52:36,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=427323.3333333333, ans=0.125
2024-09-16 09:53:02,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=427380.0, ans=0.0
2024-09-16 09:53:03,691 INFO [train.py:1198] (0/2) Epoch 24, batch 3850, loss[loss=0.2309, ctc_loss=0.1572, cr_loss=0.3685, over 21088.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1554, cr_loss=0.3781, over 4110897.03 frames. ], batch size: 59, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:53:11,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=427380.0, ans=0.125
2024-09-16 09:53:23,458 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.157e+02 2.309e+02 2.537e+02 3.875e+02, threshold=4.617e+02, percent-clipped=0.0
2024-09-16 09:53:25,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427408.3333333333, ans=0.1
2024-09-16 09:53:42,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=427436.6666666667, ans=0.125
2024-09-16 09:54:18,771 INFO [train.py:1198] (0/2) Epoch 24, batch 3900, loss[loss=0.2075, ctc_loss=0.1381, cr_loss=0.3472, over 20777.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1558, cr_loss=0.3784, over 4118276.27 frames. ], batch size: 56, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:54:19,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=427521.6666666667, ans=0.2
2024-09-16 09:54:19,241 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 09:55:29,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=427635.0, ans=0.125
2024-09-16 09:55:34,285 INFO [train.py:1198] (0/2) Epoch 24, batch 3950, loss[loss=0.2352, ctc_loss=0.1621, cr_loss=0.3654, over 21044.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1567, cr_loss=0.3791, over 4100865.17 frames. ], batch size: 62, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:55:53,900 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.097e+02 2.213e+02 2.383e+02 4.274e+02, threshold=4.426e+02, percent-clipped=0.0
2024-09-16 09:55:54,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=427691.6666666667, ans=0.125
2024-09-16 09:56:17,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0
2024-09-16 09:56:54,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=427805.0, ans=0.125
2024-09-16 09:56:55,906 INFO [train.py:1198] (0/2) Epoch 24, batch 4000, loss[loss=0.2285, ctc_loss=0.1563, cr_loss=0.3608, over 21014.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1563, cr_loss=0.3784, over 4093606.10 frames. ], batch size: 61, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:57:20,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0
2024-09-16 09:57:32,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=427861.6666666667, ans=0.125
2024-09-16 09:57:48,266 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0
2024-09-16 09:57:58,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0
2024-09-16 09:58:10,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=427946.6666666667, ans=0.125
2024-09-16 09:58:11,401 INFO [train.py:1198] (0/2) Epoch 24, batch 4050, loss[loss=0.2401, ctc_loss=0.1587, cr_loss=0.4074, over 20888.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1575, cr_loss=0.3805, over 4083374.94 frames. ], batch size: 54, lr: 3.46e-03, grad_scale: 32.0
2024-09-16 09:58:14,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=427946.6666666667, ans=0.07
2024-09-16 09:58:15,321 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0
2024-09-16 09:58:15,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=427946.6666666667, ans=15.0
2024-09-16 09:58:19,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=427946.6666666667, ans=0.125
2024-09-16 09:58:31,065 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.109e+02 2.220e+02 2.395e+02 4.719e+02, threshold=4.440e+02, percent-clipped=1.0
2024-09-16 09:58:32,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=427975.0, ans=0.125
2024-09-16 09:58:34,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=427975.0, ans=0.0
2024-09-16 09:59:18,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=428060.0, ans=0.125
2024-09-16 09:59:27,429 INFO [train.py:1198] (0/2) Epoch 24, batch 4100, loss[loss=0.2407, ctc_loss=0.1607, cr_loss=0.3997, over 21016.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1575, cr_loss=0.3806, over 4089458.84 frames. ], batch size: 63, lr: 3.45e-03, grad_scale: 32.0
2024-09-16 10:00:02,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=428145.0, ans=0.0
2024-09-16 10:00:40,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.49 vs. limit=5.0
2024-09-16 10:00:44,074 INFO [train.py:1198] (0/2) Epoch 24, batch 4150, loss[loss=0.2232, ctc_loss=0.1495, cr_loss=0.3683, over 20971.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.157, cr_loss=0.3805, over 4095814.04 frames. ], batch size: 58, lr: 3.45e-03, grad_scale: 32.0
2024-09-16 10:00:48,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=428230.0, ans=0.125
2024-09-16 10:01:02,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=428258.3333333333, ans=0.125
2024-09-16 10:01:03,426 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.091e+02 2.200e+02 2.388e+02 3.185e+02, threshold=4.400e+02, percent-clipped=0.0
2024-09-16 10:01:09,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=428258.3333333333, ans=0.0
2024-09-16 10:01:09,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=428258.3333333333, ans=10.0
2024-09-16 10:02:04,847 INFO [train.py:1198] (0/2) Epoch 24, batch 4200, loss[loss=0.1801, ctc_loss=0.1179, cr_loss=0.3114, over 19869.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1564, cr_loss=0.379, over 4095688.15 frames. ], batch size: 44, lr: 3.45e-03, grad_scale: 32.0
2024-09-16 10:03:09,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=428485.0, ans=0.0
2024-09-16 10:03:21,030 INFO [train.py:1198] (0/2) Epoch 24, batch 4250, loss[loss=0.2361, ctc_loss=0.1588, cr_loss=0.3866, over 20999.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1558, cr_loss=0.3785, over 4107849.57 frames. ], batch size: 64, lr: 3.45e-03, grad_scale: 32.0
2024-09-16 10:03:32,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=428513.3333333333, ans=0.025
2024-09-16 10:03:40,528 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.064e+02 2.201e+02 2.328e+02 3.119e+02, threshold=4.403e+02, percent-clipped=0.0
2024-09-16 10:03:56,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=428570.0, ans=0.0
2024-09-16 10:04:17,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=428598.3333333333, ans=0.0
2024-09-16 10:04:31,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=428626.6666666667, ans=0.1
2024-09-16 10:04:36,696 INFO [train.py:1198] (0/2) Epoch 24, batch 4300, loss[loss=0.2545, ctc_loss=0.1706, cr_loss=0.4197, over 20850.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1563, cr_loss=0.3795, over 4105737.59 frames. ], batch size: 65, lr: 3.45e-03, grad_scale: 32.0
2024-09-16 10:04:43,838 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.91 vs. limit=15.0
2024-09-16 10:05:11,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0
2024-09-16 10:05:25,841 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0
2024-09-16 10:05:41,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=428768.3333333333, ans=0.125
2024-09-16 10:05:45,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0
2024-09-16 10:05:51,863 INFO [train.py:1198] (0/2) Epoch 24, batch 4350, loss[loss=0.2312, ctc_loss=0.1573, cr_loss=0.3697, over 20655.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1567, cr_loss=0.38, over 4101250.62 frames. ], batch size: 66, lr: 3.45e-03, grad_scale: 32.0
2024-09-16 10:06:11,953 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.156e+02 2.299e+02 2.517e+02 2.946e+02, threshold=4.598e+02, percent-clipped=0.0
2024-09-16 10:06:21,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=428853.3333333333, ans=0.0
2024-09-16 10:06:29,583 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.89 vs. limit=10.0
2024-09-16 10:06:42,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=428881.6666666667, ans=0.0
2024-09-16 10:06:59,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=428910.0, ans=0.2
2024-09-16 10:07:07,824 INFO [train.py:1198] (0/2) Epoch 24, batch 4400, loss[loss=0.2397, ctc_loss=0.1607, cr_loss=0.3951, over 21085.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1568, cr_loss=0.3797, over 4098849.64 frames. ], batch size: 59, lr: 3.45e-03, grad_scale: 32.0
2024-09-16 10:07:23,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=428966.6666666667, ans=0.125
2024-09-16 10:07:40,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=428995.0, ans=0.0
2024-09-16 10:08:18,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=429051.6666666667, ans=0.0
2024-09-16 10:08:24,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=15.0
2024-09-16 10:08:30,150 INFO [train.py:1198] (0/2) Epoch 24, batch 4450, loss[loss=0.2377, ctc_loss=0.1586, cr_loss=0.3954, over 20946.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1575, cr_loss=0.3802, over 4086421.53 frames.
], batch size: 55, lr: 3.45e-03, grad_scale: 32.0 2024-09-16 10:08:51,197 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.121e+02 2.283e+02 2.420e+02 4.289e+02, threshold=4.565e+02, percent-clipped=0.0 2024-09-16 10:09:20,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=429165.0, ans=0.125 2024-09-16 10:09:45,653 INFO [train.py:1198] (0/2) Epoch 24, batch 4500, loss[loss=0.2442, ctc_loss=0.1677, cr_loss=0.3824, over 21042.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1571, cr_loss=0.3791, over 4080992.83 frames. ], batch size: 62, lr: 3.45e-03, grad_scale: 32.0 2024-09-16 10:10:01,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=429250.0, ans=0.125 2024-09-16 10:10:14,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=429278.3333333333, ans=0.0 2024-09-16 10:10:20,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=22.5 2024-09-16 10:11:01,354 INFO [train.py:1198] (0/2) Epoch 24, batch 4550, loss[loss=0.2311, ctc_loss=0.1579, cr_loss=0.3662, over 20667.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1566, cr_loss=0.3786, over 4090066.99 frames. ], batch size: 68, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:11:23,845 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.743e+02 2.134e+02 2.270e+02 2.418e+02 3.628e+02, threshold=4.539e+02, percent-clipped=0.0 2024-09-16 10:11:26,170 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. 
limit=15.0 2024-09-16 10:11:40,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=429420.0, ans=0.125 2024-09-16 10:11:50,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=429448.3333333333, ans=0.0 2024-09-16 10:12:06,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=429476.6666666667, ans=0.2 2024-09-16 10:12:16,960 INFO [train.py:1198] (0/2) Epoch 24, batch 4600, loss[loss=0.2494, ctc_loss=0.1657, cr_loss=0.4183, over 20146.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1566, cr_loss=0.3798, over 4100800.74 frames. ], batch size: 80, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:12:18,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=429505.0, ans=0.125 2024-09-16 10:12:24,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=429505.0, ans=0.125 2024-09-16 10:12:39,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=429533.3333333333, ans=0.125 2024-09-16 10:12:39,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=429533.3333333333, ans=0.125 2024-09-16 10:12:41,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=429533.3333333333, ans=0.0 2024-09-16 10:12:42,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=429533.3333333333, ans=0.125 2024-09-16 10:12:45,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, 
batch_count=429561.6666666667, ans=0.1 2024-09-16 10:13:00,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=429590.0, ans=0.5 2024-09-16 10:13:12,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=429590.0, ans=0.125 2024-09-16 10:13:13,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=429590.0, ans=0.125 2024-09-16 10:13:31,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=429618.3333333333, ans=0.2 2024-09-16 10:13:36,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=12.0 2024-09-16 10:13:37,449 INFO [train.py:1198] (0/2) Epoch 24, batch 4650, loss[loss=0.2334, ctc_loss=0.1563, cr_loss=0.3855, over 20897.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1569, cr_loss=0.38, over 4112830.86 frames. 
], batch size: 54, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:14:00,485 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.107e+02 2.214e+02 2.387e+02 3.192e+02, threshold=4.427e+02, percent-clipped=0.0 2024-09-16 10:14:00,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=429675.0, ans=0.0 2024-09-16 10:14:17,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=429703.3333333333, ans=0.125 2024-09-16 10:14:35,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=429731.6666666667, ans=0.125 2024-09-16 10:14:52,737 INFO [train.py:1198] (0/2) Epoch 24, batch 4700, loss[loss=0.2305, ctc_loss=0.1522, cr_loss=0.3915, over 20766.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1578, cr_loss=0.3807, over 4096586.01 frames. ], batch size: 53, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:15:08,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.16 vs. limit=10.0 2024-09-16 10:15:11,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=429816.6666666667, ans=0.125 2024-09-16 10:15:38,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=429873.3333333333, ans=0.125 2024-09-16 10:16:09,157 INFO [train.py:1198] (0/2) Epoch 24, batch 4750, loss[loss=0.2321, ctc_loss=0.1539, cr_loss=0.3908, over 20852.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1583, cr_loss=0.3817, over 4088910.88 frames. 
], batch size: 65, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:16:10,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=429930.0, ans=0.0 2024-09-16 10:16:18,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-09-16 10:16:24,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=429958.3333333333, ans=0.0 2024-09-16 10:16:31,631 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.092e+02 2.233e+02 2.402e+02 6.462e+02, threshold=4.467e+02, percent-clipped=1.0 2024-09-16 10:16:53,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=430015.0, ans=0.0 2024-09-16 10:17:01,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=430015.0, ans=0.125 2024-09-16 10:17:05,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=430015.0, ans=0.0 2024-09-16 10:17:24,704 INFO [train.py:1198] (0/2) Epoch 24, batch 4800, loss[loss=0.2885, ctc_loss=0.1958, cr_loss=0.4634, over 18388.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.157, cr_loss=0.3793, over 4096729.14 frames. 
], batch size: 108, lr: 3.45e-03, grad_scale: 32.0 2024-09-16 10:17:35,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=430071.6666666667, ans=0.125 2024-09-16 10:17:50,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=430100.0, ans=0.0 2024-09-16 10:18:04,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430128.3333333333, ans=0.1 2024-09-16 10:18:40,570 INFO [train.py:1198] (0/2) Epoch 24, batch 4850, loss[loss=0.2341, ctc_loss=0.1585, cr_loss=0.3781, over 20881.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1569, cr_loss=0.3789, over 4090994.00 frames. ], batch size: 57, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:19:07,556 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.094e+02 2.221e+02 2.422e+02 4.398e+02, threshold=4.443e+02, percent-clipped=0.0 2024-09-16 10:20:01,139 INFO [train.py:1198] (0/2) Epoch 24, batch 4900, loss[loss=0.23, ctc_loss=0.1548, cr_loss=0.3759, over 21014.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1564, cr_loss=0.3781, over 4088611.73 frames. 
], batch size: 62, lr: 3.45e-03, grad_scale: 16.0 2024-09-16 10:20:17,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=430383.3333333333, ans=0.125 2024-09-16 10:20:32,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430411.6666666667, ans=0.1 2024-09-16 10:21:04,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=430468.3333333333, ans=0.1 2024-09-16 10:21:05,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=430468.3333333333, ans=0.0 2024-09-16 10:21:10,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=430468.3333333333, ans=0.2 2024-09-16 10:21:15,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=430496.6666666667, ans=0.0 2024-09-16 10:21:16,147 INFO [train.py:1198] (0/2) Epoch 24, batch 4950, loss[loss=0.2672, ctc_loss=0.1899, cr_loss=0.3866, over 14747.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1566, cr_loss=0.3789, over 4084458.20 frames. 
], batch size: 150, lr: 3.44e-03, grad_scale: 16.0 2024-09-16 10:21:26,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=430496.6666666667, ans=0.125 2024-09-16 10:21:28,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=430496.6666666667, ans=0.1 2024-09-16 10:21:39,818 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.125e+02 2.261e+02 2.482e+02 4.962e+02, threshold=4.523e+02, percent-clipped=1.0 2024-09-16 10:21:40,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=430525.0, ans=0.125 2024-09-16 10:21:47,682 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:22:20,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=430610.0, ans=0.035 2024-09-16 10:22:31,069 INFO [train.py:1198] (0/2) Epoch 24, batch 5000, loss[loss=0.2215, ctc_loss=0.1471, cr_loss=0.3723, over 20985.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1566, cr_loss=0.3788, over 4081920.87 frames. ], batch size: 52, lr: 3.44e-03, grad_scale: 16.0 2024-09-16 10:22:34,307 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:22:44,832 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-152000.pt 2024-09-16 10:23:03,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=430695.0, ans=0.125 2024-09-16 10:23:17,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.62 vs. 
limit=10.0 2024-09-16 10:23:46,544 INFO [train.py:1198] (0/2) Epoch 24, batch 5050, loss[loss=0.1982, ctc_loss=0.1327, cr_loss=0.3273, over 20959.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1568, cr_loss=0.3792, over 4082669.83 frames. ], batch size: 49, lr: 3.44e-03, grad_scale: 16.0 2024-09-16 10:23:48,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-09-16 10:23:51,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=430780.0, ans=0.2 2024-09-16 10:24:10,393 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.122e+02 2.270e+02 2.410e+02 4.397e+02, threshold=4.540e+02, percent-clipped=0.0 2024-09-16 10:24:12,672 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=12.0 2024-09-16 10:25:01,556 INFO [train.py:1198] (0/2) Epoch 24, batch 5100, loss[loss=0.2521, ctc_loss=0.1667, cr_loss=0.4271, over 21031.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1568, cr_loss=0.3791, over 4088579.58 frames. ], batch size: 61, lr: 3.44e-03, grad_scale: 16.0 2024-09-16 10:25:03,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=430921.6666666667, ans=0.0 2024-09-16 10:25:34,604 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0 2024-09-16 10:26:15,831 INFO [train.py:1198] (0/2) Epoch 24, batch 5150, loss[loss=0.2002, ctc_loss=0.1336, cr_loss=0.3328, over 20951.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1559, cr_loss=0.3778, over 4084389.22 frames. 
], batch size: 48, lr: 3.44e-03, grad_scale: 16.0 2024-09-16 10:26:22,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=431063.3333333333, ans=0.0 2024-09-16 10:26:26,593 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:26:39,436 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.087e+02 2.253e+02 2.421e+02 5.808e+02, threshold=4.505e+02, percent-clipped=1.0 2024-09-16 10:26:39,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=431091.6666666667, ans=0.0 2024-09-16 10:26:50,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=431120.0, ans=0.2 2024-09-16 10:27:07,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=431148.3333333333, ans=0.2 2024-09-16 10:27:18,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=431176.6666666667, ans=0.2 2024-09-16 10:27:29,931 INFO [train.py:1198] (0/2) Epoch 24, batch 5200, loss[loss=0.2268, ctc_loss=0.1527, cr_loss=0.3702, over 20714.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1563, cr_loss=0.3779, over 4066138.22 frames. ], batch size: 71, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:27:46,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=431233.3333333333, ans=0.0 2024-09-16 10:28:03,776 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.97 vs. 
limit=10.0 2024-09-16 10:28:46,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=431346.6666666667, ans=0.125 2024-09-16 10:28:47,830 INFO [train.py:1198] (0/2) Epoch 24, batch 5250, loss[loss=0.1917, ctc_loss=0.1249, cr_loss=0.3341, over 20956.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1558, cr_loss=0.3767, over 4066680.97 frames. ], batch size: 48, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:29:14,035 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.112e+02 2.223e+02 2.484e+02 5.133e+02, threshold=4.446e+02, percent-clipped=1.0 2024-09-16 10:29:15,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=431375.0, ans=0.2 2024-09-16 10:29:27,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431403.3333333333, ans=0.1 2024-09-16 10:29:32,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=431403.3333333333, ans=10.0 2024-09-16 10:29:50,166 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-09-16 10:30:04,247 INFO [train.py:1198] (0/2) Epoch 24, batch 5300, loss[loss=0.2227, ctc_loss=0.1477, cr_loss=0.375, over 21053.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1563, cr_loss=0.3787, over 4076800.95 frames. 
], batch size: 62, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:30:13,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431488.3333333333, ans=0.1 2024-09-16 10:30:44,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=431545.0, ans=0.0 2024-09-16 10:31:12,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=431601.6666666667, ans=0.0 2024-09-16 10:31:18,442 INFO [train.py:1198] (0/2) Epoch 24, batch 5350, loss[loss=0.2136, ctc_loss=0.1419, cr_loss=0.3582, over 20957.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1566, cr_loss=0.3793, over 4081867.68 frames. ], batch size: 48, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:31:23,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=431630.0, ans=0.025 2024-09-16 10:31:28,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2024-09-16 10:31:42,382 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.060e+02 2.198e+02 2.307e+02 3.212e+02, threshold=4.396e+02, percent-clipped=0.0 2024-09-16 10:31:48,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=431686.6666666667, ans=0.0 2024-09-16 10:32:33,235 INFO [train.py:1198] (0/2) Epoch 24, batch 5400, loss[loss=0.2266, ctc_loss=0.1508, cr_loss=0.3786, over 21064.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1566, cr_loss=0.3793, over 4082591.13 frames. ], batch size: 56, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:33:47,701 INFO [train.py:1198] (0/2) Epoch 24, batch 5450, loss[loss=0.1993, ctc_loss=0.1302, cr_loss=0.3454, over 20937.00 frames. 
], tot_loss[loss=0.2327, ctc_loss=0.1568, cr_loss=0.3795, over 4091171.34 frames. ], batch size: 51, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:34:11,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.145e+02 2.255e+02 2.435e+02 3.827e+02, threshold=4.511e+02, percent-clipped=0.0 2024-09-16 10:35:01,824 INFO [train.py:1198] (0/2) Epoch 24, batch 5500, loss[loss=0.2543, ctc_loss=0.1702, cr_loss=0.4207, over 20705.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1564, cr_loss=0.3792, over 4102154.67 frames. ], batch size: 68, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:35:05,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=432055.0, ans=0.125 2024-09-16 10:35:06,687 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:35:40,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=432111.6666666667, ans=0.125 2024-09-16 10:35:52,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=432140.0, ans=0.0 2024-09-16 10:36:04,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=432168.3333333333, ans=0.09899494936611666 2024-09-16 10:36:04,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=432168.3333333333, ans=0.125 2024-09-16 10:36:10,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=432168.3333333333, ans=0.2 2024-09-16 10:36:14,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=432196.6666666667, ans=0.125 2024-09-16 10:36:15,848 INFO 
[train.py:1198] (0/2) Epoch 24, batch 5550, loss[loss=0.1841, ctc_loss=0.1223, cr_loss=0.3091, over 20952.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1573, cr_loss=0.3804, over 4096701.32 frames. ], batch size: 49, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:36:20,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=432196.6666666667, ans=0.1 2024-09-16 10:36:36,190 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.30 vs. limit=15.0 2024-09-16 10:36:39,552 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.100e+02 2.253e+02 2.429e+02 5.430e+02, threshold=4.505e+02, percent-clipped=1.0 2024-09-16 10:36:47,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=432253.3333333333, ans=0.125 2024-09-16 10:37:12,319 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.67 vs. limit=15.0 2024-09-16 10:37:15,896 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:37:16,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=432310.0, ans=0.07 2024-09-16 10:37:29,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=432338.3333333333, ans=0.1 2024-09-16 10:37:30,389 INFO [train.py:1198] (0/2) Epoch 24, batch 5600, loss[loss=0.2392, ctc_loss=0.163, cr_loss=0.3809, over 20257.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1567, cr_loss=0.3793, over 4100826.89 frames. 
], batch size: 74, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:37:31,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2024-09-16 10:37:43,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=432338.3333333333, ans=0.0 2024-09-16 10:37:52,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=432366.6666666667, ans=0.125 2024-09-16 10:38:17,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=432395.0, ans=0.0 2024-09-16 10:38:46,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=432451.6666666667, ans=0.125 2024-09-16 10:38:49,676 INFO [train.py:1198] (0/2) Epoch 24, batch 5650, loss[loss=0.2844, ctc_loss=0.2026, cr_loss=0.4088, over 14407.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1565, cr_loss=0.3791, over 4090121.32 frames. ], batch size: 150, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:38:58,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=15.0 2024-09-16 10:39:13,290 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.092e+02 2.271e+02 2.405e+02 5.610e+02, threshold=4.543e+02, percent-clipped=1.0 2024-09-16 10:39:51,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=432593.3333333333, ans=0.125 2024-09-16 10:40:04,196 INFO [train.py:1198] (0/2) Epoch 24, batch 5700, loss[loss=0.2152, ctc_loss=0.1444, cr_loss=0.3541, over 20908.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1565, cr_loss=0.3787, over 4095416.86 frames. 
], batch size: 54, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:40:10,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=432621.6666666667, ans=0.0 2024-09-16 10:40:28,809 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:40:53,876 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:40:58,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=432706.6666666667, ans=0.2 2024-09-16 10:41:00,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=22.5 2024-09-16 10:41:06,406 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.95 vs. limit=10.0 2024-09-16 10:41:18,865 INFO [train.py:1198] (0/2) Epoch 24, batch 5750, loss[loss=0.2558, ctc_loss=0.1774, cr_loss=0.3924, over 21031.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1559, cr_loss=0.3773, over 4103487.47 frames. ], batch size: 62, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:41:43,280 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.109e+02 2.228e+02 2.426e+02 3.110e+02, threshold=4.455e+02, percent-clipped=0.0 2024-09-16 10:42:02,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=432848.3333333333, ans=0.125 2024-09-16 10:42:22,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=432876.6666666667, ans=0.025 2024-09-16 10:42:33,530 INFO [train.py:1198] (0/2) Epoch 24, batch 5800, loss[loss=0.2588, ctc_loss=0.1778, cr_loss=0.4048, over 20091.00 frames. 
], tot_loss[loss=0.2299, ctc_loss=0.1548, cr_loss=0.3759, over 4110179.39 frames. ], batch size: 80, lr: 3.44e-03, grad_scale: 32.0 2024-09-16 10:42:47,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=432933.3333333333, ans=0.025 2024-09-16 10:42:59,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2024-09-16 10:43:48,166 INFO [train.py:1198] (0/2) Epoch 24, batch 5850, loss[loss=0.2582, ctc_loss=0.176, cr_loss=0.4109, over 20616.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1552, cr_loss=0.3763, over 4116957.16 frames. ], batch size: 71, lr: 3.43e-03, grad_scale: 32.0 2024-09-16 10:43:48,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=433046.6666666667, ans=0.0 2024-09-16 10:43:54,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=433046.6666666667, ans=0.125 2024-09-16 10:44:07,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=433075.0, ans=0.125 2024-09-16 10:44:11,921 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.113e+02 2.236e+02 2.375e+02 3.033e+02, threshold=4.473e+02, percent-clipped=0.0 2024-09-16 10:44:30,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=433103.3333333333, ans=0.125 2024-09-16 10:44:52,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=433160.0, ans=0.0 2024-09-16 10:45:02,032 INFO [train.py:1198] (0/2) Epoch 24, batch 5900, loss[loss=0.2412, ctc_loss=0.1629, cr_loss=0.3917, over 21027.00 frames. 
], tot_loss[loss=0.2305, ctc_loss=0.1553, cr_loss=0.3763, over 4112887.44 frames. ], batch size: 61, lr: 3.43e-03, grad_scale: 32.0 2024-09-16 10:45:08,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=433188.3333333333, ans=0.2 2024-09-16 10:45:10,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=433188.3333333333, ans=15.0 2024-09-16 10:45:14,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=433188.3333333333, ans=0.125 2024-09-16 10:45:44,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-16 10:45:47,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=433273.3333333333, ans=0.035 2024-09-16 10:45:54,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=433273.3333333333, ans=0.1 2024-09-16 10:46:16,522 INFO [train.py:1198] (0/2) Epoch 24, batch 5950, loss[loss=0.2501, ctc_loss=0.1691, cr_loss=0.4052, over 21039.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1551, cr_loss=0.3763, over 4105827.12 frames. ], batch size: 62, lr: 3.43e-03, grad_scale: 32.0 2024-09-16 10:46:45,288 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.094e+02 2.238e+02 2.450e+02 3.969e+02, threshold=4.477e+02, percent-clipped=0.0 2024-09-16 10:47:36,180 INFO [train.py:1198] (0/2) Epoch 24, batch 6000, loss[loss=0.2249, ctc_loss=0.1504, cr_loss=0.3729, over 20882.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1548, cr_loss=0.3767, over 4116141.30 frames. 
], batch size: 54, lr: 3.43e-03, grad_scale: 32.0 2024-09-16 10:47:36,181 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 10:48:00,677 INFO [train.py:1230] (0/2) Epoch 24, validation: loss=0.04257, ctc_loss=0.04257, cr_loss=1.116e-14, over 944034.00 frames. 2024-09-16 10:48:00,677 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 10:48:09,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=433471.6666666667, ans=0.125 2024-09-16 10:48:25,739 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0 2024-09-16 10:48:37,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=433528.3333333333, ans=0.0 2024-09-16 10:49:07,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=433585.0, ans=0.125 2024-09-16 10:49:08,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=433585.0, ans=0.125 2024-09-16 10:49:09,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-09-16 10:49:15,521 INFO [train.py:1198] (0/2) Epoch 24, batch 6050, loss[loss=0.1934, ctc_loss=0.1273, cr_loss=0.3309, over 20976.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1553, cr_loss=0.3772, over 4116000.46 frames. 
], batch size: 48, lr: 3.43e-03, grad_scale: 32.0 2024-09-16 10:49:30,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=433641.6666666667, ans=0.125 2024-09-16 10:49:36,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=433641.6666666667, ans=0.0 2024-09-16 10:49:41,718 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.070e+02 2.201e+02 2.383e+02 3.042e+02, threshold=4.402e+02, percent-clipped=0.0 2024-09-16 10:49:55,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=433670.0, ans=0.0 2024-09-16 10:49:57,447 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.76 vs. limit=6.0 2024-09-16 10:50:10,565 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:50:11,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=433698.3333333333, ans=0.125 2024-09-16 10:50:30,412 INFO [train.py:1198] (0/2) Epoch 24, batch 6100, loss[loss=0.2279, ctc_loss=0.1514, cr_loss=0.3828, over 21026.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.155, cr_loss=0.3766, over 4119929.74 frames. 
], batch size: 63, lr: 3.43e-03, grad_scale: 16.0 2024-09-16 10:50:39,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=433755.0, ans=0.125 2024-09-16 10:50:45,455 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:50:58,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=433811.6666666667, ans=0.125 2024-09-16 10:51:04,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=433811.6666666667, ans=0.125 2024-09-16 10:51:17,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=433840.0, ans=0.025 2024-09-16 10:51:43,593 INFO [train.py:1198] (0/2) Epoch 24, batch 6150, loss[loss=0.2414, ctc_loss=0.1643, cr_loss=0.3854, over 20935.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1567, cr_loss=0.379, over 4111827.33 frames. ], batch size: 64, lr: 3.43e-03, grad_scale: 16.0 2024-09-16 10:51:54,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=433896.6666666667, ans=0.125 2024-09-16 10:52:07,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.95 vs. 
limit=15.0 2024-09-16 10:52:08,612 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.136e+02 2.249e+02 2.391e+02 2.992e+02, threshold=4.498e+02, percent-clipped=0.0 2024-09-16 10:52:47,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=434010.0, ans=0.0 2024-09-16 10:52:49,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=434010.0, ans=0.025 2024-09-16 10:52:57,934 INFO [train.py:1198] (0/2) Epoch 24, batch 6200, loss[loss=0.2083, ctc_loss=0.1384, cr_loss=0.3496, over 20767.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1568, cr_loss=0.3784, over 4101523.54 frames. ], batch size: 53, lr: 3.43e-03, grad_scale: 16.0 2024-09-16 10:53:30,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=434095.0, ans=0.0 2024-09-16 10:53:43,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=434123.3333333333, ans=0.0 2024-09-16 10:53:52,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=434123.3333333333, ans=0.1 2024-09-16 10:54:12,637 INFO [train.py:1198] (0/2) Epoch 24, batch 6250, loss[loss=0.2045, ctc_loss=0.1331, cr_loss=0.3568, over 19960.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.157, cr_loss=0.3777, over 4066247.23 frames. 
], batch size: 44, lr: 3.43e-03, grad_scale: 16.0 2024-09-16 10:54:38,248 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.094e+02 2.247e+02 2.534e+02 5.643e+02, threshold=4.493e+02, percent-clipped=1.0 2024-09-16 10:54:42,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=434236.6666666667, ans=0.1 2024-09-16 10:55:09,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=434265.0, ans=0.2 2024-09-16 10:55:26,757 INFO [train.py:1198] (0/2) Epoch 24, batch 6300, loss[loss=0.3127, ctc_loss=0.2223, cr_loss=0.4519, over 13720.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1581, cr_loss=0.3777, over 3998686.13 frames. ], batch size: 149, lr: 3.43e-03, grad_scale: 16.0 2024-09-16 10:55:34,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2024-09-16 10:55:54,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0 2024-09-16 10:56:24,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=434435.0, ans=0.125 2024-09-16 10:56:39,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.53 vs. limit=5.0 2024-09-16 10:56:39,640 INFO [train.py:1198] (0/2) Epoch 24, batch 6350, loss[loss=0.2849, ctc_loss=0.2037, cr_loss=0.4056, over 14527.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1628, cr_loss=0.3807, over 3834268.54 frames. 
], batch size: 150, lr: 3.43e-03, grad_scale: 16.0 2024-09-16 10:56:55,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=12.0 2024-09-16 10:57:04,535 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.354e+02 2.611e+02 2.822e+02 4.059e+02, threshold=5.222e+02, percent-clipped=0.0 2024-09-16 10:57:26,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=434548.3333333333, ans=0.125 2024-09-16 10:57:30,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=434548.3333333333, ans=0.0 2024-09-16 10:57:37,466 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-24.pt 2024-09-16 10:58:28,736 INFO [train.py:1198] (0/2) Epoch 25, batch 0, loss[loss=0.2213, ctc_loss=0.1452, cr_loss=0.3805, over 20807.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1452, cr_loss=0.3805, over 20807.00 frames. ], batch size: 53, lr: 3.36e-03, grad_scale: 32.0 2024-09-16 10:58:28,737 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 10:58:47,379 INFO [train.py:1230] (0/2) Epoch 25, validation: loss=0.04275, ctc_loss=0.04275, cr_loss=1.118e-14, over 944034.00 frames. 
2024-09-16 10:58:47,380 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 10:58:56,796 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 10:59:39,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=434661.6666666667, ans=0.125 2024-09-16 10:59:51,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=434690.0, ans=0.125 2024-09-16 11:00:03,128 INFO [train.py:1198] (0/2) Epoch 25, batch 50, loss[loss=0.2219, ctc_loss=0.1462, cr_loss=0.3782, over 20859.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.155, cr_loss=0.3764, over 928252.54 frames. ], batch size: 57, lr: 3.36e-03, grad_scale: 32.0 2024-09-16 11:00:44,083 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.172e+02 2.295e+02 2.567e+02 4.286e+02, threshold=4.590e+02, percent-clipped=0.0 2024-09-16 11:00:44,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-09-16 11:00:58,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=434803.3333333333, ans=0.125 2024-09-16 11:01:18,999 INFO [train.py:1198] (0/2) Epoch 25, batch 100, loss[loss=0.2308, ctc_loss=0.1561, cr_loss=0.3735, over 19858.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1559, cr_loss=0.3782, over 1628764.68 frames. ], batch size: 44, lr: 3.36e-03, grad_scale: 32.0 2024-09-16 11:01:40,771 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.71 vs. 
limit=12.0 2024-09-16 11:02:04,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=434916.6666666667, ans=0.125 2024-09-16 11:02:21,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.78 vs. limit=15.0 2024-09-16 11:02:40,829 INFO [train.py:1198] (0/2) Epoch 25, batch 150, loss[loss=0.2211, ctc_loss=0.1505, cr_loss=0.3532, over 21044.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1555, cr_loss=0.3774, over 2184007.62 frames. ], batch size: 56, lr: 3.36e-03, grad_scale: 32.0 2024-09-16 11:03:08,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=435030.0, ans=0.125 2024-09-16 11:03:13,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=435058.3333333333, ans=0.125 2024-09-16 11:03:16,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2024-09-16 11:03:17,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.73 vs. 
limit=15.0 2024-09-16 11:03:19,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=435058.3333333333, ans=0.035 2024-09-16 11:03:21,867 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.130e+02 2.244e+02 2.399e+02 4.281e+02, threshold=4.488e+02, percent-clipped=0.0 2024-09-16 11:03:29,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=435086.6666666667, ans=0.0 2024-09-16 11:03:47,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=435115.0, ans=0.125 2024-09-16 11:03:56,294 INFO [train.py:1198] (0/2) Epoch 25, batch 200, loss[loss=0.2317, ctc_loss=0.1577, cr_loss=0.3702, over 20980.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1553, cr_loss=0.3782, over 2618254.41 frames. ], batch size: 58, lr: 3.36e-03, grad_scale: 32.0 2024-09-16 11:03:58,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.63 vs. 
limit=15.0 2024-09-16 11:04:31,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=435200.0, ans=0.025 2024-09-16 11:04:46,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=435228.3333333333, ans=0.0 2024-09-16 11:04:49,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=435228.3333333333, ans=0.125 2024-09-16 11:04:50,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=435228.3333333333, ans=0.09899494936611666 2024-09-16 11:05:11,567 INFO [train.py:1198] (0/2) Epoch 25, batch 250, loss[loss=0.2268, ctc_loss=0.1516, cr_loss=0.3761, over 21055.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1555, cr_loss=0.378, over 2948262.21 frames. ], batch size: 53, lr: 3.36e-03, grad_scale: 32.0 2024-09-16 11:05:16,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=435285.0, ans=0.125 2024-09-16 11:05:27,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=435313.3333333333, ans=0.0 2024-09-16 11:05:40,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=435341.6666666667, ans=0.125 2024-09-16 11:05:52,003 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.099e+02 2.209e+02 2.392e+02 3.192e+02, threshold=4.418e+02, percent-clipped=0.0 2024-09-16 11:06:10,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=435398.3333333333, ans=0.0 2024-09-16 11:06:17,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, 
batch_count=435398.3333333333, ans=0.125 2024-09-16 11:06:21,487 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.97 vs. limit=15.0 2024-09-16 11:06:26,749 INFO [train.py:1198] (0/2) Epoch 25, batch 300, loss[loss=0.231, ctc_loss=0.156, cr_loss=0.3746, over 20631.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1555, cr_loss=0.3774, over 3204022.40 frames. ], batch size: 66, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:06:45,672 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-09-16 11:06:57,401 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 11:07:22,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=435511.6666666667, ans=0.1 2024-09-16 11:07:25,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=435540.0, ans=0.0 2024-09-16 11:07:34,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0 2024-09-16 11:07:35,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.08 vs. limit=10.0 2024-09-16 11:07:45,320 INFO [train.py:1198] (0/2) Epoch 25, batch 350, loss[loss=0.1653, ctc_loss=0.1079, cr_loss=0.287, over 20319.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1549, cr_loss=0.3762, over 3407351.49 frames. 
], batch size: 45, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:07:47,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435568.3333333333, ans=0.1 2024-09-16 11:07:56,539 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=12.0 2024-09-16 11:08:25,651 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.060e+02 2.170e+02 2.334e+02 3.522e+02, threshold=4.341e+02, percent-clipped=0.0 2024-09-16 11:08:26,368 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0 2024-09-16 11:08:38,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=435653.3333333333, ans=0.05 2024-09-16 11:08:53,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=435681.6666666667, ans=0.2 2024-09-16 11:08:53,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=435681.6666666667, ans=0.125 2024-09-16 11:09:03,511 INFO [train.py:1198] (0/2) Epoch 25, batch 400, loss[loss=0.2357, ctc_loss=0.1565, cr_loss=0.3963, over 20840.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1543, cr_loss=0.3761, over 3568123.32 frames. ], batch size: 59, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:09:05,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=435710.0, ans=0.0 2024-09-16 11:09:07,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.85 vs. 
limit=15.0 2024-09-16 11:09:49,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=435795.0, ans=0.2 2024-09-16 11:09:52,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=435795.0, ans=0.125 2024-09-16 11:09:56,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=435795.0, ans=0.0 2024-09-16 11:10:10,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=435823.3333333333, ans=0.125 2024-09-16 11:10:18,934 INFO [train.py:1198] (0/2) Epoch 25, batch 450, loss[loss=0.2371, ctc_loss=0.16, cr_loss=0.3857, over 20688.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1555, cr_loss=0.3779, over 3679735.82 frames. ], batch size: 71, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:10:28,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=435851.6666666667, ans=12.0 2024-09-16 11:10:35,695 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 11:10:58,925 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.127e+02 2.223e+02 2.369e+02 3.185e+02, threshold=4.447e+02, percent-clipped=0.0 2024-09-16 11:11:15,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=435936.6666666667, ans=0.0 2024-09-16 11:11:23,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=435965.0, ans=0.0 2024-09-16 11:11:33,820 INFO [train.py:1198] (0/2) Epoch 25, batch 500, loss[loss=0.2096, ctc_loss=0.1386, cr_loss=0.355, over 20814.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1549, cr_loss=0.3763, over 3769639.22 frames. 
], batch size: 59, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:11:37,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=435993.3333333333, ans=0.125 2024-09-16 11:11:41,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=435993.3333333333, ans=0.125 2024-09-16 11:11:41,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=435993.3333333333, ans=0.0 2024-09-16 11:12:48,633 INFO [train.py:1198] (0/2) Epoch 25, batch 550, loss[loss=0.2198, ctc_loss=0.1461, cr_loss=0.3685, over 21009.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1547, cr_loss=0.3763, over 3856305.80 frames. ], batch size: 63, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:13:31,942 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.118e+02 2.233e+02 2.424e+02 4.104e+02, threshold=4.467e+02, percent-clipped=0.0 2024-09-16 11:13:38,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=436220.0, ans=0.125 2024-09-16 11:14:09,440 INFO [train.py:1198] (0/2) Epoch 25, batch 600, loss[loss=0.2632, ctc_loss=0.1776, cr_loss=0.428, over 20831.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1554, cr_loss=0.3774, over 3901478.26 frames. 
], batch size: 65, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:14:38,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=436333.3333333333, ans=0.0 2024-09-16 11:14:53,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=436361.6666666667, ans=0.025 2024-09-16 11:15:02,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=436361.6666666667, ans=0.125 2024-09-16 11:15:15,582 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0 2024-09-16 11:15:18,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=436390.0, ans=0.025 2024-09-16 11:15:25,456 INFO [train.py:1198] (0/2) Epoch 25, batch 650, loss[loss=0.195, ctc_loss=0.1294, cr_loss=0.3279, over 19464.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.155, cr_loss=0.377, over 3936589.95 frames. ], batch size: 43, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:16:05,889 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.108e+02 2.225e+02 2.386e+02 3.938e+02, threshold=4.450e+02, percent-clipped=0.0 2024-09-16 11:16:10,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=436503.3333333333, ans=0.125 2024-09-16 11:16:19,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=436503.3333333333, ans=0.04949747468305833 2024-09-16 11:16:40,802 INFO [train.py:1198] (0/2) Epoch 25, batch 700, loss[loss=0.192, ctc_loss=0.1275, cr_loss=0.3224, over 20954.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1556, cr_loss=0.378, over 3976390.46 frames. 
], batch size: 49, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:16:50,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=436560.0, ans=0.125 2024-09-16 11:16:53,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=436560.0, ans=0.0 2024-09-16 11:16:56,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=436588.3333333333, ans=0.2 2024-09-16 11:17:17,737 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-09-16 11:17:56,520 INFO [train.py:1198] (0/2) Epoch 25, batch 750, loss[loss=0.2424, ctc_loss=0.1634, cr_loss=0.3953, over 19370.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1558, cr_loss=0.3785, over 4000662.15 frames. ], batch size: 90, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:17:58,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=436701.6666666667, ans=0.125 2024-09-16 11:17:58,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=436701.6666666667, ans=0.125 2024-09-16 11:18:19,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=436730.0, ans=0.0 2024-09-16 11:18:36,770 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.158e+02 2.259e+02 2.423e+02 8.420e+02, threshold=4.519e+02, percent-clipped=2.0 2024-09-16 11:19:13,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=436843.3333333333, ans=0.125 2024-09-16 11:19:14,882 INFO [train.py:1198] (0/2) Epoch 25, batch 800, loss[loss=0.1803, ctc_loss=0.1171, cr_loss=0.3161, over 19964.00 frames. 
], tot_loss[loss=0.2322, ctc_loss=0.1563, cr_loss=0.3796, over 4005135.13 frames. ], batch size: 44, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:19:19,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=436843.3333333333, ans=0.125 2024-09-16 11:20:14,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=436928.3333333333, ans=0.0 2024-09-16 11:20:33,591 INFO [train.py:1198] (0/2) Epoch 25, batch 850, loss[loss=0.2713, ctc_loss=0.1882, cr_loss=0.4152, over 20856.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1567, cr_loss=0.3792, over 4012737.22 frames. ], batch size: 65, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:20:47,543 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 11:20:55,339 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 11:20:59,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=437013.3333333333, ans=0.2 2024-09-16 11:21:14,559 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.155e+02 2.302e+02 2.474e+02 2.779e+02, threshold=4.604e+02, percent-clipped=0.0 2024-09-16 11:21:22,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=437070.0, ans=0.125 2024-09-16 11:21:25,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=437070.0, ans=0.125 2024-09-16 11:21:25,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437070.0, ans=0.1 2024-09-16 11:21:31,611 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437070.0, ans=0.1 2024-09-16 11:21:33,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0 2024-09-16 11:21:36,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=437098.3333333333, ans=0.2 2024-09-16 11:21:49,802 INFO [train.py:1198] (0/2) Epoch 25, batch 900, loss[loss=0.201, ctc_loss=0.1336, cr_loss=0.3373, over 20956.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1563, cr_loss=0.379, over 4037778.15 frames. ], batch size: 48, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:22:06,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=437155.0, ans=0.025 2024-09-16 11:22:39,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=437211.6666666667, ans=0.5 2024-09-16 11:22:58,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.87 vs. limit=10.0 2024-09-16 11:23:04,996 INFO [train.py:1198] (0/2) Epoch 25, batch 950, loss[loss=0.2247, ctc_loss=0.1518, cr_loss=0.3645, over 20753.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1559, cr_loss=0.378, over 4043519.14 frames. 
], batch size: 71, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:23:05,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=437268.3333333333, ans=0.125 2024-09-16 11:23:11,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437268.3333333333, ans=0.1 2024-09-16 11:23:40,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=437325.0, ans=0.0 2024-09-16 11:23:42,157 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=15.0 2024-09-16 11:23:45,581 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.128e+02 2.232e+02 2.411e+02 3.684e+02, threshold=4.463e+02, percent-clipped=0.0 2024-09-16 11:23:53,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=437353.3333333333, ans=0.125 2024-09-16 11:24:20,272 INFO [train.py:1198] (0/2) Epoch 25, batch 1000, loss[loss=0.1941, ctc_loss=0.129, cr_loss=0.3252, over 21064.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1558, cr_loss=0.3785, over 4056830.17 frames. 
], batch size: 53, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:24:29,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437410.0, ans=0.1 2024-09-16 11:24:38,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=437438.3333333333, ans=0.125 2024-09-16 11:24:43,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=437438.3333333333, ans=0.0 2024-09-16 11:25:05,031 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 11:25:06,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=437466.6666666667, ans=0.125 2024-09-16 11:25:10,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=437495.0, ans=0.125 2024-09-16 11:25:20,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=437495.0, ans=0.125 2024-09-16 11:25:26,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437523.3333333333, ans=0.1 2024-09-16 11:25:37,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=437523.3333333333, ans=0.125 2024-09-16 11:25:37,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=437523.3333333333, ans=0.0 2024-09-16 11:25:42,834 INFO [train.py:1198] (0/2) Epoch 25, batch 1050, loss[loss=0.2133, ctc_loss=0.1442, cr_loss=0.3455, over 20971.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1547, cr_loss=0.3767, over 4072598.08 frames. 
], batch size: 55, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:25:46,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0 2024-09-16 11:26:23,720 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.109e+02 2.259e+02 2.389e+02 2.874e+02, threshold=4.518e+02, percent-clipped=0.0 2024-09-16 11:26:28,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=437636.6666666667, ans=0.2 2024-09-16 11:26:58,791 INFO [train.py:1198] (0/2) Epoch 25, batch 1100, loss[loss=0.1959, ctc_loss=0.128, cr_loss=0.3391, over 19017.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1546, cr_loss=0.3767, over 4075234.75 frames. ], batch size: 42, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:27:10,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=437693.3333333333, ans=0.0 2024-09-16 11:27:14,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=437721.6666666667, ans=0.125 2024-09-16 11:27:47,986 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 11:28:15,008 INFO [train.py:1198] (0/2) Epoch 25, batch 1150, loss[loss=0.1982, ctc_loss=0.1308, cr_loss=0.3366, over 19532.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1542, cr_loss=0.3758, over 4078060.44 frames. ], batch size: 43, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:28:17,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=437835.0, ans=0.0 2024-09-16 11:28:40,274 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.66 vs. 
limit=15.0 2024-09-16 11:28:55,869 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.095e+02 2.231e+02 2.395e+02 4.651e+02, threshold=4.462e+02, percent-clipped=1.0 2024-09-16 11:29:02,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=437920.0, ans=0.125 2024-09-16 11:29:29,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=437976.6666666667, ans=0.125 2024-09-16 11:29:31,011 INFO [train.py:1198] (0/2) Epoch 25, batch 1200, loss[loss=0.2341, ctc_loss=0.1589, cr_loss=0.3761, over 20897.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1546, cr_loss=0.3762, over 4085743.30 frames. ], batch size: 54, lr: 3.35e-03, grad_scale: 32.0 2024-09-16 11:29:38,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=437976.6666666667, ans=0.0 2024-09-16 11:29:49,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.20 vs. limit=22.5 2024-09-16 11:30:00,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=438033.3333333333, ans=0.025 2024-09-16 11:30:01,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=438033.3333333333, ans=0.125 2024-09-16 11:30:04,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=438033.3333333333, ans=0.2 2024-09-16 11:30:24,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=438061.6666666667, ans=0.0 2024-09-16 11:30:49,245 INFO [train.py:1198] (0/2) Epoch 25, batch 1250, loss[loss=0.2363, ctc_loss=0.1598, cr_loss=0.3827, over 20837.00 frames. 
], tot_loss[loss=0.2299, ctc_loss=0.1547, cr_loss=0.3763, over 4081964.19 frames. ], batch size: 59, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:30:55,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=438118.3333333333, ans=0.125 2024-09-16 11:31:03,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=438146.6666666667, ans=0.1 2024-09-16 11:31:12,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=438146.6666666667, ans=0.125 2024-09-16 11:31:14,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=438146.6666666667, ans=0.0 2024-09-16 11:31:33,253 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.109e+02 2.249e+02 2.363e+02 4.221e+02, threshold=4.497e+02, percent-clipped=0.0 2024-09-16 11:31:35,487 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=15.0 2024-09-16 11:32:08,115 INFO [train.py:1198] (0/2) Epoch 25, batch 1300, loss[loss=0.2066, ctc_loss=0.1359, cr_loss=0.3534, over 20961.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.155, cr_loss=0.377, over 4075178.77 frames. 
], batch size: 50, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:32:32,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=438288.3333333333, ans=0.125 2024-09-16 11:32:43,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=438316.6666666667, ans=0.2 2024-09-16 11:32:50,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=438316.6666666667, ans=0.125 2024-09-16 11:33:20,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=438373.3333333333, ans=0.1 2024-09-16 11:33:23,586 INFO [train.py:1198] (0/2) Epoch 25, batch 1350, loss[loss=0.2304, ctc_loss=0.1575, cr_loss=0.3645, over 20790.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1554, cr_loss=0.378, over 4093306.92 frames. ], batch size: 53, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:33:24,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=438401.6666666667, ans=22.5 2024-09-16 11:33:46,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=438430.0, ans=0.025 2024-09-16 11:34:00,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=438458.3333333333, ans=0.2 2024-09-16 11:34:04,457 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.089e+02 2.196e+02 2.363e+02 3.106e+02, threshold=4.393e+02, percent-clipped=0.0 2024-09-16 11:34:39,043 INFO [train.py:1198] (0/2) Epoch 25, batch 1400, loss[loss=0.2387, ctc_loss=0.1608, cr_loss=0.3898, over 20877.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1549, cr_loss=0.3773, over 4094925.15 frames. 
], batch size: 54, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:34:46,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=438543.3333333333, ans=0.125 2024-09-16 11:35:42,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=22.5 2024-09-16 11:35:43,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=438656.6666666667, ans=0.125 2024-09-16 11:35:55,155 INFO [train.py:1198] (0/2) Epoch 25, batch 1450, loss[loss=0.2347, ctc_loss=0.1569, cr_loss=0.3888, over 21027.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1557, cr_loss=0.3783, over 4090354.50 frames. ], batch size: 63, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:36:36,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=438741.6666666667, ans=0.125 2024-09-16 11:36:38,991 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.692e+02 2.070e+02 2.226e+02 2.366e+02 7.202e+02, threshold=4.452e+02, percent-clipped=1.0 2024-09-16 11:36:56,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=438770.0, ans=0.0 2024-09-16 11:37:16,840 INFO [train.py:1198] (0/2) Epoch 25, batch 1500, loss[loss=0.2, ctc_loss=0.1309, cr_loss=0.3455, over 20936.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1551, cr_loss=0.3777, over 4105095.21 frames. ], batch size: 48, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:38:13,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=438911.6666666667, ans=0.0 2024-09-16 11:38:31,989 INFO [train.py:1198] (0/2) Epoch 25, batch 1550, loss[loss=0.2354, ctc_loss=0.16, cr_loss=0.3767, over 20796.00 frames. 
], tot_loss[loss=0.2299, ctc_loss=0.1546, cr_loss=0.3765, over 4113241.71 frames. ], batch size: 53, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:38:32,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=438968.3333333333, ans=0.125 2024-09-16 11:38:56,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=438996.6666666667, ans=0.125 2024-09-16 11:39:12,918 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.109e+02 2.219e+02 2.398e+02 2.849e+02, threshold=4.438e+02, percent-clipped=0.0 2024-09-16 11:39:25,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=439053.3333333333, ans=0.125 2024-09-16 11:39:41,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=439081.6666666667, ans=0.2 2024-09-16 11:39:48,413 INFO [train.py:1198] (0/2) Epoch 25, batch 1600, loss[loss=0.2139, ctc_loss=0.1399, cr_loss=0.37, over 20984.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1544, cr_loss=0.376, over 4112691.03 frames. 
], batch size: 48, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:40:14,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=439138.3333333333, ans=0.0 2024-09-16 11:40:33,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=439195.0, ans=0.2 2024-09-16 11:40:34,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=439195.0, ans=0.125 2024-09-16 11:40:49,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=439223.3333333333, ans=10.0 2024-09-16 11:41:02,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.35 vs. limit=6.0 2024-09-16 11:41:04,793 INFO [train.py:1198] (0/2) Epoch 25, batch 1650, loss[loss=0.2312, ctc_loss=0.1565, cr_loss=0.3734, over 20976.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1541, cr_loss=0.3753, over 4095535.36 frames. 
], batch size: 64, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:41:05,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=439251.6666666667, ans=0.125 2024-09-16 11:41:41,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=439308.3333333333, ans=0.04949747468305833 2024-09-16 11:41:43,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=439308.3333333333, ans=0.025 2024-09-16 11:41:45,819 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.122e+02 2.248e+02 2.389e+02 3.064e+02, threshold=4.495e+02, percent-clipped=0.0 2024-09-16 11:42:23,680 INFO [train.py:1198] (0/2) Epoch 25, batch 1700, loss[loss=0.2363, ctc_loss=0.1613, cr_loss=0.3754, over 19332.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1543, cr_loss=0.3751, over 4100183.87 frames. 
], batch size: 90, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:42:36,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=439393.3333333333, ans=0.125 2024-09-16 11:42:42,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=439421.6666666667, ans=0.125 2024-09-16 11:42:45,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=439421.6666666667, ans=0.125 2024-09-16 11:42:58,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=439450.0, ans=0.0 2024-09-16 11:43:03,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=439450.0, ans=0.125 2024-09-16 11:43:04,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=439450.0, ans=0.125 2024-09-16 11:43:30,580 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 11:43:41,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=439535.0, ans=0.125 2024-09-16 11:43:42,314 INFO [train.py:1198] (0/2) Epoch 25, batch 1750, loss[loss=0.2164, ctc_loss=0.1454, cr_loss=0.3554, over 21008.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1554, cr_loss=0.3771, over 4101300.86 frames. 
], batch size: 52, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:44:03,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=439563.3333333333, ans=0.125 2024-09-16 11:44:18,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=439591.6666666667, ans=0.025 2024-09-16 11:44:24,562 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.709e+02 2.091e+02 2.217e+02 2.370e+02 3.258e+02, threshold=4.434e+02, percent-clipped=0.0 2024-09-16 11:44:34,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=439620.0, ans=0.125 2024-09-16 11:44:58,088 INFO [train.py:1198] (0/2) Epoch 25, batch 1800, loss[loss=0.225, ctc_loss=0.1486, cr_loss=0.382, over 20781.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1564, cr_loss=0.3793, over 4100764.45 frames. ], batch size: 53, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:45:16,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=439705.0, ans=0.125 2024-09-16 11:45:19,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=439705.0, ans=0.125 2024-09-16 11:46:06,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=439790.0, ans=0.1 2024-09-16 11:46:14,149 INFO [train.py:1198] (0/2) Epoch 25, batch 1850, loss[loss=0.2079, ctc_loss=0.139, cr_loss=0.3446, over 21076.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1558, cr_loss=0.3782, over 4106881.98 frames. ], batch size: 53, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:46:17,933 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. 
limit=10.0 2024-09-16 11:46:19,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=439818.3333333333, ans=0.125 2024-09-16 11:46:55,505 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.554e-02 2024-09-16 11:46:56,569 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.129e+02 2.241e+02 2.368e+02 4.107e+02, threshold=4.483e+02, percent-clipped=0.0 2024-09-16 11:47:15,455 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2024-09-16 11:47:29,946 INFO [train.py:1198] (0/2) Epoch 25, batch 1900, loss[loss=0.2, ctc_loss=0.1335, cr_loss=0.3326, over 20952.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1552, cr_loss=0.3771, over 4106830.09 frames. ], batch size: 48, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:47:52,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=439988.3333333333, ans=0.025 2024-09-16 11:48:03,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. 
limit=15.0 2024-09-16 11:48:19,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=440045.0, ans=0.0 2024-09-16 11:48:24,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=440045.0, ans=0.07 2024-09-16 11:48:49,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=440101.6666666667, ans=0.125 2024-09-16 11:48:50,758 INFO [train.py:1198] (0/2) Epoch 25, batch 1950, loss[loss=0.2202, ctc_loss=0.1444, cr_loss=0.379, over 20966.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.155, cr_loss=0.3766, over 4102874.13 frames. ], batch size: 55, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:48:52,928 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-09-16 11:48:54,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=22.5 2024-09-16 11:48:57,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=440101.6666666667, ans=0.0 2024-09-16 11:49:02,095 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.74 vs. 
limit=15.0 2024-09-16 11:49:32,896 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.107e+02 2.217e+02 2.374e+02 3.462e+02, threshold=4.434e+02, percent-clipped=0.0 2024-09-16 11:49:48,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440186.6666666667, ans=0.1 2024-09-16 11:49:53,146 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=12.0 2024-09-16 11:50:04,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=440243.3333333333, ans=0.0 2024-09-16 11:50:05,902 INFO [train.py:1198] (0/2) Epoch 25, batch 2000, loss[loss=0.1779, ctc_loss=0.1161, cr_loss=0.3086, over 19827.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1545, cr_loss=0.3759, over 4106257.41 frames. ], batch size: 44, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:50:19,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=440271.6666666667, ans=0.95 2024-09-16 11:50:44,482 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-09-16 11:51:15,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=440356.6666666667, ans=0.0 2024-09-16 11:51:18,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440356.6666666667, ans=0.1 2024-09-16 11:51:21,445 INFO [train.py:1198] (0/2) Epoch 25, batch 2050, loss[loss=0.1964, ctc_loss=0.1288, cr_loss=0.3378, over 20961.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1545, cr_loss=0.3764, over 4105528.05 frames. 
], batch size: 49, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:51:53,870 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2024-09-16 11:51:56,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=440441.6666666667, ans=0.125 2024-09-16 11:52:03,710 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.081e+02 2.211e+02 2.389e+02 8.247e+02, threshold=4.422e+02, percent-clipped=1.0 2024-09-16 11:52:25,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. limit=5.0 2024-09-16 11:52:34,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=440498.3333333333, ans=0.035 2024-09-16 11:52:36,731 INFO [train.py:1198] (0/2) Epoch 25, batch 2100, loss[loss=0.2438, ctc_loss=0.1661, cr_loss=0.3884, over 20738.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1548, cr_loss=0.3773, over 4112684.63 frames. ], batch size: 71, lr: 3.34e-03, grad_scale: 32.0 2024-09-16 11:53:23,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=440611.6666666667, ans=0.05 2024-09-16 11:53:25,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440611.6666666667, ans=0.1 2024-09-16 11:53:54,004 INFO [train.py:1198] (0/2) Epoch 25, batch 2150, loss[loss=0.2401, ctc_loss=0.1624, cr_loss=0.3885, over 20814.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1557, cr_loss=0.3787, over 4112292.40 frames. 
], batch size: 65, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 11:53:57,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=440668.3333333333, ans=0.125 2024-09-16 11:53:58,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=440668.3333333333, ans=0.0 2024-09-16 11:54:05,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=440668.3333333333, ans=0.2 2024-09-16 11:54:11,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=440696.6666666667, ans=0.2 2024-09-16 11:54:32,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=440725.0, ans=0.0 2024-09-16 11:54:33,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=440725.0, ans=0.025 2024-09-16 11:54:39,183 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.100e+02 2.245e+02 2.441e+02 3.108e+02, threshold=4.490e+02, percent-clipped=0.0 2024-09-16 11:55:12,559 INFO [train.py:1198] (0/2) Epoch 25, batch 2200, loss[loss=0.2738, ctc_loss=0.1886, cr_loss=0.4259, over 18415.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1556, cr_loss=0.378, over 4114633.59 frames. 
], batch size: 108, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 11:55:12,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=440810.0, ans=0.125 2024-09-16 11:55:14,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=440810.0, ans=0.125 2024-09-16 11:55:28,246 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 11:56:07,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440895.0, ans=0.1 2024-09-16 11:56:28,281 INFO [train.py:1198] (0/2) Epoch 25, batch 2250, loss[loss=0.28, ctc_loss=0.1967, cr_loss=0.4163, over 18393.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1561, cr_loss=0.3795, over 4110579.14 frames. ], batch size: 108, lr: 3.33e-03, grad_scale: 32.0 2024-09-16 11:56:53,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440980.0, ans=0.1 2024-09-16 11:57:11,200 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.095e+02 2.232e+02 2.386e+02 3.697e+02, threshold=4.463e+02, percent-clipped=0.0 2024-09-16 11:57:16,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=441036.6666666667, ans=0.125 2024-09-16 11:57:38,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=441065.0, ans=0.0 2024-09-16 11:57:44,490 INFO [train.py:1198] (0/2) Epoch 25, batch 2300, loss[loss=0.2533, ctc_loss=0.1675, cr_loss=0.4292, over 20929.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1556, cr_loss=0.3781, over 4104252.08 frames. 
], batch size: 60, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 11:58:31,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=441178.3333333333, ans=0.1
2024-09-16 11:58:50,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441206.6666666667, ans=0.1
2024-09-16 11:59:00,366 INFO [train.py:1198] (0/2) Epoch 25, batch 2350, loss[loss=0.2104, ctc_loss=0.1411, cr_loss=0.3465, over 20813.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1547, cr_loss=0.3767, over 4109831.80 frames. ], batch size: 53, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 11:59:13,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=441235.0, ans=0.04949747468305833
2024-09-16 11:59:21,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441263.3333333333, ans=0.1
2024-09-16 11:59:24,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=441263.3333333333, ans=0.125
2024-09-16 11:59:27,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=441263.3333333333, ans=0.0
2024-09-16 11:59:45,196 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.115e+02 2.271e+02 2.439e+02 3.151e+02, threshold=4.543e+02, percent-clipped=0.0
2024-09-16 11:59:48,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441320.0, ans=0.1
2024-09-16 11:59:57,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=441320.0, ans=0.2
2024-09-16 12:00:21,665 INFO [train.py:1198] (0/2) Epoch 25, batch 2400, loss[loss=0.2447, ctc_loss=0.1646, cr_loss=0.4005, over 20319.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1547, cr_loss=0.3763, over 4111143.48 frames. ], batch size: 74, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:00:42,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0
2024-09-16 12:01:06,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=441461.6666666667, ans=0.07
2024-09-16 12:01:37,867 INFO [train.py:1198] (0/2) Epoch 25, batch 2450, loss[loss=0.2608, ctc_loss=0.179, cr_loss=0.4092, over 18374.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1548, cr_loss=0.3761, over 4106879.16 frames. ], batch size: 108, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:01:39,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441518.3333333333, ans=0.1
2024-09-16 12:02:10,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=441575.0, ans=0.125
2024-09-16 12:02:11,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=441575.0, ans=0.0
2024-09-16 12:02:20,219 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.147e+02 2.262e+02 2.417e+02 3.603e+02, threshold=4.523e+02, percent-clipped=0.0
2024-09-16 12:02:31,384 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 12:02:39,522 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=22.5
2024-09-16 12:02:51,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=441631.6666666667, ans=0.035
2024-09-16 12:02:53,835 INFO [train.py:1198] (0/2) Epoch 25, batch 2500, loss[loss=0.2226, ctc_loss=0.1477, cr_loss=0.3749, over 21070.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1544, cr_loss=0.376, over 4112348.19 frames. ], batch size: 56, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:03:00,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.81 vs. limit=12.0
2024-09-16 12:03:06,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=441660.0, ans=0.0
2024-09-16 12:03:41,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0
2024-09-16 12:03:48,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=441745.0, ans=0.2
2024-09-16 12:04:09,688 INFO [train.py:1198] (0/2) Epoch 25, batch 2550, loss[loss=0.2437, ctc_loss=0.163, cr_loss=0.4034, over 20884.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1536, cr_loss=0.3748, over 4111807.78 frames. ], batch size: 54, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:04:47,825 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=12.0
2024-09-16 12:04:54,531 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.101e+02 2.226e+02 2.396e+02 3.958e+02, threshold=4.452e+02, percent-clipped=0.0
2024-09-16 12:04:56,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=441886.6666666667, ans=0.0
2024-09-16 12:05:01,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=441886.6666666667, ans=0.0
2024-09-16 12:05:28,252 INFO [train.py:1198] (0/2) Epoch 25, batch 2600, loss[loss=0.1927, ctc_loss=0.1255, cr_loss=0.3363, over 21044.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1539, cr_loss=0.3747, over 4104505.71 frames. ], batch size: 53, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:05:36,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=441943.3333333333, ans=0.125
2024-09-16 12:05:59,611 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-156000.pt
2024-09-16 12:06:18,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=442028.3333333333, ans=0.025
2024-09-16 12:06:43,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=442056.6666666667, ans=0.125
2024-09-16 12:06:47,605 INFO [train.py:1198] (0/2) Epoch 25, batch 2650, loss[loss=0.2309, ctc_loss=0.1556, cr_loss=0.3766, over 20658.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1539, cr_loss=0.3744, over 4096410.81 frames. ], batch size: 71, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:06:49,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=442085.0, ans=0.125
2024-09-16 12:06:56,070 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0
2024-09-16 12:06:57,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0
2024-09-16 12:07:06,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=442113.3333333333, ans=0.025
2024-09-16 12:07:18,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=442141.6666666667, ans=0.0
2024-09-16 12:07:22,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=442141.6666666667, ans=0.2
2024-09-16 12:07:29,777 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.116e+02 2.227e+02 2.422e+02 3.317e+02, threshold=4.454e+02, percent-clipped=0.0
2024-09-16 12:07:31,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=442170.0, ans=0.125
2024-09-16 12:07:43,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=22.5
2024-09-16 12:07:58,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=442198.3333333333, ans=0.125
2024-09-16 12:08:02,702 INFO [train.py:1198] (0/2) Epoch 25, batch 2700, loss[loss=0.2112, ctc_loss=0.1419, cr_loss=0.3464, over 21012.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1536, cr_loss=0.3739, over 4088255.97 frames. ], batch size: 61, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:08:07,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=442226.6666666667, ans=0.025
2024-09-16 12:08:34,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=442283.3333333333, ans=0.2
2024-09-16 12:09:18,659 INFO [train.py:1198] (0/2) Epoch 25, batch 2750, loss[loss=0.2, ctc_loss=0.132, cr_loss=0.3399, over 20976.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1528, cr_loss=0.3729, over 4104038.55 frames. ], batch size: 50, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:09:59,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=442425.0, ans=0.0
2024-09-16 12:10:00,707 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.706e+02 2.054e+02 2.140e+02 2.316e+02 5.148e+02, threshold=4.279e+02, percent-clipped=1.0
2024-09-16 12:10:03,506 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.64 vs. limit=15.0
2024-09-16 12:10:33,967 INFO [train.py:1198] (0/2) Epoch 25, batch 2800, loss[loss=0.2223, ctc_loss=0.1482, cr_loss=0.3701, over 20797.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1542, cr_loss=0.376, over 4102300.92 frames. ], batch size: 53, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:10:45,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.10 vs. limit=15.0
2024-09-16 12:10:55,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=442538.3333333333, ans=0.125
2024-09-16 12:10:55,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=442538.3333333333, ans=0.04949747468305833
2024-09-16 12:10:58,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=442538.3333333333, ans=0.125
2024-09-16 12:11:55,876 INFO [train.py:1198] (0/2) Epoch 25, batch 2850, loss[loss=0.2055, ctc_loss=0.1367, cr_loss=0.3439, over 21086.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1534, cr_loss=0.3757, over 4116371.67 frames. ], batch size: 53, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:12:12,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=442680.0, ans=0.125
2024-09-16 12:12:37,819 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.093e+02 2.267e+02 2.452e+02 5.204e+02, threshold=4.534e+02, percent-clipped=1.0
2024-09-16 12:12:53,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442736.6666666667, ans=0.1
2024-09-16 12:13:10,840 INFO [train.py:1198] (0/2) Epoch 25, batch 2900, loss[loss=0.2499, ctc_loss=0.1703, cr_loss=0.3981, over 20940.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1543, cr_loss=0.3763, over 4095894.02 frames. ], batch size: 60, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:13:18,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=442793.3333333333, ans=0.07
2024-09-16 12:13:38,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=442821.6666666667, ans=0.0
2024-09-16 12:13:49,263 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0
2024-09-16 12:14:08,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=442878.3333333333, ans=0.07
2024-09-16 12:14:26,077 INFO [train.py:1198] (0/2) Epoch 25, batch 2950, loss[loss=0.2636, ctc_loss=0.1802, cr_loss=0.4174, over 20975.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1544, cr_loss=0.3763, over 4102285.30 frames. ], batch size: 67, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:14:37,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=442935.0, ans=0.0
2024-09-16 12:15:08,238 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.045e+02 2.167e+02 2.296e+02 5.762e+02, threshold=4.335e+02, percent-clipped=1.0
2024-09-16 12:15:16,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=443020.0, ans=0.2
2024-09-16 12:15:16,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=443020.0, ans=0.125
2024-09-16 12:15:41,119 INFO [train.py:1198] (0/2) Epoch 25, batch 3000, loss[loss=0.2623, ctc_loss=0.1824, cr_loss=0.3996, over 20980.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1547, cr_loss=0.3764, over 4104768.21 frames. ], batch size: 64, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:15:41,120 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 12:16:05,253 INFO [train.py:1230] (0/2) Epoch 25, validation: loss=0.0425, ctc_loss=0.0425, cr_loss=1.166e-14, over 944034.00 frames.
2024-09-16 12:16:05,254 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-16 12:16:17,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=443076.6666666667, ans=0.2
2024-09-16 12:16:19,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=443076.6666666667, ans=0.125
2024-09-16 12:16:24,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=443105.0, ans=0.125
2024-09-16 12:16:37,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=443133.3333333333, ans=0.0
2024-09-16 12:16:45,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=443133.3333333333, ans=0.125
2024-09-16 12:17:21,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=443190.0, ans=0.025
2024-09-16 12:17:27,411 INFO [train.py:1198] (0/2) Epoch 25, batch 3050, loss[loss=0.2108, ctc_loss=0.1406, cr_loss=0.3511, over 20628.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1537, cr_loss=0.3752, over 4102507.05 frames. ], batch size: 68, lr: 3.33e-03, grad_scale: 32.0
2024-09-16 12:17:46,537 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.23 vs. limit=10.0
2024-09-16 12:18:01,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=443275.0, ans=0.2
2024-09-16 12:18:09,988 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.728e+02 2.144e+02 2.277e+02 2.434e+02 3.518e+02, threshold=4.553e+02, percent-clipped=0.0
2024-09-16 12:18:13,557 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 12:18:43,444 INFO [train.py:1198] (0/2) Epoch 25, batch 3100, loss[loss=0.292, ctc_loss=0.209, cr_loss=0.4149, over 14355.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1544, cr_loss=0.3764, over 4088951.03 frames. ], batch size: 149, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:19:54,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=443473.3333333333, ans=0.125
2024-09-16 12:19:58,604 INFO [train.py:1198] (0/2) Epoch 25, batch 3150, loss[loss=0.1911, ctc_loss=0.1297, cr_loss=0.3069, over 21016.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1544, cr_loss=0.3759, over 4085110.55 frames. ], batch size: 52, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:20:26,164 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0
2024-09-16 12:20:32,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=443558.3333333333, ans=0.95
2024-09-16 12:20:33,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=443558.3333333333, ans=0.125
2024-09-16 12:20:40,910 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.079e+02 2.229e+02 2.428e+02 4.413e+02, threshold=4.457e+02, percent-clipped=0.0
2024-09-16 12:21:14,329 INFO [train.py:1198] (0/2) Epoch 25, batch 3200, loss[loss=0.2602, ctc_loss=0.1768, cr_loss=0.4168, over 19399.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1547, cr_loss=0.3763, over 4090710.88 frames. ], batch size: 90, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:22:13,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=443728.3333333333, ans=0.05
2024-09-16 12:22:14,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=443728.3333333333, ans=0.0
2024-09-16 12:22:30,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443756.6666666667, ans=0.1
2024-09-16 12:22:32,987 INFO [train.py:1198] (0/2) Epoch 25, batch 3250, loss[loss=0.1889, ctc_loss=0.1232, cr_loss=0.3285, over 21051.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1547, cr_loss=0.3761, over 4087532.19 frames. ], batch size: 53, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:22:40,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=443785.0, ans=0.125
2024-09-16 12:22:50,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=443813.3333333333, ans=0.125
2024-09-16 12:23:18,448 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.065e+02 2.203e+02 2.349e+02 3.896e+02, threshold=4.406e+02, percent-clipped=0.0
2024-09-16 12:23:30,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=443870.0, ans=0.2
2024-09-16 12:23:35,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443898.3333333333, ans=0.1
2024-09-16 12:23:35,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=443898.3333333333, ans=0.125
2024-09-16 12:23:39,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.68 vs. limit=12.0
2024-09-16 12:23:49,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=443898.3333333333, ans=0.125
2024-09-16 12:23:51,737 INFO [train.py:1198] (0/2) Epoch 25, batch 3300, loss[loss=0.2604, ctc_loss=0.1766, cr_loss=0.4191, over 20832.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1545, cr_loss=0.3764, over 4090948.75 frames. ], batch size: 65, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:24:08,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=443955.0, ans=0.125
2024-09-16 12:24:13,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=443955.0, ans=0.2
2024-09-16 12:24:37,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=444011.6666666667, ans=0.0
2024-09-16 12:24:45,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=444011.6666666667, ans=0.125
2024-09-16 12:24:51,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=12.0
2024-09-16 12:25:07,660 INFO [train.py:1198] (0/2) Epoch 25, batch 3350, loss[loss=0.2051, ctc_loss=0.1364, cr_loss=0.3436, over 21000.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1549, cr_loss=0.3768, over 4081774.90 frames. ], batch size: 48, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:25:50,290 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.098e+02 2.234e+02 2.376e+02 4.221e+02, threshold=4.469e+02, percent-clipped=0.0
2024-09-16 12:26:07,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=444181.6666666667, ans=0.0
2024-09-16 12:26:24,047 INFO [train.py:1198] (0/2) Epoch 25, batch 3400, loss[loss=0.238, ctc_loss=0.162, cr_loss=0.3795, over 20995.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1554, cr_loss=0.3784, over 4096561.15 frames. ], batch size: 55, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:27:10,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=444295.0, ans=0.125
2024-09-16 12:27:13,778 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0
2024-09-16 12:27:27,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0
2024-09-16 12:27:33,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=444323.3333333333, ans=0.2
2024-09-16 12:27:40,326 INFO [train.py:1198] (0/2) Epoch 25, batch 3450, loss[loss=0.2094, ctc_loss=0.1399, cr_loss=0.3477, over 20973.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1554, cr_loss=0.3792, over 4109395.58 frames. ], batch size: 51, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:28:18,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=444408.3333333333, ans=0.0
2024-09-16 12:28:25,552 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.081e+02 2.215e+02 2.442e+02 5.986e+02, threshold=4.430e+02, percent-clipped=1.0
2024-09-16 12:29:01,819 INFO [train.py:1198] (0/2) Epoch 25, batch 3500, loss[loss=0.1832, ctc_loss=0.1191, cr_loss=0.3204, over 20958.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1558, cr_loss=0.3795, over 4099184.66 frames. ], batch size: 49, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:29:02,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=444493.3333333333, ans=0.0
2024-09-16 12:29:04,146 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.23 vs. limit=10.0
2024-09-16 12:29:14,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=444493.3333333333, ans=0.1
2024-09-16 12:29:36,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=444550.0, ans=0.125
2024-09-16 12:29:38,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=444550.0, ans=0.025
2024-09-16 12:29:39,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=444550.0, ans=0.125
2024-09-16 12:30:14,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=444606.6666666667, ans=10.0
2024-09-16 12:30:17,207 INFO [train.py:1198] (0/2) Epoch 25, batch 3550, loss[loss=0.1948, ctc_loss=0.1296, cr_loss=0.3258, over 20963.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1557, cr_loss=0.3786, over 4096783.11 frames. ], batch size: 48, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:30:39,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=444663.3333333333, ans=0.0
2024-09-16 12:30:45,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=444663.3333333333, ans=0.2
2024-09-16 12:30:45,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=444663.3333333333, ans=0.125
2024-09-16 12:30:49,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=444691.6666666667, ans=0.125
2024-09-16 12:31:00,006 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.123e+02 2.257e+02 2.460e+02 4.749e+02, threshold=4.514e+02, percent-clipped=1.0
2024-09-16 12:31:33,423 INFO [train.py:1198] (0/2) Epoch 25, batch 3600, loss[loss=0.2095, ctc_loss=0.141, cr_loss=0.3422, over 20964.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1555, cr_loss=0.3784, over 4099128.54 frames. ], batch size: 49, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:31:43,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=444776.6666666667, ans=0.1
2024-09-16 12:31:56,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=444805.0, ans=0.125
2024-09-16 12:32:28,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=444861.6666666667, ans=0.0
2024-09-16 12:32:49,381 INFO [train.py:1198] (0/2) Epoch 25, batch 3650, loss[loss=0.264, ctc_loss=0.1813, cr_loss=0.4137, over 19513.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1544, cr_loss=0.3769, over 4109162.61 frames. ], batch size: 90, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:33:00,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=12.0
2024-09-16 12:33:21,807 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.34 vs. limit=22.5
2024-09-16 12:33:34,691 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.094e+02 2.222e+02 2.356e+02 2.929e+02, threshold=4.444e+02, percent-clipped=0.0
2024-09-16 12:33:47,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=445003.3333333333, ans=0.2
2024-09-16 12:33:58,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=445031.6666666667, ans=0.0
2024-09-16 12:34:02,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0
2024-09-16 12:34:08,506 INFO [train.py:1198] (0/2) Epoch 25, batch 3700, loss[loss=0.1765, ctc_loss=0.1182, cr_loss=0.2914, over 20956.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1541, cr_loss=0.3763, over 4111730.94 frames. ], batch size: 50, lr: 3.32e-03, grad_scale: 64.0
2024-09-16 12:34:56,512 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0
2024-09-16 12:35:01,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=12.0
2024-09-16 12:35:21,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=445173.3333333333, ans=0.0
2024-09-16 12:35:26,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=445201.6666666667, ans=0.1
2024-09-16 12:35:27,551 INFO [train.py:1198] (0/2) Epoch 25, batch 3750, loss[loss=0.2732, ctc_loss=0.1876, cr_loss=0.428, over 20699.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1543, cr_loss=0.3769, over 4119186.99 frames. ], batch size: 68, lr: 3.32e-03, grad_scale: 64.0
2024-09-16 12:35:44,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=445230.0, ans=0.035
2024-09-16 12:36:10,134 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.048e+02 2.174e+02 2.276e+02 2.887e+02, threshold=4.348e+02, percent-clipped=0.0
2024-09-16 12:36:11,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=445286.6666666667, ans=0.025
2024-09-16 12:36:15,455 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=22.5
2024-09-16 12:36:43,497 INFO [train.py:1198] (0/2) Epoch 25, batch 3800, loss[loss=0.2192, ctc_loss=0.1461, cr_loss=0.3655, over 20877.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.154, cr_loss=0.377, over 4121201.22 frames. ], batch size: 54, lr: 3.32e-03, grad_scale: 64.0
2024-09-16 12:36:45,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=445343.3333333333, ans=0.125
2024-09-16 12:36:55,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=445343.3333333333, ans=0.1
2024-09-16 12:37:39,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=445428.3333333333, ans=0.2
2024-09-16 12:37:59,384 INFO [train.py:1198] (0/2) Epoch 25, batch 3850, loss[loss=0.2388, ctc_loss=0.1578, cr_loss=0.4047, over 20895.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1551, cr_loss=0.3787, over 4112257.75 frames. ], batch size: 54, lr: 3.32e-03, grad_scale: 64.0
2024-09-16 12:38:04,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=445485.0, ans=0.2
2024-09-16 12:38:43,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.127e+02 2.257e+02 2.453e+02 4.128e+02, threshold=4.514e+02, percent-clipped=0.0
2024-09-16 12:38:56,419 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.09 vs. limit=22.5
2024-09-16 12:39:05,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445598.3333333333, ans=0.1
2024-09-16 12:39:06,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=445598.3333333333, ans=0.125
2024-09-16 12:39:18,223 INFO [train.py:1198] (0/2) Epoch 25, batch 3900, loss[loss=0.2362, ctc_loss=0.1591, cr_loss=0.3856, over 21020.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1548, cr_loss=0.3777, over 4113096.10 frames. ], batch size: 62, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:40:07,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.96 vs. limit=10.0
2024-09-16 12:40:36,449 INFO [train.py:1198] (0/2) Epoch 25, batch 3950, loss[loss=0.184, ctc_loss=0.1193, cr_loss=0.3233, over 20990.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1557, cr_loss=0.379, over 4106662.40 frames. ], batch size: 52, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:40:39,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=445768.3333333333, ans=0.1
2024-09-16 12:41:11,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445825.0, ans=0.1
2024-09-16 12:41:20,322 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.095e+02 2.214e+02 2.396e+02 3.784e+02, threshold=4.428e+02, percent-clipped=0.0
2024-09-16 12:41:43,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=445881.6666666667, ans=0.0
2024-09-16 12:41:52,097 INFO [train.py:1198] (0/2) Epoch 25, batch 4000, loss[loss=0.2655, ctc_loss=0.1819, cr_loss=0.4181, over 18544.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1554, cr_loss=0.3783, over 4094354.45 frames. ], batch size: 108, lr: 3.32e-03, grad_scale: 32.0
2024-09-16 12:42:01,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=445910.0, ans=0.1
2024-09-16 12:42:04,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=445910.0, ans=0.125
2024-09-16 12:42:27,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=445966.6666666667, ans=0.125
2024-09-16 12:43:07,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=446051.6666666667, ans=0.0
2024-09-16 12:43:08,188 INFO [train.py:1198] (0/2) Epoch 25, batch 4050, loss[loss=0.2727, ctc_loss=0.1934, cr_loss=0.3961, over 13927.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1553, cr_loss=0.3777, over 4095620.89 frames. ], batch size: 150, lr: 3.31e-03, grad_scale: 32.0
2024-09-16 12:43:13,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=446051.6666666667, ans=0.125
2024-09-16 12:43:15,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0
2024-09-16 12:43:23,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=446080.0, ans=0.2
2024-09-16 12:43:42,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=446108.3333333333, ans=0.125
2024-09-16 12:43:51,911 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.099e+02 2.220e+02 2.385e+02 4.787e+02, threshold=4.440e+02, percent-clipped=1.0
2024-09-16 12:44:21,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0
2024-09-16 12:44:22,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=446193.3333333333, ans=0.125
2024-09-16 12:44:23,851 INFO [train.py:1198] (0/2) Epoch 25, batch 4100, loss[loss=0.2092, ctc_loss=0.1378, cr_loss=0.3567, over 20797.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.155, cr_loss=0.3774, over 4089609.92 frames. ], batch size: 53, lr: 3.31e-03, grad_scale: 32.0
2024-09-16 12:44:46,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=446221.6666666667, ans=0.2
2024-09-16 12:45:02,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=446250.0, ans=0.0
2024-09-16 12:45:02,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=22.5
2024-09-16 12:45:23,127 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.24 vs. limit=22.5
2024-09-16 12:45:41,697 INFO [train.py:1198] (0/2) Epoch 25, batch 4150, loss[loss=0.2672, ctc_loss=0.1823, cr_loss=0.4247, over 20848.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1561, cr_loss=0.379, over 4090502.38 frames. ], batch size: 65, lr: 3.31e-03, grad_scale: 32.0
2024-09-16 12:45:53,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=446335.0, ans=0.1
2024-09-16 12:45:56,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=446363.3333333333, ans=0.125
2024-09-16 12:46:26,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=446391.6666666667, ans=0.1
2024-09-16 12:46:27,810 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.112e+02 2.214e+02 2.356e+02 4.198e+02, threshold=4.429e+02, percent-clipped=0.0
2024-09-16 12:46:37,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=446420.0, ans=0.0
2024-09-16 12:47:00,041 INFO [train.py:1198] (0/2) Epoch 25, batch 4200, loss[loss=0.2494, ctc_loss=0.1696, cr_loss=0.3986, over 20645.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1566, cr_loss=0.3795, over 4089366.42 frames. ], batch size: 68, lr: 3.31e-03, grad_scale: 32.0
2024-09-16 12:47:20,575 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.76 vs. limit=6.0
2024-09-16 12:47:31,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0
2024-09-16 12:47:43,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=446533.3333333333, ans=0.125
2024-09-16 12:48:16,070 INFO [train.py:1198] (0/2) Epoch 25, batch 4250, loss[loss=0.2274, ctc_loss=0.1538, cr_loss=0.3684, over 21085.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1566, cr_loss=0.3795, over 4091958.26 frames. ], batch size: 53, lr: 3.31e-03, grad_scale: 32.0
2024-09-16 12:48:31,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=446646.6666666667, ans=0.1
2024-09-16 12:49:00,337 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.107e+02 2.242e+02 2.391e+02 4.206e+02, threshold=4.483e+02, percent-clipped=0.0
2024-09-16 12:49:32,058 INFO [train.py:1198] (0/2) Epoch 25, batch 4300, loss[loss=0.2466, ctc_loss=0.1638, cr_loss=0.4139, over 20661.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1566, cr_loss=0.3803, over 4102308.90 frames. ], batch size: 66, lr: 3.31e-03, grad_scale: 32.0
2024-09-16 12:49:46,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=446788.3333333333, ans=0.125
2024-09-16 12:50:19,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0
2024-09-16 12:50:50,641 INFO [train.py:1198] (0/2) Epoch 25, batch 4350, loss[loss=0.2845, ctc_loss=0.2008, cr_loss=0.4184, over 14333.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1556, cr_loss=0.3791, over 4109800.74 frames.
], batch size: 150, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:50:52,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=446901.6666666667, ans=0.5 2024-09-16 12:51:04,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=446930.0, ans=0.0 2024-09-16 12:51:08,264 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=22.5 2024-09-16 12:51:18,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=446930.0, ans=0.0 2024-09-16 12:51:34,923 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.281e+02 2.487e+02 2.742e+02 4.213e+02, threshold=4.974e+02, percent-clipped=0.0 2024-09-16 12:51:49,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=446986.6666666667, ans=0.025 2024-09-16 12:52:06,305 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0 2024-09-16 12:52:10,158 INFO [train.py:1198] (0/2) Epoch 25, batch 4400, loss[loss=0.2048, ctc_loss=0.1348, cr_loss=0.3501, over 21070.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.155, cr_loss=0.3785, over 4110160.42 frames. ], batch size: 53, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:52:25,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=447071.6666666667, ans=0.0 2024-09-16 12:52:58,258 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.50 vs. 
limit=12.0 2024-09-16 12:52:59,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=447128.3333333333, ans=0.125 2024-09-16 12:53:26,417 INFO [train.py:1198] (0/2) Epoch 25, batch 4450, loss[loss=0.1858, ctc_loss=0.1256, cr_loss=0.3012, over 20965.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1551, cr_loss=0.3782, over 4114752.93 frames. ], batch size: 50, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:53:46,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=447213.3333333333, ans=0.2 2024-09-16 12:53:53,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=447213.3333333333, ans=10.0 2024-09-16 12:54:10,160 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.107e+02 2.220e+02 2.363e+02 4.587e+02, threshold=4.439e+02, percent-clipped=0.0 2024-09-16 12:54:27,305 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.91 vs. limit=10.0 2024-09-16 12:54:42,004 INFO [train.py:1198] (0/2) Epoch 25, batch 4500, loss[loss=0.2389, ctc_loss=0.1605, cr_loss=0.392, over 21062.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1547, cr_loss=0.3775, over 4115009.02 frames. 
], batch size: 59, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:54:43,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=447326.6666666667, ans=0.1 2024-09-16 12:55:15,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=447383.3333333333, ans=0.125 2024-09-16 12:55:44,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=447440.0, ans=22.5 2024-09-16 12:55:48,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=447440.0, ans=0.0 2024-09-16 12:55:57,387 INFO [train.py:1198] (0/2) Epoch 25, batch 4550, loss[loss=0.27, ctc_loss=0.1843, cr_loss=0.4287, over 18375.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1548, cr_loss=0.3776, over 4115031.61 frames. ], batch size: 108, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:56:04,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=447468.3333333333, ans=0.2 2024-09-16 12:56:44,780 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.085e+02 2.226e+02 2.370e+02 5.298e+02, threshold=4.452e+02, percent-clipped=1.0 2024-09-16 12:56:48,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=447553.3333333333, ans=0.025 2024-09-16 12:56:54,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=447553.3333333333, ans=0.125 2024-09-16 12:56:57,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447553.3333333333, ans=0.1 2024-09-16 12:57:16,793 INFO [train.py:1198] (0/2) Epoch 25, batch 4600, 
loss[loss=0.2521, ctc_loss=0.1726, cr_loss=0.3977, over 20651.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1545, cr_loss=0.3767, over 4120172.39 frames. ], batch size: 71, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:57:20,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=447610.0, ans=0.2 2024-09-16 12:57:44,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=447638.3333333333, ans=0.0 2024-09-16 12:57:50,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=447666.6666666667, ans=0.04949747468305833 2024-09-16 12:57:55,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=447666.6666666667, ans=0.0 2024-09-16 12:58:02,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=447666.6666666667, ans=0.125 2024-09-16 12:58:35,451 INFO [train.py:1198] (0/2) Epoch 25, batch 4650, loss[loss=0.2466, ctc_loss=0.1673, cr_loss=0.3968, over 20670.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1564, cr_loss=0.3796, over 4101129.73 frames. ], batch size: 68, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 12:59:19,501 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.129e+02 2.221e+02 2.386e+02 4.102e+02, threshold=4.442e+02, percent-clipped=0.0 2024-09-16 12:59:29,109 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 12:59:45,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.20 vs. 
limit=15.0 2024-09-16 12:59:49,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=447865.0, ans=0.0 2024-09-16 12:59:51,994 INFO [train.py:1198] (0/2) Epoch 25, batch 4700, loss[loss=0.235, ctc_loss=0.1578, cr_loss=0.3863, over 21004.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1561, cr_loss=0.3789, over 4098808.93 frames. ], batch size: 63, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 13:00:00,337 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0 2024-09-16 13:00:05,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=447921.6666666667, ans=0.125 2024-09-16 13:00:25,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=447950.0, ans=0.035 2024-09-16 13:01:07,896 INFO [train.py:1198] (0/2) Epoch 25, batch 4750, loss[loss=0.2626, ctc_loss=0.1759, cr_loss=0.4335, over 20809.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1564, cr_loss=0.3795, over 4084096.31 frames. ], batch size: 65, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 13:01:54,671 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.131e+02 2.228e+02 2.410e+02 5.492e+02, threshold=4.457e+02, percent-clipped=2.0 2024-09-16 13:02:18,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-09-16 13:02:26,645 INFO [train.py:1198] (0/2) Epoch 25, batch 4800, loss[loss=0.3011, ctc_loss=0.2163, cr_loss=0.424, over 14386.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1552, cr_loss=0.3778, over 4082749.11 frames. 
], batch size: 149, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 13:02:28,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=448176.6666666667, ans=0.015 2024-09-16 13:02:39,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2024-09-16 13:02:51,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=448205.0, ans=0.125 2024-09-16 13:02:57,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=448233.3333333333, ans=0.0 2024-09-16 13:03:08,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=448233.3333333333, ans=0.125 2024-09-16 13:03:12,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=448261.6666666667, ans=0.07 2024-09-16 13:03:12,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=448261.6666666667, ans=0.125 2024-09-16 13:03:24,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=15.0 2024-09-16 13:03:45,041 INFO [train.py:1198] (0/2) Epoch 25, batch 4850, loss[loss=0.2114, ctc_loss=0.1408, cr_loss=0.3531, over 20991.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1558, cr_loss=0.3788, over 4074874.80 frames. 
], batch size: 63, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 13:04:05,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448346.6666666667, ans=0.1 2024-09-16 13:04:22,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448375.0, ans=0.1 2024-09-16 13:04:25,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.04 vs. limit=22.5 2024-09-16 13:04:29,493 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.092e+02 2.219e+02 2.385e+02 3.498e+02, threshold=4.437e+02, percent-clipped=0.0 2024-09-16 13:04:35,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=448403.3333333333, ans=0.025 2024-09-16 13:04:40,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=448403.3333333333, ans=0.125 2024-09-16 13:05:01,123 INFO [train.py:1198] (0/2) Epoch 25, batch 4900, loss[loss=0.2423, ctc_loss=0.1638, cr_loss=0.3925, over 21001.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1553, cr_loss=0.3775, over 4067067.35 frames. ], batch size: 63, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 13:05:06,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=448460.0, ans=0.2 2024-09-16 13:05:08,020 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.22 vs. 
limit=22.5 2024-09-16 13:05:39,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=448516.6666666667, ans=0.0 2024-09-16 13:05:59,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.23 vs. limit=15.0 2024-09-16 13:06:07,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=448573.3333333333, ans=0.2 2024-09-16 13:06:16,110 INFO [train.py:1198] (0/2) Epoch 25, batch 4950, loss[loss=0.2659, ctc_loss=0.1789, cr_loss=0.4353, over 20726.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.155, cr_loss=0.3777, over 4067677.47 frames. ], batch size: 71, lr: 3.31e-03, grad_scale: 32.0 2024-09-16 13:06:51,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=448658.3333333333, ans=0.125 2024-09-16 13:06:59,843 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.109e+02 2.215e+02 2.399e+02 3.808e+02, threshold=4.429e+02, percent-clipped=0.0 2024-09-16 13:07:16,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=448715.0, ans=0.09899494936611666 2024-09-16 13:07:31,342 INFO [train.py:1198] (0/2) Epoch 25, batch 5000, loss[loss=0.2387, ctc_loss=0.1594, cr_loss=0.3964, over 20630.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1538, cr_loss=0.3756, over 4081474.94 frames. 
], batch size: 68, lr: 3.30e-03, grad_scale: 16.0 2024-09-16 13:07:37,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=448743.3333333333, ans=0.125 2024-09-16 13:08:10,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448800.0, ans=0.1 2024-09-16 13:08:14,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=448828.3333333333, ans=0.125 2024-09-16 13:08:45,683 INFO [train.py:1198] (0/2) Epoch 25, batch 5050, loss[loss=0.2609, ctc_loss=0.176, cr_loss=0.4246, over 20655.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1543, cr_loss=0.3764, over 4079234.82 frames. ], batch size: 68, lr: 3.30e-03, grad_scale: 16.0 2024-09-16 13:08:50,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=448885.0, ans=0.125 2024-09-16 13:08:53,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=448885.0, ans=0.125 2024-09-16 13:09:16,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.32 vs. 
limit=22.5 2024-09-16 13:09:16,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448941.6666666667, ans=0.1 2024-09-16 13:09:31,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=448970.0, ans=0.125 2024-09-16 13:09:32,822 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.077e+02 2.215e+02 2.352e+02 3.095e+02, threshold=4.431e+02, percent-clipped=0.0 2024-09-16 13:09:36,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=448970.0, ans=0.0 2024-09-16 13:10:02,561 INFO [train.py:1198] (0/2) Epoch 25, batch 5100, loss[loss=0.2309, ctc_loss=0.1576, cr_loss=0.3664, over 20824.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1552, cr_loss=0.3782, over 4076118.91 frames. ], batch size: 59, lr: 3.30e-03, grad_scale: 16.0 2024-09-16 13:10:04,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=449026.6666666667, ans=0.0 2024-09-16 13:10:07,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=449026.6666666667, ans=0.125 2024-09-16 13:10:21,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=449055.0, ans=0.125 2024-09-16 13:10:56,396 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0 2024-09-16 13:10:59,690 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.14 vs. 
limit=12.0 2024-09-16 13:11:02,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=449140.0, ans=0.0 2024-09-16 13:11:15,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=449168.3333333333, ans=0.125 2024-09-16 13:11:16,582 INFO [train.py:1198] (0/2) Epoch 25, batch 5150, loss[loss=0.2365, ctc_loss=0.1584, cr_loss=0.3903, over 21071.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1558, cr_loss=0.3789, over 4070150.99 frames. ], batch size: 59, lr: 3.30e-03, grad_scale: 16.0 2024-09-16 13:11:27,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=449168.3333333333, ans=0.125 2024-09-16 13:12:01,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=449225.0, ans=0.2 2024-09-16 13:12:04,638 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.149e+02 2.272e+02 2.416e+02 4.273e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-16 13:12:09,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=449253.3333333333, ans=0.125 2024-09-16 13:12:18,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=449281.6666666667, ans=0.125 2024-09-16 13:12:34,435 INFO [train.py:1198] (0/2) Epoch 25, batch 5200, loss[loss=0.243, ctc_loss=0.166, cr_loss=0.3848, over 19539.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1561, cr_loss=0.3794, over 4068116.53 frames. 
], batch size: 90, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:13:00,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=449338.3333333333, ans=0.2 2024-09-16 13:13:40,741 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0 2024-09-16 13:13:48,967 INFO [train.py:1198] (0/2) Epoch 25, batch 5250, loss[loss=0.2494, ctc_loss=0.1692, cr_loss=0.4007, over 20987.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1566, cr_loss=0.3802, over 4063658.67 frames. ], batch size: 64, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:13:50,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=449451.6666666667, ans=0.0 2024-09-16 13:13:55,211 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:14:11,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=449480.0, ans=0.2 2024-09-16 13:14:28,321 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:14:29,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=449508.3333333333, ans=0.125 2024-09-16 13:14:29,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=449508.3333333333, ans=0.125 2024-09-16 13:14:31,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=449508.3333333333, ans=0.0 2024-09-16 13:14:33,886 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.126e+02 2.233e+02 2.413e+02 3.021e+02, threshold=4.466e+02, 
percent-clipped=0.0 2024-09-16 13:14:46,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.72 vs. limit=15.0 2024-09-16 13:14:58,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=449565.0, ans=0.125 2024-09-16 13:14:58,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=449565.0, ans=0.125 2024-09-16 13:15:02,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=449593.3333333333, ans=0.0 2024-09-16 13:15:03,828 INFO [train.py:1198] (0/2) Epoch 25, batch 5300, loss[loss=0.2244, ctc_loss=0.1483, cr_loss=0.3805, over 21067.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1563, cr_loss=0.3797, over 4065498.81 frames. ], batch size: 56, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:15:08,631 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:16:18,503 INFO [train.py:1198] (0/2) Epoch 25, batch 5350, loss[loss=0.2363, ctc_loss=0.156, cr_loss=0.4016, over 20695.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1566, cr_loss=0.3806, over 4071170.91 frames. 
], batch size: 68, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:16:24,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=449735.0, ans=0.025 2024-09-16 13:16:33,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=449763.3333333333, ans=0.2 2024-09-16 13:17:03,095 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.149e+02 2.241e+02 2.476e+02 2.922e+02, threshold=4.483e+02, percent-clipped=0.0 2024-09-16 13:17:04,938 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:17:05,461 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2024-09-16 13:17:33,387 INFO [train.py:1198] (0/2) Epoch 25, batch 5400, loss[loss=0.2449, ctc_loss=0.1641, cr_loss=0.404, over 20962.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1573, cr_loss=0.3813, over 4061856.72 frames. 
], batch size: 55, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:17:35,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=449876.6666666667, ans=0.09899494936611666 2024-09-16 13:17:45,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=449876.6666666667, ans=0.125 2024-09-16 13:18:22,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=449961.6666666667, ans=0.0 2024-09-16 13:18:23,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=449961.6666666667, ans=0.125 2024-09-16 13:18:24,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=449961.6666666667, ans=0.0 2024-09-16 13:18:25,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=12.0 2024-09-16 13:18:33,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=22.5 2024-09-16 13:18:35,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=449990.0, ans=0.0 2024-09-16 13:18:50,593 INFO [train.py:1198] (0/2) Epoch 25, batch 5450, loss[loss=0.2466, ctc_loss=0.1649, cr_loss=0.4083, over 21056.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1566, cr_loss=0.3807, over 4058911.93 frames. 
], batch size: 56, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:19:19,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=450075.0, ans=0.125 2024-09-16 13:19:26,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=450075.0, ans=0.125 2024-09-16 13:19:29,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=450075.0, ans=0.125 2024-09-16 13:19:32,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=450075.0, ans=0.125 2024-09-16 13:19:32,646 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:19:35,288 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.122e+02 2.248e+02 2.443e+02 5.856e+02, threshold=4.495e+02, percent-clipped=2.0 2024-09-16 13:20:04,802 INFO [train.py:1198] (0/2) Epoch 25, batch 5500, loss[loss=0.2691, ctc_loss=0.1849, cr_loss=0.4213, over 20662.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1565, cr_loss=0.3802, over 4063632.79 frames. ], batch size: 66, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:20:27,646 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0 2024-09-16 13:21:15,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0 2024-09-16 13:21:21,975 INFO [train.py:1198] (0/2) Epoch 25, batch 5550, loss[loss=0.2158, ctc_loss=0.1439, cr_loss=0.3593, over 20984.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1565, cr_loss=0.3801, over 4064517.11 frames. 
], batch size: 55, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:22:06,594 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.103e+02 2.265e+02 2.441e+02 4.541e+02, threshold=4.530e+02, percent-clipped=1.0 2024-09-16 13:22:19,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=450386.6666666667, ans=0.125 2024-09-16 13:22:33,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450415.0, ans=0.1 2024-09-16 13:22:36,848 INFO [train.py:1198] (0/2) Epoch 25, batch 5600, loss[loss=0.2334, ctc_loss=0.1557, cr_loss=0.3886, over 21021.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1562, cr_loss=0.3801, over 4068975.94 frames. ], batch size: 61, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:22:37,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=450443.3333333333, ans=0.05 2024-09-16 13:22:41,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=450443.3333333333, ans=0.125 2024-09-16 13:22:59,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=450471.6666666667, ans=0.0 2024-09-16 13:23:28,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=450528.3333333333, ans=0.05 2024-09-16 13:23:35,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=450556.6666666667, ans=0.0 2024-09-16 13:23:51,954 INFO [train.py:1198] (0/2) Epoch 25, batch 5650, loss[loss=0.2367, ctc_loss=0.1584, cr_loss=0.3914, over 20887.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.155, cr_loss=0.3785, over 4082937.39 frames. 
], batch size: 54, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:24:08,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=450613.3333333333, ans=0.09899494936611666 2024-09-16 13:24:32,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=450641.6666666667, ans=0.125 2024-09-16 13:24:36,778 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.096e+02 2.208e+02 2.368e+02 3.546e+02, threshold=4.416e+02, percent-clipped=0.0 2024-09-16 13:24:56,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=450698.3333333333, ans=0.125 2024-09-16 13:25:06,371 INFO [train.py:1198] (0/2) Epoch 25, batch 5700, loss[loss=0.1958, ctc_loss=0.1302, cr_loss=0.3281, over 20504.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1559, cr_loss=0.3794, over 4073928.98 frames. ], batch size: 45, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:25:21,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=450755.0, ans=0.125 2024-09-16 13:25:28,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=450755.0, ans=0.125 2024-09-16 13:25:39,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=450783.3333333333, ans=0.0 2024-09-16 13:26:05,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=450840.0, ans=0.125 2024-09-16 13:26:20,374 INFO [train.py:1198] (0/2) Epoch 25, batch 5750, loss[loss=0.2761, ctc_loss=0.1887, cr_loss=0.437, over 20217.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1565, cr_loss=0.3801, over 4072808.21 frames. 
], batch size: 80, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:27:02,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450925.0, ans=0.1 2024-09-16 13:27:05,243 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.700e+02 2.115e+02 2.237e+02 2.403e+02 6.193e+02, threshold=4.473e+02, percent-clipped=1.0 2024-09-16 13:27:10,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=450953.3333333333, ans=0.125 2024-09-16 13:27:32,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=450981.6666666667, ans=0.125 2024-09-16 13:27:37,708 INFO [train.py:1198] (0/2) Epoch 25, batch 5800, loss[loss=0.231, ctc_loss=0.1509, cr_loss=0.4005, over 20799.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1574, cr_loss=0.3814, over 4063644.16 frames. ], batch size: 53, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:27:49,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=451010.0, ans=0.125 2024-09-16 13:28:05,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.99 vs. limit=15.0 2024-09-16 13:28:31,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=451095.0, ans=0.025 2024-09-16 13:28:33,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=22.5 2024-09-16 13:28:42,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.53 vs. 
limit=15.0 2024-09-16 13:28:47,191 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0 2024-09-16 13:28:52,517 INFO [train.py:1198] (0/2) Epoch 25, batch 5850, loss[loss=0.2535, ctc_loss=0.1714, cr_loss=0.4102, over 20057.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.158, cr_loss=0.382, over 4064591.86 frames. ], batch size: 80, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:29:06,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=451180.0, ans=0.04949747468305833 2024-09-16 13:29:12,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=451180.0, ans=15.0 2024-09-16 13:29:33,274 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=12.0 2024-09-16 13:29:39,740 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.148e+02 2.347e+02 2.593e+02 4.296e+02, threshold=4.693e+02, percent-clipped=0.0 2024-09-16 13:29:55,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=451265.0, ans=0.0 2024-09-16 13:30:09,797 INFO [train.py:1198] (0/2) Epoch 25, batch 5900, loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3673, over 20874.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1581, cr_loss=0.3815, over 4062243.95 frames. ], batch size: 54, lr: 3.30e-03, grad_scale: 32.0 2024-09-16 13:30:26,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=451321.6666666667, ans=10.0 2024-09-16 13:30:26,553 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.91 vs. 
limit=10.0 2024-09-16 13:31:24,045 INFO [train.py:1198] (0/2) Epoch 25, batch 5950, loss[loss=0.2072, ctc_loss=0.1372, cr_loss=0.3498, over 20943.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.157, cr_loss=0.3793, over 4062092.99 frames. ], batch size: 49, lr: 3.29e-03, grad_scale: 32.0 2024-09-16 13:31:38,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=451463.3333333333, ans=0.125 2024-09-16 13:31:52,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=451491.6666666667, ans=0.125 2024-09-16 13:32:08,541 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.067e+02 2.207e+02 2.317e+02 3.232e+02, threshold=4.413e+02, percent-clipped=0.0 2024-09-16 13:32:35,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=451548.3333333333, ans=0.0 2024-09-16 13:32:36,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2024-09-16 13:32:38,382 INFO [train.py:1198] (0/2) Epoch 25, batch 6000, loss[loss=0.204, ctc_loss=0.1327, cr_loss=0.3566, over 20977.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1576, cr_loss=0.3797, over 4051629.68 frames. ], batch size: 52, lr: 3.29e-03, grad_scale: 32.0 2024-09-16 13:32:38,383 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 13:33:01,622 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3847, 4.5924, 4.3796, 4.9213], device='cuda:0') 2024-09-16 13:33:02,493 INFO [train.py:1230] (0/2) Epoch 25, validation: loss=0.04198, ctc_loss=0.04198, cr_loss=1.159e-14, over 944034.00 frames. 
2024-09-16 13:33:02,493 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 13:33:09,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=15.0 2024-09-16 13:33:17,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451605.0, ans=0.1 2024-09-16 13:33:33,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=451633.3333333333, ans=0.125 2024-09-16 13:33:45,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=451661.6666666667, ans=0.0 2024-09-16 13:34:05,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=451690.0, ans=0.0 2024-09-16 13:34:16,884 INFO [train.py:1198] (0/2) Epoch 25, batch 6050, loss[loss=0.2006, ctc_loss=0.1307, cr_loss=0.3495, over 20365.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1575, cr_loss=0.3802, over 4059664.53 frames. 
], batch size: 45, lr: 3.29e-03, grad_scale: 32.0 2024-09-16 13:34:30,101 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:35:02,531 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.106e+02 2.238e+02 2.471e+02 5.382e+02, threshold=4.476e+02, percent-clipped=1.0 2024-09-16 13:35:02,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=451803.3333333333, ans=0.125 2024-09-16 13:35:17,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=451831.6666666667, ans=0.125 2024-09-16 13:35:33,559 INFO [train.py:1198] (0/2) Epoch 25, batch 6100, loss[loss=0.2009, ctc_loss=0.1368, cr_loss=0.3208, over 20964.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1565, cr_loss=0.3784, over 4071474.52 frames. ], batch size: 48, lr: 3.29e-03, grad_scale: 32.0 2024-09-16 13:35:33,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=451860.0, ans=0.125 2024-09-16 13:36:06,032 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0 2024-09-16 13:36:47,707 INFO [train.py:1198] (0/2) Epoch 25, batch 6150, loss[loss=0.2173, ctc_loss=0.1478, cr_loss=0.3477, over 21048.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1559, cr_loss=0.3773, over 4069620.52 frames. 
], batch size: 62, lr: 3.29e-03, grad_scale: 32.0 2024-09-16 13:36:58,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=452001.6666666667, ans=0.0 2024-09-16 13:37:09,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=452030.0, ans=0.125 2024-09-16 13:37:23,340 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.80 vs. limit=15.0 2024-09-16 13:37:31,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=452086.6666666667, ans=0.125 2024-09-16 13:37:33,213 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.668e+02 2.108e+02 2.276e+02 2.462e+02 3.641e+02, threshold=4.552e+02, percent-clipped=0.0 2024-09-16 13:37:44,293 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.31 vs. limit=15.0 2024-09-16 13:38:02,949 INFO [train.py:1198] (0/2) Epoch 25, batch 6200, loss[loss=0.2929, ctc_loss=0.2049, cr_loss=0.4399, over 14960.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1568, cr_loss=0.379, over 4068010.58 frames. ], batch size: 149, lr: 3.29e-03, grad_scale: 32.0 2024-09-16 13:38:07,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.51 vs. limit=10.0 2024-09-16 13:38:19,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=452171.6666666667, ans=0.125 2024-09-16 13:39:16,040 INFO [train.py:1198] (0/2) Epoch 25, batch 6250, loss[loss=0.261, ctc_loss=0.1809, cr_loss=0.4004, over 18219.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.158, cr_loss=0.3804, over 4043950.67 frames. 
], batch size: 108, lr: 3.29e-03, grad_scale: 32.0 2024-09-16 13:39:37,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=452313.3333333333, ans=0.2 2024-09-16 13:39:46,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=452341.6666666667, ans=0.025 2024-09-16 13:39:47,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=452341.6666666667, ans=15.0 2024-09-16 13:40:01,198 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.156e+02 2.310e+02 2.565e+02 4.257e+02, threshold=4.620e+02, percent-clipped=0.0 2024-09-16 13:40:01,880 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-09-16 13:40:10,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=22.5 2024-09-16 13:40:28,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=452426.6666666667, ans=0.125 2024-09-16 13:40:29,801 INFO [train.py:1198] (0/2) Epoch 25, batch 6300, loss[loss=0.2295, ctc_loss=0.1535, cr_loss=0.3801, over 20675.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1601, cr_loss=0.3835, over 4008377.31 frames. ], batch size: 68, lr: 3.29e-03, grad_scale: 32.0 2024-09-16 13:40:33,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452426.6666666667, ans=0.1 2024-09-16 13:40:36,383 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. 
limit=15.0 2024-09-16 13:40:53,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=452455.0, ans=0.0 2024-09-16 13:41:09,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=452483.3333333333, ans=0.025 2024-09-16 13:41:15,067 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=15.0 2024-09-16 13:41:17,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=452511.6666666667, ans=0.0 2024-09-16 13:41:41,484 INFO [train.py:1198] (0/2) Epoch 25, batch 6350, loss[loss=0.2737, ctc_loss=0.1953, cr_loss=0.3921, over 14302.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1656, cr_loss=0.3871, over 3827515.72 frames. ], batch size: 150, lr: 3.29e-03, grad_scale: 32.0 2024-09-16 13:41:41,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=452568.3333333333, ans=0.0 2024-09-16 13:41:46,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=452568.3333333333, ans=0.0 2024-09-16 13:41:56,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=452596.6666666667, ans=0.125 2024-09-16 13:42:06,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=452596.6666666667, ans=0.0 2024-09-16 13:42:24,622 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.439e+02 2.653e+02 2.916e+02 4.379e+02, threshold=5.307e+02, percent-clipped=0.0 2024-09-16 13:42:35,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=452653.3333333333, ans=0.125 2024-09-16 
13:42:35,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=452653.3333333333, ans=0.125 2024-09-16 13:42:40,264 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-25.pt 2024-09-16 13:43:34,259 INFO [train.py:1198] (0/2) Epoch 26, batch 0, loss[loss=0.2243, ctc_loss=0.1485, cr_loss=0.3792, over 21003.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1485, cr_loss=0.3792, over 21003.00 frames. ], batch size: 61, lr: 3.23e-03, grad_scale: 32.0 2024-09-16 13:43:34,260 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 13:43:55,333 INFO [train.py:1230] (0/2) Epoch 26, validation: loss=0.04202, ctc_loss=0.04202, cr_loss=1.14e-14, over 944034.00 frames. 2024-09-16 13:43:55,333 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 13:44:00,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=452684.5, ans=6.0 2024-09-16 13:44:06,630 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2024-09-16 13:44:12,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. 
limit=6.0 2024-09-16 13:44:13,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=452712.8333333333, ans=0.125 2024-09-16 13:44:19,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=452712.8333333333, ans=0.0 2024-09-16 13:44:33,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=452741.1666666667, ans=0.0 2024-09-16 13:45:10,598 INFO [train.py:1198] (0/2) Epoch 26, batch 50, loss[loss=0.2579, ctc_loss=0.1815, cr_loss=0.382, over 14170.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.15, cr_loss=0.3703, over 925429.18 frames. ], batch size: 149, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 13:45:13,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=452826.1666666667, ans=0.04949747468305833 2024-09-16 13:45:13,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=452826.1666666667, ans=0.125 2024-09-16 13:45:24,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=452854.5, ans=0.125 2024-09-16 13:45:24,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=452854.5, ans=0.0 2024-09-16 13:45:30,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=452854.5, ans=0.2 2024-09-16 13:46:09,432 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.095e+02 2.264e+02 2.491e+02 4.123e+02, threshold=4.528e+02, percent-clipped=0.0 2024-09-16 13:46:11,790 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.21 vs. 
limit=12.0 2024-09-16 13:46:25,731 INFO [train.py:1198] (0/2) Epoch 26, batch 100, loss[loss=0.2442, ctc_loss=0.1647, cr_loss=0.3975, over 20834.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1535, cr_loss=0.3762, over 1627712.88 frames. ], batch size: 65, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 13:46:50,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=452996.1666666667, ans=0.2 2024-09-16 13:46:53,710 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:46:56,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=453024.5, ans=0.2 2024-09-16 13:47:20,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=453052.8333333333, ans=0.1 2024-09-16 13:47:44,145 INFO [train.py:1198] (0/2) Epoch 26, batch 150, loss[loss=0.2315, ctc_loss=0.157, cr_loss=0.3726, over 20887.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.156, cr_loss=0.3781, over 2164365.38 frames. 
], batch size: 57, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 13:47:59,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=453137.8333333333, ans=0.0 2024-09-16 13:48:26,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=453166.1666666667, ans=0.125 2024-09-16 13:48:37,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=453194.5, ans=0.125 2024-09-16 13:48:42,708 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.079e+02 2.215e+02 2.385e+02 3.003e+02, threshold=4.430e+02, percent-clipped=0.0 2024-09-16 13:48:49,344 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2024-09-16 13:48:51,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=453222.8333333333, ans=0.2 2024-09-16 13:48:59,345 INFO [train.py:1198] (0/2) Epoch 26, batch 200, loss[loss=0.2072, ctc_loss=0.1369, cr_loss=0.3513, over 20983.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1545, cr_loss=0.376, over 2595029.11 frames. 
], batch size: 51, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 13:49:02,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=453251.1666666667, ans=0.125 2024-09-16 13:49:24,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=453279.5, ans=0.125 2024-09-16 13:49:45,126 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-160000.pt 2024-09-16 13:49:47,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=453336.1666666667, ans=0.1 2024-09-16 13:50:11,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=453364.5, ans=0.1 2024-09-16 13:50:16,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=453364.5, ans=0.04949747468305833 2024-09-16 13:50:19,175 INFO [train.py:1198] (0/2) Epoch 26, batch 250, loss[loss=0.2552, ctc_loss=0.1717, cr_loss=0.4177, over 20850.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1549, cr_loss=0.3776, over 2911496.40 frames. ], batch size: 65, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 13:50:29,532 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.51 vs. limit=5.0 2024-09-16 13:50:45,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.19 vs. 
limit=15.0 2024-09-16 13:50:52,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=453449.5, ans=0.0 2024-09-16 13:51:03,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=453477.8333333333, ans=0.125 2024-09-16 13:51:18,230 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.099e+02 2.209e+02 2.337e+02 3.735e+02, threshold=4.418e+02, percent-clipped=0.0 2024-09-16 13:51:29,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=453506.1666666667, ans=0.125 2024-09-16 13:51:30,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=453506.1666666667, ans=0.125 2024-09-16 13:51:34,885 INFO [train.py:1198] (0/2) Epoch 26, batch 300, loss[loss=0.252, ctc_loss=0.1734, cr_loss=0.3931, over 20007.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1542, cr_loss=0.3771, over 3173241.66 frames. ], batch size: 80, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 13:51:40,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=22.5 2024-09-16 13:51:59,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. 
limit=15.0 2024-09-16 13:52:36,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=453647.8333333333, ans=0.125 2024-09-16 13:52:47,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=453647.8333333333, ans=0.04949747468305833 2024-09-16 13:52:50,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=453647.8333333333, ans=0.125 2024-09-16 13:52:50,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=453647.8333333333, ans=0.025 2024-09-16 13:52:53,453 INFO [train.py:1198] (0/2) Epoch 26, batch 350, loss[loss=0.2145, ctc_loss=0.1432, cr_loss=0.3567, over 21062.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1536, cr_loss=0.376, over 3379516.92 frames. ], batch size: 59, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 13:53:51,509 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.099e+02 2.203e+02 2.392e+02 3.716e+02, threshold=4.405e+02, percent-clipped=0.0 2024-09-16 13:53:59,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=453789.5, ans=0.125 2024-09-16 13:54:05,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=453789.5, ans=0.125 2024-09-16 13:54:07,994 INFO [train.py:1198] (0/2) Epoch 26, batch 400, loss[loss=0.2533, ctc_loss=0.1756, cr_loss=0.389, over 19436.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1543, cr_loss=0.377, over 3539341.22 frames. ], batch size: 90, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 13:54:16,128 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.92 vs. 
limit=15.0 2024-09-16 13:55:02,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=453902.8333333333, ans=0.125 2024-09-16 13:55:04,531 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-16 13:55:26,722 INFO [train.py:1198] (0/2) Epoch 26, batch 450, loss[loss=0.2126, ctc_loss=0.1386, cr_loss=0.37, over 20217.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.155, cr_loss=0.3785, over 3667208.38 frames. ], batch size: 45, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 13:55:45,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=453987.8333333333, ans=0.0 2024-09-16 13:55:47,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=22.5 2024-09-16 13:56:25,188 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.074e+02 2.167e+02 2.316e+02 2.993e+02, threshold=4.334e+02, percent-clipped=0.0 2024-09-16 13:56:42,003 INFO [train.py:1198] (0/2) Epoch 26, batch 500, loss[loss=0.2398, ctc_loss=0.1598, cr_loss=0.4, over 21032.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1552, cr_loss=0.3783, over 3766518.23 frames. 
], batch size: 62, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 13:57:08,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454129.5, ans=0.1 2024-09-16 13:57:11,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=454157.8333333333, ans=0.0 2024-09-16 13:57:22,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454157.8333333333, ans=0.1 2024-09-16 13:57:25,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-09-16 13:57:27,070 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=22.5 2024-09-16 13:57:43,836 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2024-09-16 13:57:58,512 INFO [train.py:1198] (0/2) Epoch 26, batch 550, loss[loss=0.2375, ctc_loss=0.1643, cr_loss=0.3661, over 21041.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.155, cr_loss=0.3777, over 3830320.91 frames. 
], batch size: 62, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 13:58:36,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=454299.5, ans=0.0 2024-09-16 13:59:00,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.132e+02 2.359e+02 2.534e+02 5.840e+02, threshold=4.717e+02, percent-clipped=1.0 2024-09-16 13:59:09,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454356.1666666667, ans=0.1 2024-09-16 13:59:17,475 INFO [train.py:1198] (0/2) Epoch 26, batch 600, loss[loss=0.2117, ctc_loss=0.1427, cr_loss=0.3449, over 20916.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1554, cr_loss=0.3782, over 3891011.99 frames. ], batch size: 50, lr: 3.22e-03, grad_scale: 64.0 2024-09-16 13:59:54,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=454441.1666666667, ans=0.0 2024-09-16 14:00:01,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=454469.5, ans=0.0 2024-09-16 14:00:14,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=454469.5, ans=0.2 2024-09-16 14:00:33,477 INFO [train.py:1198] (0/2) Epoch 26, batch 650, loss[loss=0.2253, ctc_loss=0.151, cr_loss=0.3711, over 20758.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1552, cr_loss=0.3779, over 3940482.60 frames. 
], batch size: 56, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 14:00:42,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=454526.1666666667, ans=0.125 2024-09-16 14:00:44,659 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:00:56,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=454554.5, ans=0.125 2024-09-16 14:01:13,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454582.8333333333, ans=0.1 2024-09-16 14:01:14,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=22.5 2024-09-16 14:01:22,580 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2024-09-16 14:01:36,710 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.079e+02 2.247e+02 2.388e+02 2.917e+02, threshold=4.493e+02, percent-clipped=0.0 2024-09-16 14:01:43,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=454639.5, ans=0.125 2024-09-16 14:01:47,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=454639.5, ans=0.125 2024-09-16 14:01:51,880 INFO [train.py:1198] (0/2) Epoch 26, batch 700, loss[loss=0.2328, ctc_loss=0.1578, cr_loss=0.3753, over 21017.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1555, cr_loss=0.3787, over 3973019.13 frames. 
], batch size: 63, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 14:01:52,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=454667.8333333333, ans=0.025 2024-09-16 14:01:53,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=454667.8333333333, ans=0.125 2024-09-16 14:01:58,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=454667.8333333333, ans=0.1 2024-09-16 14:02:39,321 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.54 vs. limit=15.0 2024-09-16 14:03:07,609 INFO [train.py:1198] (0/2) Epoch 26, batch 750, loss[loss=0.2243, ctc_loss=0.1517, cr_loss=0.3631, over 21002.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1551, cr_loss=0.378, over 3994047.93 frames. ], batch size: 61, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 14:03:09,497 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:03:33,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=454837.8333333333, ans=0.2 2024-09-16 14:03:41,876 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.44 vs. 
limit=22.5 2024-09-16 14:03:42,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=454866.1666666667, ans=0.0 2024-09-16 14:03:44,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=454866.1666666667, ans=0.125 2024-09-16 14:03:54,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=454894.5, ans=0.1 2024-09-16 14:03:54,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=454894.5, ans=0.0 2024-09-16 14:04:07,783 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.154e+02 2.284e+02 2.459e+02 3.150e+02, threshold=4.567e+02, percent-clipped=0.0 2024-09-16 14:04:25,944 INFO [train.py:1198] (0/2) Epoch 26, batch 800, loss[loss=0.2282, ctc_loss=0.1524, cr_loss=0.379, over 21084.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1556, cr_loss=0.3786, over 4012265.53 frames. ], batch size: 59, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 14:04:30,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=22.5 2024-09-16 14:04:41,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=454979.5, ans=0.125 2024-09-16 14:04:44,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=454979.5, ans=0.0 2024-09-16 14:04:47,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=454979.5, ans=0.125 2024-09-16 14:05:03,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.75 vs. 
limit=15.0 2024-09-16 14:05:10,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=455036.1666666667, ans=0.0 2024-09-16 14:05:16,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=455036.1666666667, ans=0.125 2024-09-16 14:05:22,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=455036.1666666667, ans=0.125 2024-09-16 14:05:35,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=455064.5, ans=0.0 2024-09-16 14:05:41,529 INFO [train.py:1198] (0/2) Epoch 26, batch 850, loss[loss=0.2375, ctc_loss=0.1597, cr_loss=0.3892, over 20937.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1544, cr_loss=0.3771, over 4036310.72 frames. ], batch size: 60, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 14:06:15,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=455149.5, ans=0.125 2024-09-16 14:06:29,225 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:06:42,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=455177.8333333333, ans=10.0 2024-09-16 14:06:45,222 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.151e+02 2.266e+02 2.484e+02 4.339e+02, threshold=4.531e+02, percent-clipped=0.0 2024-09-16 14:06:50,285 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:07:00,624 INFO [train.py:1198] (0/2) Epoch 26, batch 900, loss[loss=0.2257, ctc_loss=0.1511, cr_loss=0.3729, over 21066.00 frames. 
], tot_loss[loss=0.2288, ctc_loss=0.1537, cr_loss=0.3756, over 4048573.56 frames. ], batch size: 59, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 14:07:06,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=455234.5, ans=0.125 2024-09-16 14:07:23,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=455262.8333333333, ans=0.025 2024-09-16 14:07:43,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=455291.1666666667, ans=0.0 2024-09-16 14:07:46,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=455319.5, ans=0.0 2024-09-16 14:07:59,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=455347.8333333333, ans=0.125 2024-09-16 14:08:11,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=455347.8333333333, ans=0.2 2024-09-16 14:08:16,231 INFO [train.py:1198] (0/2) Epoch 26, batch 950, loss[loss=0.2102, ctc_loss=0.1387, cr_loss=0.3574, over 20950.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1543, cr_loss=0.3766, over 4062225.27 frames. ], batch size: 50, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 14:08:23,012 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. 
limit=15.0 2024-09-16 14:09:17,114 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.059e+02 2.176e+02 2.334e+02 8.044e+02, threshold=4.353e+02, percent-clipped=2.0 2024-09-16 14:09:26,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=455489.5, ans=0.125 2024-09-16 14:09:28,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=455489.5, ans=0.125 2024-09-16 14:09:32,392 INFO [train.py:1198] (0/2) Epoch 26, batch 1000, loss[loss=0.2318, ctc_loss=0.156, cr_loss=0.3789, over 21066.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1547, cr_loss=0.3766, over 4041536.70 frames. ], batch size: 56, lr: 3.22e-03, grad_scale: 32.0 2024-09-16 14:10:22,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=455602.8333333333, ans=0.125 2024-09-16 14:10:26,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=455602.8333333333, ans=0.0 2024-09-16 14:10:37,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=455631.1666666667, ans=0.0 2024-09-16 14:10:50,385 INFO [train.py:1198] (0/2) Epoch 26, batch 1050, loss[loss=0.2512, ctc_loss=0.1714, cr_loss=0.3995, over 20968.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1549, cr_loss=0.3771, over 4053760.16 frames. 
], batch size: 64, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:10:50,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=455659.5, ans=0.0 2024-09-16 14:11:14,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=455687.8333333333, ans=0.025 2024-09-16 14:11:50,910 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.137e+02 2.261e+02 2.471e+02 3.167e+02, threshold=4.521e+02, percent-clipped=0.0 2024-09-16 14:12:06,256 INFO [train.py:1198] (0/2) Epoch 26, batch 1100, loss[loss=0.2563, ctc_loss=0.1734, cr_loss=0.4145, over 20060.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1554, cr_loss=0.3784, over 4068077.45 frames. ], batch size: 80, lr: 3.21e-03, grad_scale: 16.0 2024-09-16 14:12:25,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.44 vs. 
limit=15.0 2024-09-16 14:12:32,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=455829.5, ans=0.125 2024-09-16 14:12:36,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=455829.5, ans=0.0 2024-09-16 14:12:54,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=455886.1666666667, ans=0.125 2024-09-16 14:12:59,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=455886.1666666667, ans=0.07 2024-09-16 14:13:07,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=455886.1666666667, ans=0.125 2024-09-16 14:13:08,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2024-09-16 14:13:11,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=455914.5, ans=0.125 2024-09-16 14:13:15,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=455914.5, ans=0.0 2024-09-16 14:13:24,588 INFO [train.py:1198] (0/2) Epoch 26, batch 1150, loss[loss=0.2252, ctc_loss=0.1475, cr_loss=0.3883, over 20802.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1555, cr_loss=0.3794, over 4076873.00 frames. 
], batch size: 53, lr: 3.21e-03, grad_scale: 16.0 2024-09-16 14:13:55,122 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:14:03,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=455999.5, ans=0.2 2024-09-16 14:14:08,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=456027.8333333333, ans=0.035 2024-09-16 14:14:21,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=456027.8333333333, ans=0.025 2024-09-16 14:14:26,110 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.109e+02 2.299e+02 2.482e+02 5.988e+02, threshold=4.598e+02, percent-clipped=1.0 2024-09-16 14:14:39,906 INFO [train.py:1198] (0/2) Epoch 26, batch 1200, loss[loss=0.2735, ctc_loss=0.1858, cr_loss=0.4387, over 20857.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.155, cr_loss=0.3781, over 4077495.12 frames. ], batch size: 65, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:14:43,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=456084.5, ans=0.025 2024-09-16 14:15:05,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=456112.8333333333, ans=0.2 2024-09-16 14:15:13,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=456141.1666666667, ans=0.125 2024-09-16 14:15:17,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=456141.1666666667, ans=0.04949747468305833 2024-09-16 14:15:58,316 INFO [train.py:1198] (0/2) Epoch 26, batch 1250, loss[loss=0.2176, ctc_loss=0.1463, cr_loss=0.3568, over 20874.00 frames. 
], tot_loss[loss=0.2294, ctc_loss=0.1541, cr_loss=0.3762, over 4075311.61 frames. ], batch size: 54, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:16:09,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=456226.1666666667, ans=0.04949747468305833 2024-09-16 14:16:16,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=456254.5, ans=0.125 2024-09-16 14:16:29,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=22.5 2024-09-16 14:16:54,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=456311.1666666667, ans=0.0 2024-09-16 14:17:00,344 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.063e+02 2.198e+02 2.380e+02 3.087e+02, threshold=4.396e+02, percent-clipped=0.0 2024-09-16 14:17:13,806 INFO [train.py:1198] (0/2) Epoch 26, batch 1300, loss[loss=0.2386, ctc_loss=0.1567, cr_loss=0.4096, over 21065.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1541, cr_loss=0.3763, over 4076930.61 frames. ], batch size: 56, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:17:44,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=456424.5, ans=0.1 2024-09-16 14:18:32,954 INFO [train.py:1198] (0/2) Epoch 26, batch 1350, loss[loss=0.2057, ctc_loss=0.1357, cr_loss=0.3503, over 20964.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.154, cr_loss=0.3763, over 4081492.14 frames. 
], batch size: 55, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:18:44,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=456509.5, ans=0.07 2024-09-16 14:18:44,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=456509.5, ans=0.0 2024-09-16 14:19:18,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=456594.5, ans=0.0 2024-09-16 14:19:25,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=456594.5, ans=0.0 2024-09-16 14:19:35,317 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.159e+02 2.328e+02 2.603e+02 5.755e+02, threshold=4.657e+02, percent-clipped=2.0 2024-09-16 14:19:46,727 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=12.0 2024-09-16 14:19:46,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.68 vs. limit=15.0 2024-09-16 14:19:48,949 INFO [train.py:1198] (0/2) Epoch 26, batch 1400, loss[loss=0.2022, ctc_loss=0.1326, cr_loss=0.3484, over 21027.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1541, cr_loss=0.3767, over 4081062.59 frames. 
], batch size: 52, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:19:53,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=456651.1666666667, ans=0.125 2024-09-16 14:20:19,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=456707.8333333333, ans=0.2 2024-09-16 14:20:26,118 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.31 vs. limit=15.0 2024-09-16 14:20:27,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=456707.8333333333, ans=0.2 2024-09-16 14:20:32,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=456736.1666666667, ans=0.2 2024-09-16 14:20:34,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=456736.1666666667, ans=0.0 2024-09-16 14:21:04,572 INFO [train.py:1198] (0/2) Epoch 26, batch 1450, loss[loss=0.2392, ctc_loss=0.1608, cr_loss=0.3919, over 21082.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.154, cr_loss=0.3765, over 4081947.53 frames. ], batch size: 59, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:21:14,161 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.71 vs. 
limit=15.0 2024-09-16 14:21:55,198 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:22:09,856 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.102e+02 2.215e+02 2.442e+02 3.788e+02, threshold=4.430e+02, percent-clipped=0.0 2024-09-16 14:22:23,405 INFO [train.py:1198] (0/2) Epoch 26, batch 1500, loss[loss=0.2514, ctc_loss=0.1711, cr_loss=0.4012, over 18588.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1537, cr_loss=0.3769, over 4094549.55 frames. ], batch size: 108, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:22:28,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=456934.5, ans=0.2 2024-09-16 14:22:31,596 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0 2024-09-16 14:22:36,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=15.0 2024-09-16 14:22:49,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=456962.8333333333, ans=0.1 2024-09-16 14:23:13,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=457019.5, ans=0.125 2024-09-16 14:23:41,550 INFO [train.py:1198] (0/2) Epoch 26, batch 1550, loss[loss=0.2134, ctc_loss=0.1386, cr_loss=0.3738, over 20993.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1534, cr_loss=0.3764, over 4090134.10 frames. 
], batch size: 55, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:23:51,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=457076.1666666667, ans=0.0 2024-09-16 14:24:10,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=457132.8333333333, ans=0.125 2024-09-16 14:24:43,626 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.098e+02 2.271e+02 2.395e+02 3.546e+02, threshold=4.541e+02, percent-clipped=0.0 2024-09-16 14:24:57,067 INFO [train.py:1198] (0/2) Epoch 26, batch 1600, loss[loss=0.2257, ctc_loss=0.1507, cr_loss=0.3749, over 21060.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1537, cr_loss=0.3773, over 4095717.02 frames. ], batch size: 56, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:25:10,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=457246.1666666667, ans=0.125 2024-09-16 14:25:14,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=457246.1666666667, ans=0.125 2024-09-16 14:25:18,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=457246.1666666667, ans=0.125 2024-09-16 14:25:47,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=457302.8333333333, ans=0.05 2024-09-16 14:26:12,593 INFO [train.py:1198] (0/2) Epoch 26, batch 1650, loss[loss=0.2497, ctc_loss=0.1724, cr_loss=0.3861, over 20287.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1538, cr_loss=0.3775, over 4096570.34 frames. 
], batch size: 74, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:26:20,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.41 vs. limit=22.5 2024-09-16 14:26:32,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=457387.8333333333, ans=0.125 2024-09-16 14:26:41,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457416.1666666667, ans=0.1 2024-09-16 14:27:06,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457444.5, ans=0.1 2024-09-16 14:27:12,185 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2024-09-16 14:27:17,380 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.106e+02 2.247e+02 2.451e+02 3.783e+02, threshold=4.494e+02, percent-clipped=0.0 2024-09-16 14:27:30,920 INFO [train.py:1198] (0/2) Epoch 26, batch 1700, loss[loss=0.279, ctc_loss=0.195, cr_loss=0.4204, over 18109.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1534, cr_loss=0.3758, over 4084023.12 frames. ], batch size: 108, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:27:46,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=457529.5, ans=0.025 2024-09-16 14:28:39,532 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0 2024-09-16 14:28:46,454 INFO [train.py:1198] (0/2) Epoch 26, batch 1750, loss[loss=0.2484, ctc_loss=0.1664, cr_loss=0.41, over 20844.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1544, cr_loss=0.3775, over 4082864.28 frames. 
], batch size: 65, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:29:01,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=457671.1666666667, ans=0.2 2024-09-16 14:29:18,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=457699.5, ans=0.02 2024-09-16 14:29:36,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=457727.8333333333, ans=0.0 2024-09-16 14:29:38,798 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0 2024-09-16 14:29:51,913 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.071e+02 2.231e+02 2.467e+02 4.552e+02, threshold=4.462e+02, percent-clipped=1.0 2024-09-16 14:30:05,727 INFO [train.py:1198] (0/2) Epoch 26, batch 1800, loss[loss=0.2233, ctc_loss=0.152, cr_loss=0.3562, over 20663.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1547, cr_loss=0.378, over 4088287.93 frames. ], batch size: 66, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:30:19,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=457812.8333333333, ans=0.0 2024-09-16 14:30:51,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=457869.5, ans=0.125 2024-09-16 14:31:18,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=457897.8333333333, ans=0.2 2024-09-16 14:31:21,334 INFO [train.py:1198] (0/2) Epoch 26, batch 1850, loss[loss=0.2007, ctc_loss=0.1333, cr_loss=0.3372, over 20780.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1544, cr_loss=0.3769, over 4085160.08 frames. 
], batch size: 53, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:31:28,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=457926.1666666667, ans=0.05 2024-09-16 14:31:41,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=457954.5, ans=0.0 2024-09-16 14:31:45,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=457954.5, ans=0.1 2024-09-16 14:31:47,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=457954.5, ans=0.125 2024-09-16 14:31:49,344 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=22.5 2024-09-16 14:32:05,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=458011.1666666667, ans=0.1 2024-09-16 14:32:23,316 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.117e+02 2.290e+02 2.418e+02 4.215e+02, threshold=4.579e+02, percent-clipped=0.0 2024-09-16 14:32:37,037 INFO [train.py:1198] (0/2) Epoch 26, batch 1900, loss[loss=0.2627, ctc_loss=0.183, cr_loss=0.3985, over 18225.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1557, cr_loss=0.3785, over 4082951.60 frames. 
], batch size: 108, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:32:42,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=458067.8333333333, ans=0.125 2024-09-16 14:32:53,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=458096.1666666667, ans=0.1 2024-09-16 14:33:00,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=458096.1666666667, ans=0.125 2024-09-16 14:33:14,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.92 vs. limit=15.0 2024-09-16 14:33:21,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=458124.5, ans=0.125 2024-09-16 14:33:55,731 INFO [train.py:1198] (0/2) Epoch 26, batch 1950, loss[loss=0.242, ctc_loss=0.1628, cr_loss=0.396, over 20960.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1556, cr_loss=0.3778, over 4078663.79 frames. ], batch size: 58, lr: 3.21e-03, grad_scale: 32.0 2024-09-16 14:33:57,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=458209.5, ans=0.1 2024-09-16 14:34:03,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=458209.5, ans=0.125 2024-09-16 14:34:21,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=458237.8333333333, ans=0.0 2024-09-16 14:34:44,825 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.88 vs. 
limit=12.0
2024-09-16 14:34:57,594 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.102e+02 2.228e+02 2.440e+02 6.596e+02, threshold=4.457e+02, percent-clipped=1.0
2024-09-16 14:34:58,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=458322.8333333333, ans=0.125
2024-09-16 14:35:14,262 INFO [train.py:1198] (0/2) Epoch 26, batch 2000, loss[loss=0.235, ctc_loss=0.1619, cr_loss=0.3659, over 21006.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.155, cr_loss=0.3765, over 4076695.11 frames. ], batch size: 62, lr: 3.21e-03, grad_scale: 32.0
2024-09-16 14:35:17,944 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=22.5
2024-09-16 14:35:22,683 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0
2024-09-16 14:35:34,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0
2024-09-16 14:35:39,698 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.64 vs. limit=10.0
2024-09-16 14:36:30,878 INFO [train.py:1198] (0/2) Epoch 26, batch 2050, loss[loss=0.2079, ctc_loss=0.1381, cr_loss=0.3489, over 20981.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1547, cr_loss=0.3759, over 4074883.93 frames. ], batch size: 49, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:36:50,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=458521.1666666667, ans=0.125
2024-09-16 14:37:07,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=458549.5, ans=0.07
2024-09-16 14:37:33,044 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.127e+02 2.251e+02 2.406e+02 3.527e+02, threshold=4.503e+02, percent-clipped=0.0
2024-09-16 14:37:46,634 INFO [train.py:1198] (0/2) Epoch 26, batch 2100, loss[loss=0.2162, ctc_loss=0.147, cr_loss=0.3459, over 20993.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1539, cr_loss=0.3749, over 4084784.41 frames. ], batch size: 52, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:38:15,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=458691.1666666667, ans=0.125
2024-09-16 14:38:23,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.57 vs. limit=10.0
2024-09-16 14:38:51,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=458747.8333333333, ans=0.0
2024-09-16 14:38:52,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0
2024-09-16 14:39:05,703 INFO [train.py:1198] (0/2) Epoch 26, batch 2150, loss[loss=0.2334, ctc_loss=0.1594, cr_loss=0.3699, over 21063.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1531, cr_loss=0.3735, over 4086953.92 frames. ], batch size: 62, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:39:24,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=458804.5, ans=0.0
2024-09-16 14:39:29,534 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0
2024-09-16 14:40:07,870 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.062e+02 2.213e+02 2.329e+02 3.338e+02, threshold=4.427e+02, percent-clipped=0.0
2024-09-16 14:40:21,582 INFO [train.py:1198] (0/2) Epoch 26, batch 2200, loss[loss=0.2039, ctc_loss=0.1366, cr_loss=0.3363, over 20880.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1526, cr_loss=0.3731, over 4093361.63 frames. ], batch size: 57, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:40:29,741 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 14:40:31,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=458917.8333333333, ans=0.125
2024-09-16 14:40:34,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=458917.8333333333, ans=0.125
2024-09-16 14:40:57,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.53 vs. limit=15.0
2024-09-16 14:41:04,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0
2024-09-16 14:41:21,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=459002.8333333333, ans=0.2
2024-09-16 14:41:33,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0
2024-09-16 14:41:39,682 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
2024-09-16 14:41:40,434 INFO [train.py:1198] (0/2) Epoch 26, batch 2250, loss[loss=0.2368, ctc_loss=0.1593, cr_loss=0.3879, over 20975.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1543, cr_loss=0.3754, over 4072302.46 frames. ], batch size: 58, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:41:57,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=459087.8333333333, ans=0.125
2024-09-16 14:41:58,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=459087.8333333333, ans=0.0
2024-09-16 14:42:43,954 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.074e+02 2.201e+02 2.373e+02 4.119e+02, threshold=4.401e+02, percent-clipped=0.0
2024-09-16 14:42:56,051 INFO [train.py:1198] (0/2) Epoch 26, batch 2300, loss[loss=0.1865, ctc_loss=0.1238, cr_loss=0.3135, over 20958.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1535, cr_loss=0.3743, over 4082756.14 frames. ], batch size: 51, lr: 3.20e-03, grad_scale: 16.0
2024-09-16 14:42:59,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=459201.1666666667, ans=0.0
2024-09-16 14:43:13,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=459229.5, ans=0.1
2024-09-16 14:43:36,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=459257.8333333333, ans=0.1
2024-09-16 14:44:12,598 INFO [train.py:1198] (0/2) Epoch 26, batch 2350, loss[loss=0.2393, ctc_loss=0.1622, cr_loss=0.3853, over 20638.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1543, cr_loss=0.3762, over 4085618.44 frames. ], batch size: 68, lr: 3.20e-03, grad_scale: 16.0
2024-09-16 14:44:13,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=459342.8333333333, ans=0.125
2024-09-16 14:45:09,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=459427.8333333333, ans=0.0
2024-09-16 14:45:19,890 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.064e+02 2.189e+02 2.370e+02 5.401e+02, threshold=4.377e+02, percent-clipped=1.0
2024-09-16 14:45:32,241 INFO [train.py:1198] (0/2) Epoch 26, batch 2400, loss[loss=0.2038, ctc_loss=0.1318, cr_loss=0.3601, over 20978.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.154, cr_loss=0.3757, over 4091121.09 frames. ], batch size: 50, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:45:44,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=459484.5, ans=0.1
2024-09-16 14:45:45,021 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0
2024-09-16 14:46:00,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=15.0
2024-09-16 14:46:27,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=459569.5, ans=0.1
2024-09-16 14:46:35,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=459597.8333333333, ans=0.125
2024-09-16 14:46:42,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=459597.8333333333, ans=0.125
2024-09-16 14:46:51,617 INFO [train.py:1198] (0/2) Epoch 26, batch 2450, loss[loss=0.184, ctc_loss=0.1186, cr_loss=0.3267, over 20953.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.154, cr_loss=0.3761, over 4087985.93 frames. ], batch size: 48, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:47:16,332 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 14:47:46,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=459711.1666666667, ans=0.0
2024-09-16 14:47:55,427 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.096e+02 2.219e+02 2.449e+02 4.273e+02, threshold=4.439e+02, percent-clipped=0.0
2024-09-16 14:48:07,564 INFO [train.py:1198] (0/2) Epoch 26, batch 2500, loss[loss=0.2276, ctc_loss=0.1509, cr_loss=0.3837, over 20782.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1542, cr_loss=0.3763, over 4088870.93 frames. ], batch size: 56, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:48:34,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=459796.1666666667, ans=0.2
2024-09-16 14:49:08,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0
2024-09-16 14:49:23,159 INFO [train.py:1198] (0/2) Epoch 26, batch 2550, loss[loss=0.2575, ctc_loss=0.1758, cr_loss=0.4086, over 20322.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1543, cr_loss=0.3763, over 4081265.85 frames. ], batch size: 74, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:50:18,915 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.92 vs. limit=10.0
2024-09-16 14:50:29,888 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.150e+02 2.315e+02 2.557e+02 3.804e+02, threshold=4.629e+02, percent-clipped=0.0
2024-09-16 14:50:39,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=460022.8333333333, ans=0.0
2024-09-16 14:50:42,041 INFO [train.py:1198] (0/2) Epoch 26, batch 2600, loss[loss=0.2626, ctc_loss=0.1786, cr_loss=0.4198, over 19564.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1546, cr_loss=0.376, over 4082469.88 frames. ], batch size: 90, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:50:46,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=460051.1666666667, ans=0.125
2024-09-16 14:50:53,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0
2024-09-16 14:50:56,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0
2024-09-16 14:51:10,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=460107.8333333333, ans=0.0
2024-09-16 14:51:10,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=460107.8333333333, ans=0.0
2024-09-16 14:51:12,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=460107.8333333333, ans=0.125
2024-09-16 14:51:45,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=460164.5, ans=0.0
2024-09-16 14:51:57,399 INFO [train.py:1198] (0/2) Epoch 26, batch 2650, loss[loss=0.2059, ctc_loss=0.1388, cr_loss=0.3356, over 21064.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1556, cr_loss=0.3781, over 4077202.84 frames. ], batch size: 53, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:52:22,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=460221.1666666667, ans=0.125
2024-09-16 14:52:28,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=22.5
2024-09-16 14:52:32,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=460249.5, ans=0.0
2024-09-16 14:53:03,972 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.148e+02 2.328e+02 2.504e+02 3.630e+02, threshold=4.656e+02, percent-clipped=0.0
2024-09-16 14:53:08,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=460306.1666666667, ans=0.125
2024-09-16 14:53:16,069 INFO [train.py:1198] (0/2) Epoch 26, batch 2700, loss[loss=0.2152, ctc_loss=0.1452, cr_loss=0.3499, over 21055.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1558, cr_loss=0.3786, over 4079532.93 frames. ], batch size: 53, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:54:23,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=460447.8333333333, ans=0.0
2024-09-16 14:54:31,823 INFO [train.py:1198] (0/2) Epoch 26, batch 2750, loss[loss=0.2098, ctc_loss=0.1374, cr_loss=0.3618, over 20979.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1557, cr_loss=0.3786, over 4066379.28 frames. ], batch size: 52, lr: 3.20e-03, grad_scale: 16.0
2024-09-16 14:55:05,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=12.0
2024-09-16 14:55:25,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0
2024-09-16 14:55:36,833 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.176e+02 2.274e+02 2.489e+02 3.965e+02, threshold=4.548e+02, percent-clipped=0.0
2024-09-16 14:55:50,348 INFO [train.py:1198] (0/2) Epoch 26, batch 2800, loss[loss=0.2029, ctc_loss=0.1346, cr_loss=0.3415, over 20938.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1556, cr_loss=0.3781, over 4070018.74 frames. ], batch size: 50, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:56:25,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=460674.5, ans=0.1
2024-09-16 14:56:41,105 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=22.5
2024-09-16 14:56:51,995 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.54 vs. limit=15.0
2024-09-16 14:56:57,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=460731.1666666667, ans=0.2
2024-09-16 14:57:06,130 INFO [train.py:1198] (0/2) Epoch 26, batch 2850, loss[loss=0.2355, ctc_loss=0.1616, cr_loss=0.3691, over 20312.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.156, cr_loss=0.3789, over 4079460.24 frames. ], batch size: 74, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:57:08,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0
2024-09-16 14:57:11,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=460759.5, ans=0.125
2024-09-16 14:57:34,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=460816.1666666667, ans=0.125
2024-09-16 14:57:52,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=460844.5, ans=0.5
2024-09-16 14:58:13,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.101e+02 2.214e+02 2.418e+02 3.989e+02, threshold=4.429e+02, percent-clipped=0.0
2024-09-16 14:58:20,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=460872.8333333333, ans=0.125
2024-09-16 14:58:21,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=460872.8333333333, ans=0.025
2024-09-16 14:58:24,441 INFO [train.py:1198] (0/2) Epoch 26, batch 2900, loss[loss=0.2277, ctc_loss=0.1548, cr_loss=0.3644, over 21024.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1551, cr_loss=0.3779, over 4087324.47 frames. ], batch size: 63, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 14:58:36,037 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.15 vs. limit=22.5
2024-09-16 14:58:41,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=460929.5, ans=0.1
2024-09-16 14:59:05,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=460957.8333333333, ans=0.125
2024-09-16 14:59:10,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=460986.1666666667, ans=0.025
2024-09-16 14:59:16,645 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0
2024-09-16 14:59:26,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=461014.5, ans=0.125
2024-09-16 14:59:40,283 INFO [train.py:1198] (0/2) Epoch 26, batch 2950, loss[loss=0.213, ctc_loss=0.1403, cr_loss=0.3635, over 21062.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1549, cr_loss=0.3775, over 4086728.01 frames. ], batch size: 56, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 15:00:18,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=461099.5, ans=0.04949747468305833
2024-09-16 15:00:26,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=461127.8333333333, ans=0.5
2024-09-16 15:00:45,476 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.151e+02 2.273e+02 2.442e+02 6.156e+02, threshold=4.546e+02, percent-clipped=1.0
2024-09-16 15:00:49,074 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 15:00:56,234 INFO [train.py:1198] (0/2) Epoch 26, batch 3000, loss[loss=0.2194, ctc_loss=0.1446, cr_loss=0.374, over 21048.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1544, cr_loss=0.377, over 4103968.79 frames. ], batch size: 53, lr: 3.20e-03, grad_scale: 32.0
2024-09-16 15:00:56,235 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 15:01:22,055 INFO [train.py:1230] (0/2) Epoch 26, validation: loss=0.04181, ctc_loss=0.04181, cr_loss=1.185e-14, over 944034.00 frames.
2024-09-16 15:01:22,055 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-16 15:01:45,683 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0
2024-09-16 15:01:56,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=461241.1666666667, ans=0.0
2024-09-16 15:02:07,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=461269.5, ans=0.0
2024-09-16 15:02:18,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=461269.5, ans=0.125
2024-09-16 15:02:37,127 INFO [train.py:1198] (0/2) Epoch 26, batch 3050, loss[loss=0.244, ctc_loss=0.1633, cr_loss=0.4032, over 20103.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1548, cr_loss=0.3777, over 4099113.90 frames. ], batch size: 80, lr: 3.19e-03, grad_scale: 16.0
2024-09-16 15:02:39,368 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.64 vs. limit=22.5
2024-09-16 15:02:57,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=461354.5, ans=0.125
2024-09-16 15:03:02,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=22.5
2024-09-16 15:03:05,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=461354.5, ans=0.0
2024-09-16 15:03:23,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=461411.1666666667, ans=0.125
2024-09-16 15:03:39,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=461439.5, ans=0.0
2024-09-16 15:03:47,161 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.072e+02 2.240e+02 2.442e+02 3.474e+02, threshold=4.481e+02, percent-clipped=0.0
2024-09-16 15:03:56,063 INFO [train.py:1198] (0/2) Epoch 26, batch 3100, loss[loss=0.2601, ctc_loss=0.1767, cr_loss=0.4169, over 20856.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1537, cr_loss=0.3763, over 4109986.32 frames. ], batch size: 65, lr: 3.19e-03, grad_scale: 16.0
2024-09-16 15:04:25,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=12.0
2024-09-16 15:04:32,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=461524.5, ans=0.125
2024-09-16 15:05:04,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=461581.1666666667, ans=0.125
2024-09-16 15:05:11,838 INFO [train.py:1198] (0/2) Epoch 26, batch 3150, loss[loss=0.2515, ctc_loss=0.1703, cr_loss=0.4062, over 20079.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1537, cr_loss=0.3761, over 4100612.46 frames. ], batch size: 80, lr: 3.19e-03, grad_scale: 16.0
2024-09-16 15:05:42,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=461666.1666666667, ans=0.125
2024-09-16 15:06:18,785 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.175e+02 2.280e+02 2.487e+02 8.229e+02, threshold=4.561e+02, percent-clipped=2.0
2024-09-16 15:06:26,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=461751.1666666667, ans=0.125
2024-09-16 15:06:28,065 INFO [train.py:1198] (0/2) Epoch 26, batch 3200, loss[loss=0.2576, ctc_loss=0.1752, cr_loss=0.4121, over 20850.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1543, cr_loss=0.3766, over 4103975.64 frames. ], batch size: 65, lr: 3.19e-03, grad_scale: 32.0
2024-09-16 15:06:33,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0
2024-09-16 15:06:35,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=461751.1666666667, ans=0.125
2024-09-16 15:07:29,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=461836.1666666667, ans=0.125
2024-09-16 15:07:33,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0
2024-09-16 15:07:46,854 INFO [train.py:1198] (0/2) Epoch 26, batch 3250, loss[loss=0.2335, ctc_loss=0.1567, cr_loss=0.3841, over 20960.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.154, cr_loss=0.3765, over 4086286.54 frames. ], batch size: 64, lr: 3.19e-03, grad_scale: 32.0
2024-09-16 15:08:41,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=461977.8333333333, ans=0.125
2024-09-16 15:08:53,458 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.126e+02 2.282e+02 2.441e+02 3.355e+02, threshold=4.565e+02, percent-clipped=0.0
2024-09-16 15:08:55,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=462006.1666666667, ans=0.125
2024-09-16 15:09:02,406 INFO [train.py:1198] (0/2) Epoch 26, batch 3300, loss[loss=0.2299, ctc_loss=0.1531, cr_loss=0.3842, over 21038.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1531, cr_loss=0.3754, over 4102729.40 frames. ], batch size: 62, lr: 3.19e-03, grad_scale: 32.0
2024-09-16 15:09:05,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=462034.5, ans=0.0
2024-09-16 15:09:05,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=462034.5, ans=0.0
2024-09-16 15:09:14,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=462034.5, ans=0.125
2024-09-16 15:09:24,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=462062.8333333333, ans=22.5
2024-09-16 15:10:16,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=462147.8333333333, ans=15.0
2024-09-16 15:10:20,712 INFO [train.py:1198] (0/2) Epoch 26, batch 3350, loss[loss=0.2423, ctc_loss=0.1625, cr_loss=0.399, over 21069.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1528, cr_loss=0.3755, over 4118275.45 frames. ], batch size: 59, lr: 3.19e-03, grad_scale: 16.0
2024-09-16 15:10:33,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=462176.1666666667, ans=0.125
2024-09-16 15:11:25,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=462289.5, ans=0.125
2024-09-16 15:11:28,504 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.761e+02 2.084e+02 2.230e+02 2.427e+02 3.051e+02, threshold=4.459e+02, percent-clipped=0.0
2024-09-16 15:11:35,999 INFO [train.py:1198] (0/2) Epoch 26, batch 3400, loss[loss=0.2368, ctc_loss=0.1591, cr_loss=0.3886, over 20978.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.3761, over 4126863.15 frames. ], batch size: 55, lr: 3.19e-03, grad_scale: 16.0
2024-09-16 15:12:34,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=462402.8333333333, ans=0.2
2024-09-16 15:12:47,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=462431.1666666667, ans=15.0
2024-09-16 15:12:52,462 INFO [train.py:1198] (0/2) Epoch 26, batch 3450, loss[loss=0.2644, ctc_loss=0.1822, cr_loss=0.4113, over 19248.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1525, cr_loss=0.3754, over 4116757.60 frames. ], batch size: 90, lr: 3.19e-03, grad_scale: 16.0
2024-09-16 15:12:52,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=462459.5, ans=0.125
2024-09-16 15:13:29,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=462516.1666666667, ans=0.2
2024-09-16 15:13:34,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=462516.1666666667, ans=0.05
2024-09-16 15:14:03,200 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.154e+02 2.307e+02 2.519e+02 3.827e+02, threshold=4.613e+02, percent-clipped=0.0
2024-09-16 15:14:10,894 INFO [train.py:1198] (0/2) Epoch 26, batch 3500, loss[loss=0.2094, ctc_loss=0.1401, cr_loss=0.3465, over 20800.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.152, cr_loss=0.3739, over 4124060.94 frames. ], batch size: 53, lr: 3.19e-03, grad_scale: 16.0
2024-09-16 15:14:11,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=462601.1666666667, ans=0.125
2024-09-16 15:14:29,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0
2024-09-16 15:14:38,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=462629.5, ans=0.0
2024-09-16 15:14:52,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=462657.8333333333, ans=0.125
2024-09-16 15:14:58,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=462686.1666666667, ans=0.0
2024-09-16 15:15:13,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=462714.5, ans=0.0
2024-09-16 15:15:20,198 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=15.0
2024-09-16 15:15:29,939 INFO [train.py:1198] (0/2) Epoch 26, batch 3550, loss[loss=0.2173, ctc_loss=0.1448, cr_loss=0.3626, over 20758.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1524, cr_loss=0.3743, over 4114602.47 frames. ], batch size: 53, lr: 3.19e-03, grad_scale: 16.0
2024-09-16 15:15:31,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=462742.8333333333, ans=0.0
2024-09-16 15:16:01,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.71 vs. limit=15.0
2024-09-16 15:16:37,869 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.718e+02 2.125e+02 2.236e+02 2.400e+02 6.141e+02, threshold=4.472e+02, percent-clipped=1.0
2024-09-16 15:16:45,234 INFO [train.py:1198] (0/2) Epoch 26, batch 3600, loss[loss=0.247, ctc_loss=0.1651, cr_loss=0.4095, over 20318.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1529, cr_loss=0.3752, over 4116515.98 frames. ], batch size: 74, lr: 3.19e-03, grad_scale: 32.0
2024-09-16 15:16:47,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.41 vs. limit=5.0
2024-09-16 15:16:51,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=462884.5, ans=0.1
2024-09-16 15:17:02,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=22.5
2024-09-16 15:17:03,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=462912.8333333333, ans=0.125
2024-09-16 15:17:09,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=462912.8333333333, ans=0.125
2024-09-16 15:17:14,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=462941.1666666667, ans=0.125
2024-09-16 15:17:20,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=462941.1666666667, ans=10.0
2024-09-16 15:18:01,026 INFO [train.py:1198] (0/2) Epoch 26, batch 3650, loss[loss=0.2058, ctc_loss=0.1364, cr_loss=0.3468, over 20988.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1537, cr_loss=0.377, over 4110333.13 frames. ], batch size: 52, lr: 3.19e-03, grad_scale: 16.0
2024-09-16 15:18:09,545 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.54 vs. limit=15.0
2024-09-16 15:18:23,268 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.31 vs. limit=10.0
2024-09-16 15:18:26,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=463054.5, ans=0.0
2024-09-16 15:19:01,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0
2024-09-16 15:19:12,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=463139.5, ans=0.025
2024-09-16 15:19:13,813 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.145e+02 2.278e+02 2.496e+02 3.091e+02, threshold=4.555e+02, percent-clipped=0.0
2024-09-16 15:19:19,973 INFO [train.py:1198] (0/2) Epoch 26, batch 3700, loss[loss=0.2449, ctc_loss=0.1677, cr_loss=0.3858, over 20662.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1545, cr_loss=0.3775, over 4106643.44 frames. ], batch size: 66, lr: 3.19e-03, grad_scale: 16.0
2024-09-16 15:19:26,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463167.8333333333, ans=0.1
2024-09-16 15:19:35,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=463196.1666666667, ans=0.125
2024-09-16 15:19:38,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=463196.1666666667, ans=0.125
2024-09-16 15:20:21,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=463281.1666666667, ans=0.125
2024-09-16 15:20:23,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=463281.1666666667, ans=0.125
2024-09-16 15:20:29,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=463281.1666666667, ans=0.1
2024-09-16 15:20:34,906 INFO [train.py:1198] (0/2) Epoch 26, batch 3750, loss[loss=0.2069, ctc_loss=0.1365, cr_loss=0.3524, over 21084.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1546, cr_loss=0.3778, over 4100488.45 frames. ], batch size: 53, lr: 3.19e-03, grad_scale: 16.0
2024-09-16 15:20:46,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=22.5
2024-09-16 15:21:02,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=463337.8333333333, ans=0.125
2024-09-16 15:21:13,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=463366.1666666667, ans=0.0
2024-09-16 15:21:19,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=463366.1666666667, ans=0.125
2024-09-16 15:21:28,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=463394.5, ans=0.1
2024-09-16 15:21:47,258 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.062e+02 2.213e+02 2.366e+02 2.977e+02, threshold=4.426e+02, percent-clipped=0.0
2024-09-16 15:21:53,247 INFO [train.py:1198] (0/2) Epoch 26, batch 3800, loss[loss=0.2434, ctc_loss=0.1658, cr_loss=0.3881, over 19410.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1543, cr_loss=0.3768, over 4110152.97 frames. ], batch size: 90, lr: 3.19e-03, grad_scale: 16.0
2024-09-16 15:22:37,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=463536.1666666667, ans=0.0
2024-09-16 15:23:01,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463564.5, ans=0.1
2024-09-16 15:23:08,660 INFO [train.py:1198] (0/2) Epoch 26, batch 3850, loss[loss=0.2409, ctc_loss=0.1622, cr_loss=0.3937, over 21033.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.154, cr_loss=0.3769, over 4111724.64 frames. ], batch size: 62, lr: 3.19e-03, grad_scale: 16.0
2024-09-16 15:23:36,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=463621.1666666667, ans=0.025
2024-09-16 15:23:42,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=463649.5, ans=0.07
2024-09-16 15:23:48,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=463649.5, ans=0.2
2024-09-16 15:24:11,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=463706.1666666667, ans=0.0
2024-09-16 15:24:21,265 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.136e+02 2.274e+02 2.464e+02 7.165e+02, threshold=4.548e+02, percent-clipped=1.0
2024-09-16 15:24:27,160 INFO [train.py:1198] (0/2) Epoch 26, batch 3900, loss[loss=0.2355, ctc_loss=0.1578, cr_loss=0.3884, over 20999.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1536, cr_loss=0.376, over 4114984.12 frames.
], batch size: 63, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:24:47,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=463762.8333333333, ans=10.0 2024-09-16 15:24:58,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=15.0 2024-09-16 15:25:04,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=463791.1666666667, ans=0.1 2024-09-16 15:25:43,473 INFO [train.py:1198] (0/2) Epoch 26, batch 3950, loss[loss=0.2088, ctc_loss=0.1383, cr_loss=0.3522, over 20863.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1531, cr_loss=0.3757, over 4119687.32 frames. ], batch size: 57, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:26:04,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=463904.5, ans=0.125 2024-09-16 15:26:33,744 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:26:37,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.45 vs. 
limit=10.0 2024-09-16 15:26:39,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=463961.1666666667, ans=0.125 2024-09-16 15:26:56,225 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.103e+02 2.186e+02 2.306e+02 3.813e+02, threshold=4.373e+02, percent-clipped=0.0 2024-09-16 15:27:01,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=464017.8333333333, ans=0.125 2024-09-16 15:27:02,316 INFO [train.py:1198] (0/2) Epoch 26, batch 4000, loss[loss=0.2268, ctc_loss=0.153, cr_loss=0.3692, over 21018.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1525, cr_loss=0.3743, over 4119335.72 frames. ], batch size: 62, lr: 3.19e-03, grad_scale: 32.0 2024-09-16 15:27:09,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=464017.8333333333, ans=0.07 2024-09-16 15:27:12,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=464017.8333333333, ans=0.125 2024-09-16 15:27:30,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=464046.1666666667, ans=0.125 2024-09-16 15:27:37,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=464074.5, ans=0.125 2024-09-16 15:27:51,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=464102.8333333333, ans=0.025 2024-09-16 15:27:56,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=464102.8333333333, ans=0.125 2024-09-16 15:28:13,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, 
batch_count=464131.1666666667, ans=0.035 2024-09-16 15:28:13,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=464131.1666666667, ans=0.0 2024-09-16 15:28:18,847 INFO [train.py:1198] (0/2) Epoch 26, batch 4050, loss[loss=0.1962, ctc_loss=0.1282, cr_loss=0.34, over 20299.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.153, cr_loss=0.3755, over 4122957.61 frames. ], batch size: 45, lr: 3.19e-03, grad_scale: 16.0 2024-09-16 15:28:26,743 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:28:39,116 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.94 vs. limit=10.0 2024-09-16 15:28:43,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=464187.8333333333, ans=0.0 2024-09-16 15:28:46,454 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:28:52,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=15.0 2024-09-16 15:28:58,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=464216.1666666667, ans=0.125 2024-09-16 15:29:27,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464272.8333333333, ans=0.1 2024-09-16 15:29:29,880 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.121e+02 2.242e+02 2.394e+02 2.748e+02, threshold=4.485e+02, percent-clipped=0.0 2024-09-16 15:29:34,337 INFO [train.py:1198] (0/2) Epoch 26, batch 4100, loss[loss=0.221, ctc_loss=0.1503, cr_loss=0.3534, over 20650.00 frames. 
], tot_loss[loss=0.229, ctc_loss=0.1537, cr_loss=0.3764, over 4116767.61 frames. ], batch size: 68, lr: 3.18e-03, grad_scale: 16.0 2024-09-16 15:30:36,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=464414.5, ans=0.0 2024-09-16 15:30:53,242 INFO [train.py:1198] (0/2) Epoch 26, batch 4150, loss[loss=0.2351, ctc_loss=0.1582, cr_loss=0.3849, over 20631.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1537, cr_loss=0.3767, over 4112759.72 frames. ], batch size: 66, lr: 3.18e-03, grad_scale: 16.0 2024-09-16 15:30:56,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=464442.8333333333, ans=0.0 2024-09-16 15:31:27,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=464499.5, ans=0.0 2024-09-16 15:31:29,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=464499.5, ans=0.125 2024-09-16 15:31:52,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=464556.1666666667, ans=0.025 2024-09-16 15:31:58,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=464556.1666666667, ans=0.0 2024-09-16 15:31:59,951 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. 
limit=6.0 2024-09-16 15:32:01,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=464556.1666666667, ans=0.1 2024-09-16 15:32:03,748 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.115e+02 2.247e+02 2.423e+02 3.297e+02, threshold=4.495e+02, percent-clipped=0.0 2024-09-16 15:32:08,315 INFO [train.py:1198] (0/2) Epoch 26, batch 4200, loss[loss=0.1679, ctc_loss=0.1092, cr_loss=0.2936, over 19842.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1538, cr_loss=0.3764, over 4104246.04 frames. ], batch size: 44, lr: 3.18e-03, grad_scale: 16.0 2024-09-16 15:32:52,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=464641.1666666667, ans=0.125 2024-09-16 15:32:53,883 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-164000.pt 2024-09-16 15:33:05,919 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.14 vs. limit=10.0 2024-09-16 15:33:25,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=464697.8333333333, ans=0.125 2024-09-16 15:33:27,972 INFO [train.py:1198] (0/2) Epoch 26, batch 4250, loss[loss=0.2553, ctc_loss=0.1694, cr_loss=0.4293, over 20154.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1535, cr_loss=0.3765, over 4106124.99 frames. 
], batch size: 80, lr: 3.18e-03, grad_scale: 16.0 2024-09-16 15:33:37,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=464726.1666666667, ans=0.1 2024-09-16 15:34:18,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464811.1666666667, ans=0.1 2024-09-16 15:34:20,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=464811.1666666667, ans=0.04949747468305833 2024-09-16 15:34:32,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=464839.5, ans=0.2 2024-09-16 15:34:39,010 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.159e+02 2.292e+02 2.417e+02 4.857e+02, threshold=4.584e+02, percent-clipped=1.0 2024-09-16 15:34:42,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464867.8333333333, ans=0.1 2024-09-16 15:34:43,335 INFO [train.py:1198] (0/2) Epoch 26, batch 4300, loss[loss=0.2376, ctc_loss=0.1616, cr_loss=0.38, over 20354.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1537, cr_loss=0.3762, over 4105076.33 frames. ], batch size: 74, lr: 3.18e-03, grad_scale: 16.0 2024-09-16 15:34:48,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=464867.8333333333, ans=0.125 2024-09-16 15:34:58,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464896.1666666667, ans=0.1 2024-09-16 15:35:10,179 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.78 vs. 
limit=10.0 2024-09-16 15:35:11,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.22 vs. limit=15.0 2024-09-16 15:35:12,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=464924.5, ans=0.1 2024-09-16 15:35:23,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464924.5, ans=0.1 2024-09-16 15:35:27,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=464952.8333333333, ans=0.04949747468305833 2024-09-16 15:35:41,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=464952.8333333333, ans=0.125 2024-09-16 15:35:56,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=464981.1666666667, ans=0.125 2024-09-16 15:36:02,495 INFO [train.py:1198] (0/2) Epoch 26, batch 4350, loss[loss=0.2459, ctc_loss=0.1634, cr_loss=0.4125, over 20371.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1541, cr_loss=0.3769, over 4099784.32 frames. ], batch size: 74, lr: 3.18e-03, grad_scale: 16.0 2024-09-16 15:36:26,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.57 vs. 
limit=15.0 2024-09-16 15:36:32,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465066.1666666667, ans=0.1 2024-09-16 15:36:33,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=465066.1666666667, ans=0.125 2024-09-16 15:36:45,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=465066.1666666667, ans=0.125 2024-09-16 15:37:00,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=465094.5, ans=0.125 2024-09-16 15:37:12,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=465122.8333333333, ans=0.125 2024-09-16 15:37:13,149 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2024-09-16 15:37:13,807 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.742e+02 2.115e+02 2.278e+02 2.465e+02 4.083e+02, threshold=4.556e+02, percent-clipped=0.0 2024-09-16 15:37:18,464 INFO [train.py:1198] (0/2) Epoch 26, batch 4400, loss[loss=0.2189, ctc_loss=0.1439, cr_loss=0.3749, over 21003.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1543, cr_loss=0.3766, over 4095660.30 frames. 
], batch size: 63, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:37:24,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=465151.1666666667, ans=0.125 2024-09-16 15:37:26,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=465151.1666666667, ans=0.125 2024-09-16 15:37:43,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=465179.5, ans=0.125 2024-09-16 15:37:59,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=465207.8333333333, ans=0.0 2024-09-16 15:38:15,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=465236.1666666667, ans=0.2 2024-09-16 15:38:37,938 INFO [train.py:1198] (0/2) Epoch 26, batch 4450, loss[loss=0.2373, ctc_loss=0.1556, cr_loss=0.4086, over 21065.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1541, cr_loss=0.377, over 4100640.87 frames. ], batch size: 59, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:39:49,130 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.121e+02 2.254e+02 2.431e+02 3.646e+02, threshold=4.507e+02, percent-clipped=0.0 2024-09-16 15:39:53,812 INFO [train.py:1198] (0/2) Epoch 26, batch 4500, loss[loss=0.2383, ctc_loss=0.1631, cr_loss=0.3765, over 20045.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1535, cr_loss=0.3766, over 4111471.58 frames. ], batch size: 80, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:40:59,348 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.26 vs. 
limit=15.0 2024-09-16 15:41:05,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=465547.8333333333, ans=0.125 2024-09-16 15:41:09,454 INFO [train.py:1198] (0/2) Epoch 26, batch 4550, loss[loss=0.2241, ctc_loss=0.1499, cr_loss=0.3708, over 20793.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1539, cr_loss=0.3775, over 4117590.73 frames. ], batch size: 53, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:41:11,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=12.0 2024-09-16 15:41:19,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=465576.1666666667, ans=0.125 2024-09-16 15:41:28,415 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2024-09-16 15:42:00,528 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=12.0 2024-09-16 15:42:10,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=465661.1666666667, ans=0.05 2024-09-16 15:42:23,829 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.165e+02 2.290e+02 2.435e+02 3.795e+02, threshold=4.580e+02, percent-clipped=0.0 2024-09-16 15:42:28,286 INFO [train.py:1198] (0/2) Epoch 26, batch 4600, loss[loss=0.2252, ctc_loss=0.1543, cr_loss=0.3548, over 21084.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1527, cr_loss=0.375, over 4107824.30 frames. 
], batch size: 59, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:42:43,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=465746.1666666667, ans=0.025 2024-09-16 15:43:23,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=465802.8333333333, ans=0.125 2024-09-16 15:43:25,233 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2024-09-16 15:43:26,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=465802.8333333333, ans=0.0 2024-09-16 15:43:35,601 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:43:44,115 INFO [train.py:1198] (0/2) Epoch 26, batch 4650, loss[loss=0.2576, ctc_loss=0.1759, cr_loss=0.4084, over 20843.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1532, cr_loss=0.3761, over 4110132.81 frames. ], batch size: 65, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:44:04,116 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:44:05,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=465887.8333333333, ans=0.025 2024-09-16 15:44:58,467 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.162e+02 2.291e+02 2.463e+02 5.803e+02, threshold=4.581e+02, percent-clipped=1.0 2024-09-16 15:45:00,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.44 vs. 
limit=15.0 2024-09-16 15:45:03,079 INFO [train.py:1198] (0/2) Epoch 26, batch 4700, loss[loss=0.2621, ctc_loss=0.1837, cr_loss=0.3923, over 20058.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.153, cr_loss=0.3752, over 4113395.76 frames. ], batch size: 80, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:45:03,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0 2024-09-16 15:45:17,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=466029.5, ans=0.0 2024-09-16 15:45:47,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=15.0 2024-09-16 15:46:02,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=466114.5, ans=0.0 2024-09-16 15:46:19,020 INFO [train.py:1198] (0/2) Epoch 26, batch 4750, loss[loss=0.2423, ctc_loss=0.1628, cr_loss=0.3973, over 20834.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1542, cr_loss=0.3769, over 4100376.66 frames. ], batch size: 59, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:46:53,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=15.0 2024-09-16 15:47:09,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=466227.8333333333, ans=0.125 2024-09-16 15:47:33,263 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.136e+02 2.273e+02 2.486e+02 3.452e+02, threshold=4.546e+02, percent-clipped=0.0 2024-09-16 15:47:37,851 INFO [train.py:1198] (0/2) Epoch 26, batch 4800, loss[loss=0.2573, ctc_loss=0.1756, cr_loss=0.4083, over 21065.00 frames. 
], tot_loss[loss=0.2295, ctc_loss=0.1541, cr_loss=0.3768, over 4102061.73 frames. ], batch size: 59, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:47:45,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=466284.5, ans=0.125 2024-09-16 15:48:35,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.46 vs. limit=22.5 2024-09-16 15:48:43,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=466397.8333333333, ans=0.0 2024-09-16 15:48:49,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=466397.8333333333, ans=0.0 2024-09-16 15:48:54,242 INFO [train.py:1198] (0/2) Epoch 26, batch 4850, loss[loss=0.1969, ctc_loss=0.13, cr_loss=0.3347, over 20966.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1543, cr_loss=0.3775, over 4104279.87 frames. ], batch size: 48, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:48:57,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=466426.1666666667, ans=0.0 2024-09-16 15:48:59,715 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=12.0 2024-09-16 15:49:17,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=466454.5, ans=0.0 2024-09-16 15:49:43,675 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.43 vs. 
limit=15.0 2024-09-16 15:49:52,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=466511.1666666667, ans=0.125 2024-09-16 15:50:08,409 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.094e+02 2.202e+02 2.350e+02 3.828e+02, threshold=4.405e+02, percent-clipped=0.0 2024-09-16 15:50:12,884 INFO [train.py:1198] (0/2) Epoch 26, batch 4900, loss[loss=0.229, ctc_loss=0.153, cr_loss=0.3803, over 21083.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1537, cr_loss=0.3768, over 4111120.24 frames. ], batch size: 59, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:51:11,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.72 vs. limit=10.0 2024-09-16 15:51:26,622 INFO [train.py:1198] (0/2) Epoch 26, batch 4950, loss[loss=0.2247, ctc_loss=0.1474, cr_loss=0.3867, over 21028.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1531, cr_loss=0.3754, over 4108615.58 frames. ], batch size: 62, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:51:54,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=466766.1666666667, ans=0.0 2024-09-16 15:52:11,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=466794.5, ans=0.0 2024-09-16 15:52:35,978 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.122e+02 2.242e+02 2.372e+02 3.200e+02, threshold=4.484e+02, percent-clipped=0.0 2024-09-16 15:52:40,618 INFO [train.py:1198] (0/2) Epoch 26, batch 5000, loss[loss=0.2586, ctc_loss=0.1736, cr_loss=0.4248, over 20961.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1537, cr_loss=0.3768, over 4103773.67 frames. 
], batch size: 58, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:52:42,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466851.1666666667, ans=0.1 2024-09-16 15:52:44,108 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:52:56,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=466879.5, ans=0.07 2024-09-16 15:53:08,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466879.5, ans=0.1 2024-09-16 15:53:24,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=466936.1666666667, ans=0.0 2024-09-16 15:53:29,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=466936.1666666667, ans=0.125 2024-09-16 15:53:49,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=466964.5, ans=0.0 2024-09-16 15:53:55,258 INFO [train.py:1198] (0/2) Epoch 26, batch 5050, loss[loss=0.2315, ctc_loss=0.1599, cr_loss=0.358, over 20294.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1544, cr_loss=0.3777, over 4101401.48 frames. 
], batch size: 74, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:54:22,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=467021.1666666667, ans=0.025 2024-09-16 15:54:27,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=467049.5, ans=0.0 2024-09-16 15:54:42,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=467077.8333333333, ans=0.2 2024-09-16 15:55:04,765 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.108e+02 2.244e+02 2.429e+02 3.114e+02, threshold=4.489e+02, percent-clipped=0.0 2024-09-16 15:55:09,346 INFO [train.py:1198] (0/2) Epoch 26, batch 5100, loss[loss=0.2451, ctc_loss=0.1651, cr_loss=0.4002, over 20701.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1547, cr_loss=0.3781, over 4096409.99 frames. ], batch size: 71, lr: 3.18e-03, grad_scale: 32.0 2024-09-16 15:55:18,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=467134.5, ans=0.0 2024-09-16 15:55:26,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.67 vs. limit=15.0 2024-09-16 15:55:32,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=467162.8333333333, ans=0.125 2024-09-16 15:55:50,641 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.49 vs. limit=22.5 2024-09-16 15:55:51,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=467191.1666666667, ans=0.0 2024-09-16 15:56:26,768 INFO [train.py:1198] (0/2) Epoch 26, batch 5150, loss[loss=0.2423, ctc_loss=0.164, cr_loss=0.3916, over 21044.00 frames. 
], tot_loss[loss=0.2294, ctc_loss=0.154, cr_loss=0.3769, over 4098686.31 frames. ], batch size: 56, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 15:56:30,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=467276.1666666667, ans=0.0 2024-09-16 15:57:10,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=467361.1666666667, ans=0.125 2024-09-16 15:57:36,420 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.070e+02 2.195e+02 2.383e+02 3.364e+02, threshold=4.389e+02, percent-clipped=0.0 2024-09-16 15:57:41,016 INFO [train.py:1198] (0/2) Epoch 26, batch 5200, loss[loss=0.2525, ctc_loss=0.1685, cr_loss=0.4201, over 20766.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1549, cr_loss=0.3789, over 4084090.64 frames. ], batch size: 56, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 15:58:25,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=467502.8333333333, ans=0.025 2024-09-16 15:58:55,831 INFO [train.py:1198] (0/2) Epoch 26, batch 5250, loss[loss=0.2318, ctc_loss=0.1539, cr_loss=0.3895, over 20777.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1548, cr_loss=0.3781, over 4066294.56 frames. 
], batch size: 56, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 15:59:21,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=467587.8333333333, ans=0.2 2024-09-16 15:59:27,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=467616.1666666667, ans=0.125 2024-09-16 15:59:33,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=467616.1666666667, ans=0.0 2024-09-16 15:59:39,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=467616.1666666667, ans=0.04949747468305833 2024-09-16 15:59:45,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=467644.5, ans=0.125 2024-09-16 16:00:08,683 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.171e+02 2.295e+02 2.457e+02 5.717e+02, threshold=4.590e+02, percent-clipped=1.0 2024-09-16 16:00:13,237 INFO [train.py:1198] (0/2) Epoch 26, batch 5300, loss[loss=0.2327, ctc_loss=0.1557, cr_loss=0.3849, over 21028.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1543, cr_loss=0.3773, over 4073921.58 frames. ], batch size: 63, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:01:02,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=467786.1666666667, ans=0.125 2024-09-16 16:01:27,960 INFO [train.py:1198] (0/2) Epoch 26, batch 5350, loss[loss=0.2471, ctc_loss=0.1677, cr_loss=0.3968, over 20714.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1544, cr_loss=0.3775, over 4083109.95 frames. 
], batch size: 71, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:01:28,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=467842.8333333333, ans=0.0 2024-09-16 16:02:27,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=467956.1666666667, ans=0.0 2024-09-16 16:02:37,687 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.822e+02 2.128e+02 2.209e+02 2.382e+02 3.490e+02, threshold=4.419e+02, percent-clipped=0.0 2024-09-16 16:02:42,222 INFO [train.py:1198] (0/2) Epoch 26, batch 5400, loss[loss=0.2654, ctc_loss=0.1772, cr_loss=0.441, over 20689.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1537, cr_loss=0.377, over 4089892.30 frames. ], batch size: 71, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:02:47,308 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2024-09-16 16:02:49,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=467984.5, ans=0.125 2024-09-16 16:03:33,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=468069.5, ans=0.025 2024-09-16 16:03:57,089 INFO [train.py:1198] (0/2) Epoch 26, batch 5450, loss[loss=0.2205, ctc_loss=0.1493, cr_loss=0.3559, over 21045.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1538, cr_loss=0.3769, over 4088404.36 frames. ], batch size: 56, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:04:04,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=468126.1666666667, ans=0.2 2024-09-16 16:04:39,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.87 vs. 
limit=15.0 2024-09-16 16:04:46,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-09-16 16:05:00,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=468239.5, ans=0.0 2024-09-16 16:05:08,761 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.091e+02 2.226e+02 2.391e+02 3.014e+02, threshold=4.452e+02, percent-clipped=0.0 2024-09-16 16:05:13,222 INFO [train.py:1198] (0/2) Epoch 26, batch 5500, loss[loss=0.2879, ctc_loss=0.2058, cr_loss=0.4107, over 14544.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1542, cr_loss=0.3777, over 4089581.25 frames. ], batch size: 149, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:05:13,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=468267.8333333333, ans=0.125 2024-09-16 16:05:25,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468267.8333333333, ans=0.1 2024-09-16 16:05:25,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=468267.8333333333, ans=0.0 2024-09-16 16:05:59,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=468352.8333333333, ans=0.125 2024-09-16 16:06:12,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468381.1666666667, ans=0.1 2024-09-16 16:06:27,205 INFO [train.py:1198] (0/2) Epoch 26, batch 5550, loss[loss=0.223, ctc_loss=0.1493, cr_loss=0.3685, over 21042.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1548, cr_loss=0.3784, over 4086634.71 frames. 
], batch size: 62, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:07:22,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=468494.5, ans=0.025 2024-09-16 16:07:25,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=468522.8333333333, ans=0.125 2024-09-16 16:07:36,782 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.102e+02 2.239e+02 2.400e+02 3.458e+02, threshold=4.477e+02, percent-clipped=0.0 2024-09-16 16:07:40,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=468551.1666666667, ans=0.2 2024-09-16 16:07:41,361 INFO [train.py:1198] (0/2) Epoch 26, batch 5600, loss[loss=0.1798, ctc_loss=0.1175, cr_loss=0.3116, over 20969.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1551, cr_loss=0.3789, over 4080318.78 frames. ], batch size: 51, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:07:55,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0 2024-09-16 16:08:05,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=468579.5, ans=0.04949747468305833 2024-09-16 16:08:40,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=468636.1666666667, ans=0.2 2024-09-16 16:08:58,192 INFO [train.py:1198] (0/2) Epoch 26, batch 5650, loss[loss=0.2647, ctc_loss=0.1779, cr_loss=0.4339, over 20999.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1543, cr_loss=0.3779, over 4090978.52 frames. 
], batch size: 64, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:09:16,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468721.1666666667, ans=0.1 2024-09-16 16:09:25,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=468721.1666666667, ans=0.125 2024-09-16 16:10:02,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=468806.1666666667, ans=0.1 2024-09-16 16:10:08,273 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.097e+02 2.214e+02 2.376e+02 4.979e+02, threshold=4.428e+02, percent-clipped=2.0 2024-09-16 16:10:12,673 INFO [train.py:1198] (0/2) Epoch 26, batch 5700, loss[loss=0.2575, ctc_loss=0.1741, cr_loss=0.4167, over 20983.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1547, cr_loss=0.3787, over 4093934.86 frames. ], batch size: 67, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:10:13,306 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-09-16 16:10:14,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=468834.5, ans=0.5 2024-09-16 16:11:01,916 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:11:26,925 INFO [train.py:1198] (0/2) Epoch 26, batch 5750, loss[loss=0.1974, ctc_loss=0.1287, cr_loss=0.3432, over 21039.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1538, cr_loss=0.3765, over 4094935.27 frames. 
], batch size: 53, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:11:56,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=469032.8333333333, ans=0.125 2024-09-16 16:12:02,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=469032.8333333333, ans=0.1 2024-09-16 16:12:03,340 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=12.0 2024-09-16 16:12:05,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=469032.8333333333, ans=0.025 2024-09-16 16:12:14,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=469061.1666666667, ans=0.125 2024-09-16 16:12:16,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=469061.1666666667, ans=0.125 2024-09-16 16:12:17,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=469061.1666666667, ans=0.0 2024-09-16 16:12:32,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=469089.5, ans=0.125 2024-09-16 16:12:36,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.114e+02 2.240e+02 2.384e+02 3.175e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-16 16:12:40,977 INFO [train.py:1198] (0/2) Epoch 26, batch 5800, loss[loss=0.1978, ctc_loss=0.1306, cr_loss=0.3361, over 20969.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1537, cr_loss=0.3764, over 4098710.81 frames. 
], batch size: 51, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:12:44,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=469117.8333333333, ans=0.0 2024-09-16 16:13:06,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469146.1666666667, ans=0.1 2024-09-16 16:13:57,610 INFO [train.py:1198] (0/2) Epoch 26, batch 5850, loss[loss=0.2234, ctc_loss=0.1507, cr_loss=0.3633, over 20973.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1534, cr_loss=0.3758, over 4101113.21 frames. ], batch size: 52, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:14:15,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=469287.8333333333, ans=0.0 2024-09-16 16:14:18,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=469287.8333333333, ans=0.0 2024-09-16 16:14:20,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=469287.8333333333, ans=0.125 2024-09-16 16:14:32,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=469316.1666666667, ans=0.125 2024-09-16 16:15:07,818 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.069e+02 2.237e+02 2.386e+02 3.435e+02, threshold=4.473e+02, percent-clipped=0.0 2024-09-16 16:15:12,152 INFO [train.py:1198] (0/2) Epoch 26, batch 5900, loss[loss=0.1938, ctc_loss=0.1265, cr_loss=0.336, over 20904.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1534, cr_loss=0.3758, over 4108837.26 frames. 
], batch size: 54, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:15:12,578 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:15:15,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=469401.1666666667, ans=0.0 2024-09-16 16:15:20,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=469401.1666666667, ans=0.1 2024-09-16 16:15:27,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=469429.5, ans=0.125 2024-09-16 16:15:35,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=469429.5, ans=0.025 2024-09-16 16:16:26,409 INFO [train.py:1198] (0/2) Epoch 26, batch 5950, loss[loss=0.2328, ctc_loss=0.1545, cr_loss=0.3915, over 20647.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1533, cr_loss=0.3757, over 4111320.55 frames. ], batch size: 68, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:16:32,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=469542.8333333333, ans=0.2 2024-09-16 16:16:37,870 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.61 vs. limit=15.0 2024-09-16 16:16:51,325 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.10 vs. 
limit=12.0 2024-09-16 16:16:54,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=469571.1666666667, ans=0.125 2024-09-16 16:17:10,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=469599.5, ans=0.2 2024-09-16 16:17:34,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=469656.1666666667, ans=0.025 2024-09-16 16:17:34,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=469656.1666666667, ans=0.0 2024-09-16 16:17:37,881 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-09-16 16:17:38,525 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.751e+02 2.121e+02 2.246e+02 2.355e+02 2.859e+02, threshold=4.491e+02, percent-clipped=0.0 2024-09-16 16:17:43,109 INFO [train.py:1198] (0/2) Epoch 26, batch 6000, loss[loss=0.2578, ctc_loss=0.179, cr_loss=0.3942, over 14600.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1533, cr_loss=0.3756, over 4113781.72 frames. ], batch size: 149, lr: 3.17e-03, grad_scale: 32.0 2024-09-16 16:17:43,110 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 16:18:05,664 INFO [train.py:1230] (0/2) Epoch 26, validation: loss=0.0423, ctc_loss=0.0423, cr_loss=1.197e-14, over 944034.00 frames. 2024-09-16 16:18:05,665 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 16:18:17,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=22.5 2024-09-16 16:18:24,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. 
limit=12.0 2024-09-16 16:19:03,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=469769.5, ans=0.0 2024-09-16 16:19:20,373 INFO [train.py:1198] (0/2) Epoch 26, batch 6050, loss[loss=0.2632, ctc_loss=0.1785, cr_loss=0.4235, over 21009.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1537, cr_loss=0.3761, over 4107664.25 frames. ], batch size: 63, lr: 3.17e-03, grad_scale: 64.0 2024-09-16 16:19:26,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=469826.1666666667, ans=0.125 2024-09-16 16:19:38,430 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=22.5 2024-09-16 16:20:31,743 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.107e+02 2.248e+02 2.460e+02 4.227e+02, threshold=4.495e+02, percent-clipped=0.0 2024-09-16 16:20:36,266 INFO [train.py:1198] (0/2) Epoch 26, batch 6100, loss[loss=0.2657, ctc_loss=0.1785, cr_loss=0.4362, over 20977.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1539, cr_loss=0.3765, over 4110452.17 frames. ], batch size: 64, lr: 3.17e-03, grad_scale: 64.0 2024-09-16 16:20:54,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=469996.1666666667, ans=0.07 2024-09-16 16:21:12,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2024-09-16 16:21:35,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0 2024-09-16 16:21:51,273 INFO [train.py:1198] (0/2) Epoch 26, batch 6150, loss[loss=0.1996, ctc_loss=0.1308, cr_loss=0.3439, over 20964.00 frames. 
], tot_loss[loss=0.23, ctc_loss=0.1545, cr_loss=0.3774, over 4091877.71 frames. ], batch size: 50, lr: 3.16e-03, grad_scale: 64.0 2024-09-16 16:21:55,089 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0 2024-09-16 16:22:12,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.49 vs. limit=15.0 2024-09-16 16:22:19,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=470166.1666666667, ans=0.2 2024-09-16 16:22:29,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=470166.1666666667, ans=0.2 2024-09-16 16:23:00,416 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.749e+02 2.124e+02 2.281e+02 2.478e+02 3.468e+02, threshold=4.562e+02, percent-clipped=0.0 2024-09-16 16:23:04,728 INFO [train.py:1198] (0/2) Epoch 26, batch 6200, loss[loss=0.2206, ctc_loss=0.1461, cr_loss=0.3723, over 20975.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.155, cr_loss=0.3785, over 4099307.26 frames. ], batch size: 55, lr: 3.16e-03, grad_scale: 64.0 2024-09-16 16:23:20,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=470279.5, ans=0.1 2024-09-16 16:24:18,210 INFO [train.py:1198] (0/2) Epoch 26, batch 6250, loss[loss=0.2513, ctc_loss=0.1705, cr_loss=0.404, over 20655.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1575, cr_loss=0.3816, over 4059943.06 frames. 
], batch size: 68, lr: 3.16e-03, grad_scale: 32.0 2024-09-16 16:24:31,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=470392.8333333333, ans=0.2 2024-09-16 16:24:34,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=470421.1666666667, ans=0.125 2024-09-16 16:24:38,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=470421.1666666667, ans=0.125 2024-09-16 16:24:40,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=470421.1666666667, ans=0.125 2024-09-16 16:25:01,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=470449.5, ans=0.1 2024-09-16 16:25:29,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=470506.1666666667, ans=0.025 2024-09-16 16:25:30,093 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.173e+02 2.338e+02 2.531e+02 3.280e+02, threshold=4.676e+02, percent-clipped=0.0 2024-09-16 16:25:32,974 INFO [train.py:1198] (0/2) Epoch 26, batch 6300, loss[loss=0.2461, ctc_loss=0.1649, cr_loss=0.4061, over 20673.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1593, cr_loss=0.3831, over 4006269.61 frames. 
], batch size: 71, lr: 3.16e-03, grad_scale: 32.0 2024-09-16 16:25:53,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=470562.8333333333, ans=0.0 2024-09-16 16:26:09,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=470591.1666666667, ans=0.2 2024-09-16 16:26:30,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=470647.8333333333, ans=0.0 2024-09-16 16:26:32,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=470647.8333333333, ans=0.025 2024-09-16 16:26:32,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=470647.8333333333, ans=0.2 2024-09-16 16:26:42,691 INFO [train.py:1198] (0/2) Epoch 26, batch 6350, loss[loss=0.2775, ctc_loss=0.1996, cr_loss=0.3896, over 14525.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1646, cr_loss=0.3868, over 3841489.77 frames. ], batch size: 151, lr: 3.16e-03, grad_scale: 32.0 2024-09-16 16:26:45,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=470676.1666666667, ans=0.125 2024-09-16 16:26:54,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=470676.1666666667, ans=0.125 2024-09-16 16:27:40,245 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-26.pt 2024-09-16 16:28:30,738 INFO [train.py:1198] (0/2) Epoch 27, batch 0, loss[loss=0.2123, ctc_loss=0.14, cr_loss=0.3614, over 21052.00 frames. ], tot_loss[loss=0.2123, ctc_loss=0.14, cr_loss=0.3614, over 21052.00 frames. 
], batch size: 53, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:28:30,739 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 16:28:41,697 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7233, 4.2986, 3.2514, 3.8330], device='cuda:0') 2024-09-16 16:28:49,097 INFO [train.py:1230] (0/2) Epoch 27, validation: loss=0.04179, ctc_loss=0.04179, cr_loss=1.195e-14, over 944034.00 frames. 2024-09-16 16:28:49,098 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 16:29:00,840 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.404e+02 2.581e+02 2.766e+02 3.710e+02, threshold=5.163e+02, percent-clipped=0.0 2024-09-16 16:29:31,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=470846.1666666667, ans=0.125 2024-09-16 16:29:37,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=470874.5, ans=0.0 2024-09-16 16:29:41,203 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-09-16 16:29:54,813 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=22.5 2024-09-16 16:30:04,700 INFO [train.py:1198] (0/2) Epoch 27, batch 50, loss[loss=0.1947, ctc_loss=0.1305, cr_loss=0.321, over 20946.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1532, cr_loss=0.3742, over 922494.27 frames. ], batch size: 49, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:30:23,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.96 vs. 
limit=12.0 2024-09-16 16:30:26,786 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2024-09-16 16:30:30,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=470959.5, ans=0.125 2024-09-16 16:30:53,961 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2024-09-16 16:30:54,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=471016.1666666667, ans=0.05 2024-09-16 16:31:14,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0 2024-09-16 16:31:20,290 INFO [train.py:1198] (0/2) Epoch 27, batch 100, loss[loss=0.2517, ctc_loss=0.1692, cr_loss=0.4126, over 20329.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1558, cr_loss=0.3785, over 1605044.65 frames. ], batch size: 74, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:31:28,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=471072.8333333333, ans=0.125 2024-09-16 16:31:32,691 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.089e+02 2.219e+02 2.330e+02 2.882e+02, threshold=4.437e+02, percent-clipped=0.0 2024-09-16 16:31:58,733 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0 2024-09-16 16:32:13,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.46 vs. 
limit=15.0 2024-09-16 16:32:17,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=471157.8333333333, ans=0.125 2024-09-16 16:32:18,401 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0 2024-09-16 16:32:41,981 INFO [train.py:1198] (0/2) Epoch 27, batch 150, loss[loss=0.2668, ctc_loss=0.1816, cr_loss=0.4258, over 18064.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1554, cr_loss=0.3786, over 2149480.70 frames. ], batch size: 108, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:33:43,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=471327.8333333333, ans=0.95 2024-09-16 16:33:46,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471327.8333333333, ans=0.1 2024-09-16 16:33:51,354 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.09 vs. limit=15.0 2024-09-16 16:33:58,078 INFO [train.py:1198] (0/2) Epoch 27, batch 200, loss[loss=0.2415, ctc_loss=0.1625, cr_loss=0.3951, over 21063.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1542, cr_loss=0.3774, over 2585735.53 frames. 
], batch size: 59, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:34:10,081 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.154e+02 2.299e+02 2.446e+02 3.804e+02, threshold=4.598e+02, percent-clipped=0.0 2024-09-16 16:34:19,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471384.5, ans=0.1 2024-09-16 16:34:23,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=471384.5, ans=0.0 2024-09-16 16:34:51,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.34 vs. limit=15.0 2024-09-16 16:35:13,753 INFO [train.py:1198] (0/2) Epoch 27, batch 250, loss[loss=0.2641, ctc_loss=0.183, cr_loss=0.4053, over 14057.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1528, cr_loss=0.3751, over 2930118.87 frames. ], batch size: 149, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:35:30,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=471526.1666666667, ans=0.2 2024-09-16 16:35:33,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471526.1666666667, ans=0.1 2024-09-16 16:36:06,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=471582.8333333333, ans=0.125 2024-09-16 16:36:18,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=471611.1666666667, ans=0.0 2024-09-16 16:36:29,082 INFO [train.py:1198] (0/2) Epoch 27, batch 300, loss[loss=0.2126, ctc_loss=0.141, cr_loss=0.3578, over 20993.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1522, cr_loss=0.3746, over 3205078.93 frames. 
], batch size: 55, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:36:34,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=471639.5, ans=0.125 2024-09-16 16:36:41,319 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.087e+02 2.225e+02 2.331e+02 3.250e+02, threshold=4.449e+02, percent-clipped=0.0 2024-09-16 16:37:13,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471724.5, ans=0.1 2024-09-16 16:37:40,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=471752.8333333333, ans=0.02 2024-09-16 16:37:47,523 INFO [train.py:1198] (0/2) Epoch 27, batch 350, loss[loss=0.2395, ctc_loss=0.1614, cr_loss=0.3904, over 21062.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1532, cr_loss=0.3756, over 3395165.46 frames. ], batch size: 62, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:38:01,675 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=22.5 2024-09-16 16:38:27,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=471837.8333333333, ans=0.0 2024-09-16 16:38:27,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=471837.8333333333, ans=0.2 2024-09-16 16:38:37,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=471866.1666666667, ans=0.125 2024-09-16 16:38:59,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=22.5 2024-09-16 16:39:05,954 INFO [train.py:1198] (0/2) Epoch 27, batch 400, loss[loss=0.2241, ctc_loss=0.1519, cr_loss=0.3608, over 20966.00 frames. 
], tot_loss[loss=0.2283, ctc_loss=0.1531, cr_loss=0.376, over 3539963.77 frames. ], batch size: 58, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:39:13,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=471922.8333333333, ans=0.125 2024-09-16 16:39:17,626 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.163e+02 2.278e+02 2.427e+02 3.875e+02, threshold=4.556e+02, percent-clipped=0.0 2024-09-16 16:39:19,915 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=22.5 2024-09-16 16:39:50,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=472007.8333333333, ans=0.125 2024-09-16 16:39:53,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=472007.8333333333, ans=0.125 2024-09-16 16:40:20,487 INFO [train.py:1198] (0/2) Epoch 27, batch 450, loss[loss=0.1911, ctc_loss=0.1264, cr_loss=0.3236, over 20973.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1527, cr_loss=0.375, over 3658612.68 frames. ], batch size: 51, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:40:22,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=472064.5, ans=0.2 2024-09-16 16:40:33,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=472064.5, ans=0.0 2024-09-16 16:41:02,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.80 vs. 
limit=15.0 2024-09-16 16:41:24,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=472177.8333333333, ans=0.125 2024-09-16 16:41:36,076 INFO [train.py:1198] (0/2) Epoch 27, batch 500, loss[loss=0.2536, ctc_loss=0.1719, cr_loss=0.4087, over 20688.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1531, cr_loss=0.3762, over 3761530.12 frames. ], batch size: 68, lr: 3.10e-03, grad_scale: 32.0 2024-09-16 16:41:47,942 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.127e+02 2.285e+02 2.456e+02 4.358e+02, threshold=4.570e+02, percent-clipped=0.0 2024-09-16 16:41:49,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=472234.5, ans=0.125 2024-09-16 16:42:17,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=472262.8333333333, ans=0.0 2024-09-16 16:42:18,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=472262.8333333333, ans=0.025 2024-09-16 16:42:23,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=472291.1666666667, ans=0.1 2024-09-16 16:42:30,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=472291.1666666667, ans=0.125 2024-09-16 16:42:35,845 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.19 vs. 
limit=22.5 2024-09-16 16:42:36,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=472319.5, ans=0.125 2024-09-16 16:42:44,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=472319.5, ans=0.125 2024-09-16 16:42:51,555 INFO [train.py:1198] (0/2) Epoch 27, batch 550, loss[loss=0.2715, ctc_loss=0.1794, cr_loss=0.4608, over 20963.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1526, cr_loss=0.375, over 3844105.47 frames. ], batch size: 64, lr: 3.10e-03, grad_scale: 16.0 2024-09-16 16:43:08,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=472376.1666666667, ans=0.125 2024-09-16 16:43:21,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=472376.1666666667, ans=0.125 2024-09-16 16:43:45,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2024-09-16 16:44:03,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=472461.1666666667, ans=0.0 2024-09-16 16:44:13,511 INFO [train.py:1198] (0/2) Epoch 27, batch 600, loss[loss=0.2275, ctc_loss=0.1535, cr_loss=0.3702, over 20631.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1528, cr_loss=0.3763, over 3905171.31 frames. 
], batch size: 68, lr: 3.10e-03, grad_scale: 16.0 2024-09-16 16:44:27,297 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.149e+02 2.267e+02 2.509e+02 3.416e+02, threshold=4.534e+02, percent-clipped=0.0 2024-09-16 16:44:56,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=472546.1666666667, ans=0.1 2024-09-16 16:44:58,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=472574.5, ans=0.1 2024-09-16 16:45:09,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0 2024-09-16 16:45:29,874 INFO [train.py:1198] (0/2) Epoch 27, batch 650, loss[loss=0.2018, ctc_loss=0.1335, cr_loss=0.3415, over 20996.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1523, cr_loss=0.3751, over 3947921.08 frames. ], batch size: 52, lr: 3.10e-03, grad_scale: 16.0 2024-09-16 16:45:33,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=472631.1666666667, ans=0.125 2024-09-16 16:45:44,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=472659.5, ans=0.0 2024-09-16 16:45:47,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=472659.5, ans=0.125 2024-09-16 16:46:05,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=472687.8333333333, ans=0.025 2024-09-16 16:46:46,017 INFO [train.py:1198] (0/2) Epoch 27, batch 700, loss[loss=0.2325, ctc_loss=0.1576, cr_loss=0.3745, over 20659.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1526, cr_loss=0.3751, over 3976825.03 frames. 
], batch size: 66, lr: 3.10e-03, grad_scale: 16.0 2024-09-16 16:46:49,904 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0 2024-09-16 16:46:59,552 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.094e+02 2.211e+02 2.360e+02 4.010e+02, threshold=4.422e+02, percent-clipped=0.0 2024-09-16 16:47:29,644 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-09-16 16:47:55,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=472886.1666666667, ans=0.025 2024-09-16 16:47:56,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=472886.1666666667, ans=0.2 2024-09-16 16:47:59,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=472886.1666666667, ans=0.025 2024-09-16 16:48:02,354 INFO [train.py:1198] (0/2) Epoch 27, batch 750, loss[loss=0.2647, ctc_loss=0.1849, cr_loss=0.3989, over 14655.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1531, cr_loss=0.3765, over 3995948.41 frames. ], batch size: 150, lr: 3.10e-03, grad_scale: 16.0 2024-09-16 16:48:10,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=22.5 2024-09-16 16:48:11,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.21 vs. limit=12.0 2024-09-16 16:49:21,019 INFO [train.py:1198] (0/2) Epoch 27, batch 800, loss[loss=0.2003, ctc_loss=0.1324, cr_loss=0.3395, over 20974.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1527, cr_loss=0.3762, over 4031026.68 frames. 
], batch size: 55, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 16:49:34,708 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.116e+02 2.263e+02 2.433e+02 6.148e+02, threshold=4.526e+02, percent-clipped=1.0 2024-09-16 16:50:29,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=473169.5, ans=0.125 2024-09-16 16:50:40,199 INFO [train.py:1198] (0/2) Epoch 27, batch 850, loss[loss=0.2345, ctc_loss=0.1572, cr_loss=0.3866, over 21030.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1528, cr_loss=0.3767, over 4053647.69 frames. ], batch size: 63, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 16:50:57,714 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:50:58,127 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=22.5 2024-09-16 16:51:00,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=473226.1666666667, ans=0.125 2024-09-16 16:51:36,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=473282.8333333333, ans=22.5 2024-09-16 16:51:56,261 INFO [train.py:1198] (0/2) Epoch 27, batch 900, loss[loss=0.2444, ctc_loss=0.165, cr_loss=0.3969, over 21055.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1525, cr_loss=0.3754, over 4066358.23 frames. ], batch size: 62, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 16:52:03,071 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.55 vs. 
limit=22.5 2024-09-16 16:52:09,865 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.068e+02 2.184e+02 2.300e+02 3.115e+02, threshold=4.368e+02, percent-clipped=0.0 2024-09-16 16:52:49,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=473424.5, ans=0.125 2024-09-16 16:52:52,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=473424.5, ans=0.0 2024-09-16 16:52:57,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473452.8333333333, ans=0.1 2024-09-16 16:53:03,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473452.8333333333, ans=0.1 2024-09-16 16:53:12,026 INFO [train.py:1198] (0/2) Epoch 27, batch 950, loss[loss=0.2101, ctc_loss=0.1378, cr_loss=0.3612, over 20867.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.153, cr_loss=0.3764, over 4068365.02 frames. ], batch size: 54, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 16:53:27,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=473509.5, ans=0.125 2024-09-16 16:54:03,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=473566.1666666667, ans=0.04949747468305833 2024-09-16 16:54:15,895 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0 2024-09-16 16:54:19,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.88 vs. 
limit=10.0 2024-09-16 16:54:27,005 INFO [train.py:1198] (0/2) Epoch 27, batch 1000, loss[loss=0.2558, ctc_loss=0.1791, cr_loss=0.3839, over 19442.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1537, cr_loss=0.3774, over 4061489.25 frames. ], batch size: 90, lr: 3.09e-03, grad_scale: 16.0 2024-09-16 16:54:42,182 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.157e+02 2.304e+02 2.519e+02 3.560e+02, threshold=4.609e+02, percent-clipped=0.0 2024-09-16 16:54:59,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=473679.5, ans=0.0 2024-09-16 16:55:16,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2024-09-16 16:55:28,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=473707.8333333333, ans=0.125 2024-09-16 16:55:35,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=473736.1666666667, ans=0.0 2024-09-16 16:55:48,923 INFO [train.py:1198] (0/2) Epoch 27, batch 1050, loss[loss=0.2281, ctc_loss=0.1518, cr_loss=0.3819, over 20987.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1541, cr_loss=0.3786, over 4071564.83 frames. 
], batch size: 64, lr: 3.09e-03, grad_scale: 16.0 2024-09-16 16:55:50,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=473764.5, ans=0.035 2024-09-16 16:56:06,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=473792.8333333333, ans=0.125 2024-09-16 16:56:57,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=473877.8333333333, ans=0.07 2024-09-16 16:56:59,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=473877.8333333333, ans=0.125 2024-09-16 16:57:00,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=473877.8333333333, ans=0.035 2024-09-16 16:57:03,651 INFO [train.py:1198] (0/2) Epoch 27, batch 1100, loss[loss=0.232, ctc_loss=0.1518, cr_loss=0.401, over 20960.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1548, cr_loss=0.3801, over 4082889.26 frames. 
], batch size: 58, lr: 3.09e-03, grad_scale: 16.0 2024-09-16 16:57:18,430 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.177e+02 2.310e+02 2.412e+02 3.391e+02, threshold=4.620e+02, percent-clipped=0.0 2024-09-16 16:57:20,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=473934.5, ans=0.125 2024-09-16 16:57:26,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=473934.5, ans=0.2 2024-09-16 16:57:59,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=473991.1666666667, ans=0.125 2024-09-16 16:58:18,495 INFO [train.py:1198] (0/2) Epoch 27, batch 1150, loss[loss=0.1937, ctc_loss=0.1263, cr_loss=0.3371, over 20950.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1544, cr_loss=0.3795, over 4099939.06 frames. ], batch size: 50, lr: 3.09e-03, grad_scale: 16.0 2024-09-16 16:58:57,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=474104.5, ans=10.0 2024-09-16 16:59:02,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=22.5 2024-09-16 16:59:07,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.72 vs. 
limit=15.0 2024-09-16 16:59:11,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=474132.8333333333, ans=0.2 2024-09-16 16:59:25,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=474161.1666666667, ans=0.0 2024-09-16 16:59:33,822 INFO [train.py:1198] (0/2) Epoch 27, batch 1200, loss[loss=0.2363, ctc_loss=0.1595, cr_loss=0.3837, over 20333.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1548, cr_loss=0.38, over 4108965.24 frames. ], batch size: 74, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 16:59:49,085 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.106e+02 2.226e+02 2.407e+02 2.911e+02, threshold=4.451e+02, percent-clipped=0.0 2024-09-16 16:59:55,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=474217.8333333333, ans=0.025 2024-09-16 17:00:03,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=474246.1666666667, ans=0.2 2024-09-16 17:00:06,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=474246.1666666667, ans=0.2 2024-09-16 17:00:30,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=474274.5, ans=0.0 2024-09-16 17:00:33,726 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=12.0 2024-09-16 17:00:37,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=474302.8333333333, ans=0.125 2024-09-16 17:00:52,593 INFO [train.py:1198] (0/2) Epoch 27, batch 1250, loss[loss=0.2199, ctc_loss=0.1488, cr_loss=0.3552, over 21078.00 frames. 
], tot_loss[loss=0.2307, ctc_loss=0.1547, cr_loss=0.38, over 4116064.71 frames. ], batch size: 53, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 17:01:17,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=474359.5, ans=0.0 2024-09-16 17:01:21,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=474387.8333333333, ans=0.125 2024-09-16 17:01:23,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=474387.8333333333, ans=0.0 2024-09-16 17:01:40,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=474416.1666666667, ans=0.2 2024-09-16 17:02:10,564 INFO [train.py:1198] (0/2) Epoch 27, batch 1300, loss[loss=0.2405, ctc_loss=0.1611, cr_loss=0.3971, over 21024.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1549, cr_loss=0.3801, over 4113871.17 frames. ], batch size: 63, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 17:02:21,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=474472.8333333333, ans=0.2 2024-09-16 17:02:25,750 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.106e+02 2.284e+02 2.442e+02 3.899e+02, threshold=4.567e+02, percent-clipped=0.0 2024-09-16 17:03:05,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. limit=10.0 2024-09-16 17:03:26,326 INFO [train.py:1198] (0/2) Epoch 27, batch 1350, loss[loss=0.2603, ctc_loss=0.1782, cr_loss=0.4106, over 20137.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1554, cr_loss=0.3802, over 4111070.49 frames. 
], batch size: 80, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 17:03:37,727 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.37 vs. limit=15.0 2024-09-16 17:03:40,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=474642.8333333333, ans=0.0 2024-09-16 17:03:43,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=474642.8333333333, ans=0.2 2024-09-16 17:03:49,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=474642.8333333333, ans=0.025 2024-09-16 17:03:49,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2024-09-16 17:04:11,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=474699.5, ans=0.125 2024-09-16 17:04:15,110 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=22.5 2024-09-16 17:04:26,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=474727.8333333333, ans=0.125 2024-09-16 17:04:41,906 INFO [train.py:1198] (0/2) Epoch 27, batch 1400, loss[loss=0.2085, ctc_loss=0.1365, cr_loss=0.3596, over 21060.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1549, cr_loss=0.3794, over 4115389.86 frames. 
], batch size: 56, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 17:04:56,955 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.119e+02 2.233e+02 2.429e+02 4.066e+02, threshold=4.466e+02, percent-clipped=0.0 2024-09-16 17:05:53,206 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:05:57,157 INFO [train.py:1198] (0/2) Epoch 27, batch 1450, loss[loss=0.2499, ctc_loss=0.165, cr_loss=0.4245, over 20843.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.155, cr_loss=0.379, over 4107030.12 frames. ], batch size: 65, lr: 3.09e-03, grad_scale: 16.0 2024-09-16 17:06:05,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=15.0 2024-09-16 17:06:20,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=474926.1666666667, ans=0.0 2024-09-16 17:06:23,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=474926.1666666667, ans=0.0 2024-09-16 17:06:34,171 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0 2024-09-16 17:06:44,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=474982.8333333333, ans=0.125 2024-09-16 17:07:05,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=475011.1666666667, ans=0.0 2024-09-16 17:07:16,706 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. 
limit=15.0 2024-09-16 17:07:18,897 INFO [train.py:1198] (0/2) Epoch 27, batch 1500, loss[loss=0.261, ctc_loss=0.1767, cr_loss=0.4215, over 20685.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.156, cr_loss=0.3801, over 4092098.27 frames. ], batch size: 68, lr: 3.09e-03, grad_scale: 16.0 2024-09-16 17:07:30,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2024-09-16 17:07:35,605 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.125e+02 2.242e+02 2.361e+02 4.299e+02, threshold=4.483e+02, percent-clipped=0.0 2024-09-16 17:08:34,405 INFO [train.py:1198] (0/2) Epoch 27, batch 1550, loss[loss=0.2017, ctc_loss=0.1349, cr_loss=0.3341, over 21077.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.155, cr_loss=0.3785, over 4102627.55 frames. ], batch size: 53, lr: 3.09e-03, grad_scale: 16.0 2024-09-16 17:08:42,712 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.84 vs. limit=10.0 2024-09-16 17:09:10,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=475237.8333333333, ans=0.04949747468305833 2024-09-16 17:09:16,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=475237.8333333333, ans=0.125 2024-09-16 17:09:26,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=475266.1666666667, ans=0.0 2024-09-16 17:09:41,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=475294.5, ans=0.125 2024-09-16 17:09:49,886 INFO [train.py:1198] (0/2) Epoch 27, batch 1600, loss[loss=0.24, ctc_loss=0.1611, cr_loss=0.3945, over 21052.00 frames. 
], tot_loss[loss=0.2299, ctc_loss=0.1544, cr_loss=0.3774, over 4098448.73 frames. ], batch size: 56, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 17:10:06,382 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.111e+02 2.242e+02 2.482e+02 4.649e+02, threshold=4.483e+02, percent-clipped=1.0 2024-09-16 17:10:11,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=475351.1666666667, ans=0.125 2024-09-16 17:10:14,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=475351.1666666667, ans=0.0 2024-09-16 17:10:14,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=475351.1666666667, ans=0.025 2024-09-16 17:10:29,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=475379.5, ans=0.0 2024-09-16 17:10:32,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2024-09-16 17:11:05,819 INFO [train.py:1198] (0/2) Epoch 27, batch 1650, loss[loss=0.2565, ctc_loss=0.174, cr_loss=0.413, over 20966.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1541, cr_loss=0.3768, over 4106831.11 frames. 
], batch size: 64, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 17:11:15,338 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:11:19,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=475492.8333333333, ans=0.0 2024-09-16 17:11:49,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=475549.5, ans=0.0 2024-09-16 17:12:09,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=475577.8333333333, ans=0.0 2024-09-16 17:12:23,743 INFO [train.py:1198] (0/2) Epoch 27, batch 1700, loss[loss=0.2141, ctc_loss=0.1403, cr_loss=0.3692, over 21039.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1536, cr_loss=0.3757, over 4112334.74 frames. ], batch size: 56, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 17:12:25,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.11 vs. 
limit=15.0 2024-09-16 17:12:42,906 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.149e+02 2.268e+02 2.466e+02 3.529e+02, threshold=4.537e+02, percent-clipped=0.0 2024-09-16 17:12:43,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475634.5, ans=0.1 2024-09-16 17:12:52,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=475634.5, ans=0.07 2024-09-16 17:13:07,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=475662.8333333333, ans=0.1 2024-09-16 17:13:30,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=22.5 2024-09-16 17:13:31,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=475719.5, ans=0.07 2024-09-16 17:13:41,854 INFO [train.py:1198] (0/2) Epoch 27, batch 1750, loss[loss=0.2188, ctc_loss=0.1472, cr_loss=0.358, over 20876.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1532, cr_loss=0.3754, over 4119604.56 frames. 
], batch size: 54, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 17:13:59,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=475776.1666666667, ans=0.0 2024-09-16 17:14:03,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475776.1666666667, ans=0.1 2024-09-16 17:14:24,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=475804.5, ans=0.0 2024-09-16 17:14:44,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=15.0 2024-09-16 17:14:57,511 INFO [train.py:1198] (0/2) Epoch 27, batch 1800, loss[loss=0.2891, ctc_loss=0.2044, cr_loss=0.4234, over 14334.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1541, cr_loss=0.3767, over 4103356.17 frames. ], batch size: 149, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 17:15:02,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=475889.5, ans=0.125 2024-09-16 17:15:03,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=475889.5, ans=0.0 2024-09-16 17:15:06,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=475889.5, ans=0.125 2024-09-16 17:15:08,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=475889.5, ans=0.125 2024-09-16 17:15:14,085 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.149e+02 2.275e+02 2.481e+02 4.210e+02, threshold=4.550e+02, percent-clipped=0.0 2024-09-16 17:15:15,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, 
batch_count=475917.8333333333, ans=0.125 2024-09-16 17:15:16,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0 2024-09-16 17:15:17,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=475917.8333333333, ans=0.125 2024-09-16 17:15:37,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=475946.1666666667, ans=0.125 2024-09-16 17:15:55,217 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-168000.pt 2024-09-16 17:16:13,850 INFO [train.py:1198] (0/2) Epoch 27, batch 1850, loss[loss=0.2432, ctc_loss=0.1643, cr_loss=0.3945, over 20649.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1536, cr_loss=0.3757, over 4094919.73 frames. ], batch size: 68, lr: 3.09e-03, grad_scale: 32.0 2024-09-16 17:16:24,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=476031.1666666667, ans=0.2 2024-09-16 17:16:33,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=476059.5, ans=0.0 2024-09-16 17:16:35,813 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.31 vs. 
limit=22.5 2024-09-16 17:16:47,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=476087.8333333333, ans=0.125 2024-09-16 17:16:52,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=476087.8333333333, ans=0.2 2024-09-16 17:17:04,520 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:17:30,053 INFO [train.py:1198] (0/2) Epoch 27, batch 1900, loss[loss=0.2879, ctc_loss=0.2056, cr_loss=0.4115, over 14847.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1531, cr_loss=0.3746, over 4082424.79 frames. ], batch size: 149, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:17:50,864 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.122e+02 2.285e+02 2.570e+02 3.510e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-16 17:18:11,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.50 vs. limit=15.0 2024-09-16 17:18:27,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=476257.8333333333, ans=0.125 2024-09-16 17:18:29,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=22.5 2024-09-16 17:18:35,190 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.67 vs. 
limit=22.5 2024-09-16 17:18:48,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=476286.1666666667, ans=0.04949747468305833 2024-09-16 17:18:50,778 INFO [train.py:1198] (0/2) Epoch 27, batch 1950, loss[loss=0.1868, ctc_loss=0.1199, cr_loss=0.3348, over 19898.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1546, cr_loss=0.3779, over 4069526.67 frames. ], batch size: 44, lr: 3.08e-03, grad_scale: 16.0 2024-09-16 17:19:10,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=476342.8333333333, ans=0.0 2024-09-16 17:19:25,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=476371.1666666667, ans=0.2 2024-09-16 17:20:06,237 INFO [train.py:1198] (0/2) Epoch 27, batch 2000, loss[loss=0.199, ctc_loss=0.1315, cr_loss=0.3373, over 19842.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.155, cr_loss=0.3787, over 4083000.13 frames. ], batch size: 44, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:20:24,298 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.168e+02 2.267e+02 2.349e+02 3.904e+02, threshold=4.535e+02, percent-clipped=0.0 2024-09-16 17:20:38,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=476512.8333333333, ans=0.0 2024-09-16 17:20:50,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=476541.1666666667, ans=0.1 2024-09-16 17:20:58,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=476541.1666666667, ans=0.125 2024-09-16 17:20:58,442 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.95 vs. 
limit=15.0 2024-09-16 17:21:21,806 INFO [train.py:1198] (0/2) Epoch 27, batch 2050, loss[loss=0.2463, ctc_loss=0.168, cr_loss=0.3917, over 21047.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1548, cr_loss=0.3786, over 4098400.22 frames. ], batch size: 62, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:21:23,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476597.8333333333, ans=0.1 2024-09-16 17:21:35,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=476626.1666666667, ans=0.125 2024-09-16 17:22:04,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476654.5, ans=0.1 2024-09-16 17:22:16,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=476682.8333333333, ans=0.0 2024-09-16 17:22:19,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=476682.8333333333, ans=0.0 2024-09-16 17:22:37,525 INFO [train.py:1198] (0/2) Epoch 27, batch 2100, loss[loss=0.2451, ctc_loss=0.1669, cr_loss=0.3909, over 20641.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1546, cr_loss=0.3777, over 4088411.39 frames. 
], batch size: 68, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:22:46,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=476739.5, ans=10.0 2024-09-16 17:22:51,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=476767.8333333333, ans=0.125 2024-09-16 17:22:55,340 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.174e+02 2.342e+02 2.611e+02 5.265e+02, threshold=4.684e+02, percent-clipped=1.0 2024-09-16 17:23:11,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=476796.1666666667, ans=0.0 2024-09-16 17:23:35,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=476824.5, ans=0.125 2024-09-16 17:23:45,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=476852.8333333333, ans=0.125 2024-09-16 17:23:56,174 INFO [train.py:1198] (0/2) Epoch 27, batch 2150, loss[loss=0.2471, ctc_loss=0.1676, cr_loss=0.3977, over 20940.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1551, cr_loss=0.3781, over 4083747.44 frames. 
], batch size: 60, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:24:23,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=476909.5, ans=0.2 2024-09-16 17:24:43,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=476966.1666666667, ans=0.125 2024-09-16 17:24:44,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=476966.1666666667, ans=0.1 2024-09-16 17:24:59,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=476994.5, ans=0.125 2024-09-16 17:25:14,293 INFO [train.py:1198] (0/2) Epoch 27, batch 2200, loss[loss=0.2027, ctc_loss=0.1335, cr_loss=0.3456, over 21002.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1552, cr_loss=0.3776, over 4077368.77 frames. ], batch size: 48, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:25:28,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=477051.1666666667, ans=0.125 2024-09-16 17:25:32,282 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.158e+02 2.310e+02 2.590e+02 4.056e+02, threshold=4.620e+02, percent-clipped=0.0 2024-09-16 17:25:44,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=477079.5, ans=0.125 2024-09-16 17:26:01,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=477107.8333333333, ans=0.0 2024-09-16 17:26:30,179 INFO [train.py:1198] (0/2) Epoch 27, batch 2250, loss[loss=0.1727, ctc_loss=0.1119, cr_loss=0.3042, over 20924.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.154, cr_loss=0.3756, over 4090586.18 frames. 
], batch size: 49, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:27:08,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-09-16 17:27:27,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=477249.5, ans=0.125 2024-09-16 17:27:45,135 INFO [train.py:1198] (0/2) Epoch 27, batch 2300, loss[loss=0.2053, ctc_loss=0.1343, cr_loss=0.3552, over 21052.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1531, cr_loss=0.3742, over 4088805.71 frames. ], batch size: 56, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:27:53,190 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:28:03,699 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.086e+02 2.228e+02 2.361e+02 3.205e+02, threshold=4.457e+02, percent-clipped=0.0 2024-09-16 17:28:08,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=477334.5, ans=0.05 2024-09-16 17:28:09,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.90 vs. limit=10.0 2024-09-16 17:28:22,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0 2024-09-16 17:28:40,936 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=22.5 2024-09-16 17:29:01,272 INFO [train.py:1198] (0/2) Epoch 27, batch 2350, loss[loss=0.2454, ctc_loss=0.1658, cr_loss=0.3983, over 19287.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1531, cr_loss=0.3743, over 4090192.12 frames. 
], batch size: 90, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:29:16,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=477447.8333333333, ans=0.0 2024-09-16 17:29:32,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0 2024-09-16 17:29:55,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=477532.8333333333, ans=0.1 2024-09-16 17:30:08,884 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.35 vs. limit=6.0 2024-09-16 17:30:23,469 INFO [train.py:1198] (0/2) Epoch 27, batch 2400, loss[loss=0.219, ctc_loss=0.1455, cr_loss=0.3674, over 21052.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1539, cr_loss=0.3758, over 4096185.08 frames. ], batch size: 56, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:30:38,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=477617.8333333333, ans=0.125 2024-09-16 17:30:41,618 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.123e+02 2.202e+02 2.335e+02 3.360e+02, threshold=4.404e+02, percent-clipped=0.0 2024-09-16 17:30:44,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=477617.8333333333, ans=0.1 2024-09-16 17:31:19,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=477674.5, ans=0.125 2024-09-16 17:31:39,023 INFO [train.py:1198] (0/2) Epoch 27, batch 2450, loss[loss=0.2416, ctc_loss=0.1614, cr_loss=0.4008, over 20842.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1535, cr_loss=0.3754, over 4091826.23 frames. 
], batch size: 59, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:32:54,496 INFO [train.py:1198] (0/2) Epoch 27, batch 2500, loss[loss=0.2222, ctc_loss=0.1483, cr_loss=0.3697, over 20782.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.154, cr_loss=0.3766, over 4092707.73 frames. ], batch size: 53, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:32:57,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=477872.8333333333, ans=0.1 2024-09-16 17:33:02,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=477872.8333333333, ans=0.0 2024-09-16 17:33:08,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=477901.1666666667, ans=0.125 2024-09-16 17:33:12,433 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.125e+02 2.282e+02 2.490e+02 4.054e+02, threshold=4.564e+02, percent-clipped=0.0 2024-09-16 17:33:14,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=477901.1666666667, ans=0.0 2024-09-16 17:33:48,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=477957.8333333333, ans=0.0 2024-09-16 17:34:09,026 INFO [train.py:1198] (0/2) Epoch 27, batch 2550, loss[loss=0.2521, ctc_loss=0.1723, cr_loss=0.3987, over 20304.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1541, cr_loss=0.3771, over 4095096.19 frames. 
], batch size: 74, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:35:17,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=478127.8333333333, ans=0.0 2024-09-16 17:35:26,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=478156.1666666667, ans=0.125 2024-09-16 17:35:27,458 INFO [train.py:1198] (0/2) Epoch 27, batch 2600, loss[loss=0.2163, ctc_loss=0.1439, cr_loss=0.3619, over 20784.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1536, cr_loss=0.3765, over 4107118.57 frames. ], batch size: 53, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:35:48,423 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.148e+02 2.229e+02 2.374e+02 2.756e+02, threshold=4.458e+02, percent-clipped=0.0 2024-09-16 17:35:59,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=478212.8333333333, ans=0.125 2024-09-16 17:36:12,871 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:36:45,979 INFO [train.py:1198] (0/2) Epoch 27, batch 2650, loss[loss=0.2584, ctc_loss=0.1742, cr_loss=0.4211, over 21006.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1538, cr_loss=0.377, over 4102590.25 frames. ], batch size: 61, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:36:51,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.87 vs. 
limit=22.5 2024-09-16 17:36:55,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=478297.8333333333, ans=0.025 2024-09-16 17:37:22,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=478354.5, ans=0.2 2024-09-16 17:37:38,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=478382.8333333333, ans=0.125 2024-09-16 17:38:01,803 INFO [train.py:1198] (0/2) Epoch 27, batch 2700, loss[loss=0.1958, ctc_loss=0.1267, cr_loss=0.3454, over 20963.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1528, cr_loss=0.3755, over 4108648.66 frames. ], batch size: 50, lr: 3.08e-03, grad_scale: 16.0 2024-09-16 17:38:21,563 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.120e+02 2.240e+02 2.366e+02 5.113e+02, threshold=4.479e+02, percent-clipped=1.0 2024-09-16 17:38:46,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=478524.5, ans=0.0 2024-09-16 17:38:48,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=478524.5, ans=0.0 2024-09-16 17:38:54,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=478524.5, ans=0.09899494936611666 2024-09-16 17:38:57,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=478524.5, ans=15.0 2024-09-16 17:39:15,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=478552.8333333333, ans=0.125 2024-09-16 17:39:18,135 INFO [train.py:1198] (0/2) Epoch 27, batch 2750, loss[loss=0.2626, ctc_loss=0.1821, cr_loss=0.4024, over 18501.00 frames. 
], tot_loss[loss=0.2278, ctc_loss=0.1527, cr_loss=0.3753, over 4104757.95 frames. ], batch size: 108, lr: 3.08e-03, grad_scale: 16.0 2024-09-16 17:39:26,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=478581.1666666667, ans=0.125 2024-09-16 17:39:35,354 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-09-16 17:40:14,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=478666.1666666667, ans=0.125 2024-09-16 17:40:28,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=478694.5, ans=0.1 2024-09-16 17:40:33,746 INFO [train.py:1198] (0/2) Epoch 27, batch 2800, loss[loss=0.1956, ctc_loss=0.128, cr_loss=0.3376, over 20987.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1515, cr_loss=0.3736, over 4113700.20 frames. ], batch size: 55, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:40:48,156 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.35 vs. 
limit=15.0 2024-09-16 17:40:56,145 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.107e+02 2.245e+02 2.397e+02 3.647e+02, threshold=4.491e+02, percent-clipped=0.0 2024-09-16 17:40:56,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=478751.1666666667, ans=0.0 2024-09-16 17:41:16,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=478779.5, ans=0.125 2024-09-16 17:41:45,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=478836.1666666667, ans=0.125 2024-09-16 17:41:54,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=478864.5, ans=0.125 2024-09-16 17:41:55,643 INFO [train.py:1198] (0/2) Epoch 27, batch 2850, loss[loss=0.2353, ctc_loss=0.1673, cr_loss=0.34, over 14929.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.152, cr_loss=0.3744, over 4109311.37 frames. ], batch size: 149, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:41:56,307 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-16 17:42:22,926 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:42:51,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=478949.5, ans=10.0 2024-09-16 17:43:10,902 INFO [train.py:1198] (0/2) Epoch 27, batch 2900, loss[loss=0.2252, ctc_loss=0.1507, cr_loss=0.3725, over 20770.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1525, cr_loss=0.3754, over 4101168.13 frames. 
], batch size: 53, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:43:20,958 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.03 vs. limit=15.0 2024-09-16 17:43:29,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.90 vs. limit=10.0 2024-09-16 17:43:30,649 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.088e+02 2.226e+02 2.373e+02 7.586e+02, threshold=4.452e+02, percent-clipped=1.0 2024-09-16 17:44:11,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=479119.5, ans=0.1 2024-09-16 17:44:26,290 INFO [train.py:1198] (0/2) Epoch 27, batch 2950, loss[loss=0.2249, ctc_loss=0.1501, cr_loss=0.374, over 21028.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1528, cr_loss=0.376, over 4092915.24 frames. ], batch size: 62, lr: 3.08e-03, grad_scale: 32.0 2024-09-16 17:44:29,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=479147.8333333333, ans=0.0 2024-09-16 17:44:52,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=479176.1666666667, ans=0.2 2024-09-16 17:45:28,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=479261.1666666667, ans=0.125 2024-09-16 17:45:33,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=479261.1666666667, ans=0.125 2024-09-16 17:45:41,963 INFO [train.py:1198] (0/2) Epoch 27, batch 3000, loss[loss=0.1847, ctc_loss=0.119, cr_loss=0.3286, over 20252.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1542, cr_loss=0.3783, over 4075858.21 frames. 
], batch size: 45, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 17:45:41,964 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 17:46:01,845 INFO [train.py:1230] (0/2) Epoch 27, validation: loss=0.04167, ctc_loss=0.04167, cr_loss=1.171e-14, over 944034.00 frames. 2024-09-16 17:46:01,846 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 17:46:03,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=479289.5, ans=0.125 2024-09-16 17:46:24,817 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.138e+02 2.279e+02 2.473e+02 3.607e+02, threshold=4.558e+02, percent-clipped=0.0 2024-09-16 17:46:28,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=479317.8333333333, ans=0.125 2024-09-16 17:46:30,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=22.5 2024-09-16 17:46:32,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=479317.8333333333, ans=0.05 2024-09-16 17:46:38,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=479346.1666666667, ans=0.125 2024-09-16 17:47:09,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=479402.8333333333, ans=0.0 2024-09-16 17:47:23,530 INFO [train.py:1198] (0/2) Epoch 27, batch 3050, loss[loss=0.2335, ctc_loss=0.157, cr_loss=0.3825, over 21021.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1539, cr_loss=0.3781, over 4093903.65 frames. 
], batch size: 63, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 17:47:38,176 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.84 vs. limit=22.5 2024-09-16 17:47:45,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2024-09-16 17:47:57,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=22.5 2024-09-16 17:48:40,124 INFO [train.py:1198] (0/2) Epoch 27, batch 3100, loss[loss=0.2236, ctc_loss=0.1508, cr_loss=0.3643, over 21085.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1539, cr_loss=0.3778, over 4081972.80 frames. ], batch size: 59, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 17:48:52,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=479572.8333333333, ans=0.125 2024-09-16 17:48:53,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=479601.1666666667, ans=0.0 2024-09-16 17:48:59,641 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.763e+02 2.123e+02 2.334e+02 2.478e+02 3.629e+02, threshold=4.669e+02, percent-clipped=0.0 2024-09-16 17:49:56,109 INFO [train.py:1198] (0/2) Epoch 27, batch 3150, loss[loss=0.1982, ctc_loss=0.1311, cr_loss=0.3355, over 20960.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1534, cr_loss=0.3764, over 4082545.44 frames. ], batch size: 52, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 17:50:20,040 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.97 vs. 
limit=8.0 2024-09-16 17:50:26,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=479771.1666666667, ans=0.125 2024-09-16 17:51:01,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=479827.8333333333, ans=0.125 2024-09-16 17:51:01,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=479827.8333333333, ans=0.125 2024-09-16 17:51:11,917 INFO [train.py:1198] (0/2) Epoch 27, batch 3200, loss[loss=0.252, ctc_loss=0.1731, cr_loss=0.3944, over 20694.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.154, cr_loss=0.3771, over 4082080.32 frames. ], batch size: 71, lr: 3.07e-03, grad_scale: 32.0 2024-09-16 17:51:12,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=479856.1666666667, ans=0.0 2024-09-16 17:51:13,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=479856.1666666667, ans=0.0 2024-09-16 17:51:30,542 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.26 vs. limit=6.0 2024-09-16 17:51:31,342 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.156e+02 2.280e+02 2.490e+02 4.987e+02, threshold=4.560e+02, percent-clipped=1.0 2024-09-16 17:52:32,450 INFO [train.py:1198] (0/2) Epoch 27, batch 3250, loss[loss=0.202, ctc_loss=0.1338, cr_loss=0.3407, over 21005.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.153, cr_loss=0.3757, over 4096544.07 frames. 
], batch size: 52, lr: 3.07e-03, grad_scale: 16.0 2024-09-16 17:52:46,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=480026.1666666667, ans=0.125 2024-09-16 17:53:08,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=480054.5, ans=0.125 2024-09-16 17:53:11,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=480054.5, ans=15.0 2024-09-16 17:53:21,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=480082.8333333333, ans=0.125 2024-09-16 17:53:30,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=480082.8333333333, ans=0.07 2024-09-16 17:53:47,832 INFO [train.py:1198] (0/2) Epoch 27, batch 3300, loss[loss=0.2578, ctc_loss=0.1795, cr_loss=0.3913, over 18393.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1538, cr_loss=0.3775, over 4102602.09 frames. ], batch size: 108, lr: 3.07e-03, grad_scale: 16.0 2024-09-16 17:53:55,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=480139.5, ans=0.125 2024-09-16 17:53:57,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. 
limit=15.0 2024-09-16 17:54:08,817 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.111e+02 2.240e+02 2.473e+02 4.220e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-16 17:54:09,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=480167.8333333333, ans=0.0 2024-09-16 17:54:15,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.26 vs. limit=22.5 2024-09-16 17:54:24,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=480196.1666666667, ans=0.125 2024-09-16 17:54:39,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=480224.5, ans=0.125 2024-09-16 17:55:03,723 INFO [train.py:1198] (0/2) Epoch 27, batch 3350, loss[loss=0.2179, ctc_loss=0.1466, cr_loss=0.3561, over 20888.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1536, cr_loss=0.3769, over 4096944.81 frames. ], batch size: 54, lr: 3.07e-03, grad_scale: 16.0 2024-09-16 17:55:10,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.95 vs. limit=10.0 2024-09-16 17:55:10,743 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.29 vs. 
limit=15.0 2024-09-16 17:55:31,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=480309.5, ans=0.04949747468305833 2024-09-16 17:55:34,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=480337.8333333333, ans=0.125 2024-09-16 17:55:37,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=22.5 2024-09-16 17:55:50,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=22.5 2024-09-16 17:56:08,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=480394.5, ans=0.125 2024-09-16 17:56:18,539 INFO [train.py:1198] (0/2) Epoch 27, batch 3400, loss[loss=0.2121, ctc_loss=0.1425, cr_loss=0.3478, over 20971.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1538, cr_loss=0.3772, over 4107190.25 frames. ], batch size: 55, lr: 3.07e-03, grad_scale: 16.0 2024-09-16 17:56:32,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=480451.1666666667, ans=0.0 2024-09-16 17:56:39,588 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.130e+02 2.268e+02 2.467e+02 3.043e+02, threshold=4.536e+02, percent-clipped=0.0 2024-09-16 17:57:30,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. 
limit=6.0
2024-09-16 17:57:33,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=480564.5, ans=0.025
2024-09-16 17:57:34,159 INFO [train.py:1198] (0/2) Epoch 27, batch 3450, loss[loss=0.1956, ctc_loss=0.1289, cr_loss=0.3334, over 21057.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1525, cr_loss=0.3752, over 4109262.01 frames. ], batch size: 53, lr: 3.07e-03, grad_scale: 16.0
2024-09-16 17:58:11,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=480621.1666666667, ans=0.125
2024-09-16 17:58:31,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=480649.5, ans=0.125
2024-09-16 17:58:48,681 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0
2024-09-16 17:58:55,440 INFO [train.py:1198] (0/2) Epoch 27, batch 3500, loss[loss=0.2343, ctc_loss=0.1571, cr_loss=0.386, over 20785.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1532, cr_loss=0.376, over 4094981.49 frames. ], batch size: 53, lr: 3.07e-03, grad_scale: 16.0
2024-09-16 17:59:03,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0
2024-09-16 17:59:16,365 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.133e+02 2.252e+02 2.402e+02 4.500e+02, threshold=4.504e+02, percent-clipped=0.0
2024-09-16 17:59:47,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs.
limit=6.0
2024-09-16 17:59:53,348 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0
2024-09-16 18:00:10,904 INFO [train.py:1198] (0/2) Epoch 27, batch 3550, loss[loss=0.2389, ctc_loss=0.1597, cr_loss=0.3961, over 20846.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1533, cr_loss=0.3761, over 4077837.40 frames. ], batch size: 65, lr: 3.07e-03, grad_scale: 16.0
2024-09-16 18:00:26,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=480876.1666666667, ans=0.0
2024-09-16 18:00:52,661 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=15.0
2024-09-16 18:01:27,053 INFO [train.py:1198] (0/2) Epoch 27, batch 3600, loss[loss=0.1917, ctc_loss=0.1251, cr_loss=0.3329, over 20931.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1532, cr_loss=0.3763, over 4082828.92 frames.
], batch size: 49, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 18:01:28,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=480989.5, ans=0.125
2024-09-16 18:01:42,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=481017.8333333333, ans=0.125
2024-09-16 18:01:42,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=481017.8333333333, ans=0.0
2024-09-16 18:01:48,300 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.874e+02 2.089e+02 2.227e+02 2.380e+02 4.273e+02, threshold=4.455e+02, percent-clipped=0.0
2024-09-16 18:01:59,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481046.1666666667, ans=0.1
2024-09-16 18:02:05,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=481046.1666666667, ans=0.0
2024-09-16 18:02:25,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481074.5, ans=0.1
2024-09-16 18:02:29,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=481102.8333333333, ans=0.125
2024-09-16 18:02:42,657 INFO [train.py:1198] (0/2) Epoch 27, batch 3650, loss[loss=0.2503, ctc_loss=0.1682, cr_loss=0.4109, over 20943.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1531, cr_loss=0.3764, over 4089235.43 frames.
], batch size: 64, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 18:02:47,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=481131.1666666667, ans=0.0
2024-09-16 18:02:56,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=481159.5, ans=0.0
2024-09-16 18:03:09,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=22.5
2024-09-16 18:03:31,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=481216.1666666667, ans=0.125
2024-09-16 18:03:36,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=481216.1666666667, ans=0.0
2024-09-16 18:03:45,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=481244.5, ans=0.125
2024-09-16 18:03:56,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481244.5, ans=0.1
2024-09-16 18:03:59,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=481244.5, ans=0.125
2024-09-16 18:04:05,029 INFO [train.py:1198] (0/2) Epoch 27, batch 3700, loss[loss=0.2176, ctc_loss=0.1447, cr_loss=0.3649, over 21016.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1527, cr_loss=0.3754, over 4091038.33 frames. ], batch size: 55, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 18:04:21,043 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.18 vs.
limit=15.0
2024-09-16 18:04:26,276 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.106e+02 2.281e+02 2.483e+02 8.071e+02, threshold=4.561e+02, percent-clipped=1.0
2024-09-16 18:04:29,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=481301.1666666667, ans=0.0
2024-09-16 18:04:35,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=481329.5, ans=0.125
2024-09-16 18:04:59,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.58 vs. limit=10.0
2024-09-16 18:05:20,557 INFO [train.py:1198] (0/2) Epoch 27, batch 3750, loss[loss=0.2328, ctc_loss=0.1529, cr_loss=0.3995, over 20646.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1525, cr_loss=0.3757, over 4092372.34 frames. ], batch size: 68, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 18:05:33,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=481414.5, ans=0.125
2024-09-16 18:05:36,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=481442.8333333333, ans=0.0
2024-09-16 18:06:07,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=481499.5, ans=0.0
2024-09-16 18:06:10,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=481499.5, ans=0.0
2024-09-16 18:06:12,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=481499.5, ans=0.025
2024-09-16 18:06:25,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.95 vs.
limit=15.0
2024-09-16 18:06:31,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=481527.8333333333, ans=0.02
2024-09-16 18:06:36,104 INFO [train.py:1198] (0/2) Epoch 27, batch 3800, loss[loss=0.2336, ctc_loss=0.1558, cr_loss=0.3891, over 20895.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1513, cr_loss=0.3735, over 4095264.83 frames. ], batch size: 54, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 18:06:51,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=481584.5, ans=0.125
2024-09-16 18:06:57,075 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.125e+02 2.236e+02 2.403e+02 2.968e+02, threshold=4.472e+02, percent-clipped=0.0
2024-09-16 18:07:09,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=481612.8333333333, ans=0.125
2024-09-16 18:07:14,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=481612.8333333333, ans=0.125
2024-09-16 18:07:21,040 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.26 vs. limit=12.0
2024-09-16 18:07:28,153 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=22.5
2024-09-16 18:07:52,057 INFO [train.py:1198] (0/2) Epoch 27, batch 3850, loss[loss=0.2233, ctc_loss=0.1465, cr_loss=0.3838, over 20282.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1515, cr_loss=0.3731, over 4091592.51 frames.
], batch size: 45, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 18:07:55,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=481697.8333333333, ans=0.0
2024-09-16 18:08:21,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=481754.5, ans=0.125
2024-09-16 18:08:30,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=481754.5, ans=0.0
2024-09-16 18:08:33,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=481754.5, ans=0.0
2024-09-16 18:08:39,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=481782.8333333333, ans=10.0
2024-09-16 18:09:06,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=481839.5, ans=0.125
2024-09-16 18:09:07,949 INFO [train.py:1198] (0/2) Epoch 27, batch 3900, loss[loss=0.2378, ctc_loss=0.1576, cr_loss=0.4008, over 20829.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1511, cr_loss=0.3725, over 4100623.27 frames. ], batch size: 59, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 18:09:31,914 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.667e+02 2.176e+02 2.309e+02 2.481e+02 3.316e+02, threshold=4.618e+02, percent-clipped=0.0
2024-09-16 18:09:43,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=481896.1666666667, ans=0.125
2024-09-16 18:10:29,180 INFO [train.py:1198] (0/2) Epoch 27, batch 3950, loss[loss=0.2314, ctc_loss=0.1568, cr_loss=0.3731, over 21037.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1516, cr_loss=0.3734, over 4095230.76 frames.
], batch size: 56, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 18:10:35,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481981.1666666667, ans=0.1
2024-09-16 18:10:52,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0
2024-09-16 18:11:00,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=482037.8333333333, ans=0.025
2024-09-16 18:11:04,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482037.8333333333, ans=0.1
2024-09-16 18:11:23,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=482066.1666666667, ans=0.0
2024-09-16 18:11:28,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=482094.5, ans=0.0
2024-09-16 18:11:31,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=482094.5, ans=0.1
2024-09-16 18:11:45,301 INFO [train.py:1198] (0/2) Epoch 27, batch 4000, loss[loss=0.2017, ctc_loss=0.1341, cr_loss=0.3382, over 21068.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1524, cr_loss=0.3749, over 4093206.30 frames. ], batch size: 53, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 18:12:06,118 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.138e+02 2.269e+02 2.399e+02 2.868e+02, threshold=4.538e+02, percent-clipped=0.0
2024-09-16 18:12:17,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.82 vs.
limit=15.0
2024-09-16 18:12:29,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=482207.8333333333, ans=0.125
2024-09-16 18:12:40,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0
2024-09-16 18:13:00,950 INFO [train.py:1198] (0/2) Epoch 27, batch 4050, loss[loss=0.2125, ctc_loss=0.143, cr_loss=0.3477, over 19910.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1529, cr_loss=0.3761, over 4105704.14 frames. ], batch size: 44, lr: 3.07e-03, grad_scale: 32.0
2024-09-16 18:13:07,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=482264.5, ans=0.1
2024-09-16 18:14:16,189 INFO [train.py:1198] (0/2) Epoch 27, batch 4100, loss[loss=0.2311, ctc_loss=0.1519, cr_loss=0.3958, over 21010.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1522, cr_loss=0.3752, over 4109419.00 frames.
], batch size: 52, lr: 3.06e-03, grad_scale: 32.0
2024-09-16 18:14:27,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=482406.1666666667, ans=0.125
2024-09-16 18:14:29,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=482434.5, ans=0.125
2024-09-16 18:14:38,644 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.110e+02 2.285e+02 2.392e+02 3.456e+02, threshold=4.570e+02, percent-clipped=0.0
2024-09-16 18:15:00,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=482491.1666666667, ans=0.1
2024-09-16 18:15:09,144 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 18:15:14,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=482491.1666666667, ans=0.125
2024-09-16 18:15:14,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=482491.1666666667, ans=0.5
2024-09-16 18:15:37,255 INFO [train.py:1198] (0/2) Epoch 27, batch 4150, loss[loss=0.2343, ctc_loss=0.157, cr_loss=0.3864, over 20770.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1539, cr_loss=0.3775, over 4105425.42 frames.
], batch size: 56, lr: 3.06e-03, grad_scale: 16.0
2024-09-16 18:16:04,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=482576.1666666667, ans=0.025
2024-09-16 18:16:24,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=482632.8333333333, ans=0.125
2024-09-16 18:16:49,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=482661.1666666667, ans=0.0
2024-09-16 18:16:50,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=482661.1666666667, ans=0.125
2024-09-16 18:16:53,634 INFO [train.py:1198] (0/2) Epoch 27, batch 4200, loss[loss=0.2175, ctc_loss=0.1424, cr_loss=0.3759, over 21019.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1541, cr_loss=0.3785, over 4110078.97 frames. ], batch size: 61, lr: 3.06e-03, grad_scale: 16.0
2024-09-16 18:17:16,710 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.112e+02 2.267e+02 2.413e+02 2.899e+02, threshold=4.535e+02, percent-clipped=0.0
2024-09-16 18:17:21,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=482717.8333333333, ans=0.125
2024-09-16 18:18:05,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=482802.8333333333, ans=0.125
2024-09-16 18:18:09,739 INFO [train.py:1198] (0/2) Epoch 27, batch 4250, loss[loss=0.2458, ctc_loss=0.1658, cr_loss=0.3999, over 20626.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1546, cr_loss=0.3792, over 4101983.73 frames.
], batch size: 68, lr: 3.06e-03, grad_scale: 16.0
2024-09-16 18:18:17,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=482831.1666666667, ans=0.125
2024-09-16 18:18:52,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=482887.8333333333, ans=0.125
2024-09-16 18:18:55,360 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.61 vs. limit=6.0
2024-09-16 18:18:56,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=482916.1666666667, ans=0.025
2024-09-16 18:19:00,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.33 vs. limit=12.0
2024-09-16 18:19:16,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=482944.5, ans=0.2
2024-09-16 18:19:25,107 INFO [train.py:1198] (0/2) Epoch 27, batch 4300, loss[loss=0.1808, ctc_loss=0.1169, cr_loss=0.3195, over 20300.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1542, cr_loss=0.3778, over 4099868.23 frames. ], batch size: 45, lr: 3.06e-03, grad_scale: 16.0
2024-09-16 18:19:47,753 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.156e+02 2.311e+02 2.523e+02 4.886e+02, threshold=4.622e+02, percent-clipped=1.0
2024-09-16 18:19:55,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=483029.5, ans=0.2
2024-09-16 18:20:02,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.92 vs.
limit=10.0
2024-09-16 18:20:14,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=483057.8333333333, ans=0.0
2024-09-16 18:20:40,960 INFO [train.py:1198] (0/2) Epoch 27, batch 4350, loss[loss=0.2399, ctc_loss=0.1611, cr_loss=0.3943, over 20296.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1543, cr_loss=0.3785, over 4099803.78 frames. ], batch size: 74, lr: 3.06e-03, grad_scale: 16.0
2024-09-16 18:20:44,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=483114.5, ans=0.5
2024-09-16 18:20:53,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=483114.5, ans=0.0
2024-09-16 18:20:54,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=483114.5, ans=0.0
2024-09-16 18:21:19,157 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0
2024-09-16 18:21:41,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=483199.5, ans=0.04949747468305833
2024-09-16 18:21:42,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=483199.5, ans=0.125
2024-09-16 18:21:51,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=483227.8333333333, ans=0.2
2024-09-16 18:22:02,135 INFO [train.py:1198] (0/2) Epoch 27, batch 4400, loss[loss=0.2311, ctc_loss=0.1571, cr_loss=0.37, over 21008.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1532, cr_loss=0.3769, over 4114043.19 frames.
], batch size: 61, lr: 3.06e-03, grad_scale: 32.0
2024-09-16 18:22:07,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483256.1666666667, ans=0.1
2024-09-16 18:22:15,091 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=15.0
2024-09-16 18:22:24,727 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.722e+02 2.129e+02 2.248e+02 2.540e+02 4.104e+02, threshold=4.496e+02, percent-clipped=0.0
2024-09-16 18:22:59,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=483341.1666666667, ans=0.0
2024-09-16 18:23:07,996 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.91 vs. limit=12.0
2024-09-16 18:23:17,565 INFO [train.py:1198] (0/2) Epoch 27, batch 4450, loss[loss=0.2215, ctc_loss=0.1486, cr_loss=0.3645, over 20789.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1529, cr_loss=0.3765, over 4097531.91 frames.
], batch size: 56, lr: 3.06e-03, grad_scale: 32.0
2024-09-16 18:23:22,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=483397.8333333333, ans=0.2
2024-09-16 18:23:25,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483397.8333333333, ans=0.1
2024-09-16 18:23:30,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=483397.8333333333, ans=0.1
2024-09-16 18:23:34,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483426.1666666667, ans=0.1
2024-09-16 18:23:38,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=483426.1666666667, ans=0.125
2024-09-16 18:23:49,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=483454.5, ans=0.125
2024-09-16 18:24:33,682 INFO [train.py:1198] (0/2) Epoch 27, batch 4500, loss[loss=0.2252, ctc_loss=0.149, cr_loss=0.381, over 20827.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.3761, over 4103074.75 frames. ], batch size: 59, lr: 3.06e-03, grad_scale: 32.0
2024-09-16 18:24:34,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0
2024-09-16 18:24:54,245 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.44 vs.
limit=22.5
2024-09-16 18:24:56,488 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.085e+02 2.250e+02 2.356e+02 2.921e+02, threshold=4.500e+02, percent-clipped=0.0
2024-09-16 18:24:59,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=483567.8333333333, ans=0.02
2024-09-16 18:25:01,735 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0
2024-09-16 18:25:40,838 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0
2024-09-16 18:25:49,125 INFO [train.py:1198] (0/2) Epoch 27, batch 4550, loss[loss=0.2496, ctc_loss=0.1698, cr_loss=0.3987, over 20659.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1529, cr_loss=0.3764, over 4105666.00 frames. ], batch size: 68, lr: 3.06e-03, grad_scale: 32.0
2024-09-16 18:25:53,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=483681.1666666667, ans=0.07
2024-09-16 18:25:55,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=483681.1666666667, ans=0.125
2024-09-16 18:25:56,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=483681.1666666667, ans=0.04949747468305833
2024-09-16 18:26:04,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=483709.5, ans=0.2
2024-09-16 18:26:25,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483737.8333333333, ans=0.1
2024-09-16 18:27:09,738 INFO [train.py:1198] (0/2) Epoch 27, batch 4600, loss[loss=0.1851, ctc_loss=0.1203, cr_loss=0.3242,
over 21046.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1534, cr_loss=0.3772, over 4104392.90 frames. ], batch size: 53, lr: 3.06e-03, grad_scale: 32.0
2024-09-16 18:27:29,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=483851.1666666667, ans=0.0
2024-09-16 18:27:32,228 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.168e+02 2.267e+02 2.436e+02 3.097e+02, threshold=4.533e+02, percent-clipped=0.0
2024-09-16 18:28:05,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=483907.8333333333, ans=0.125
2024-09-16 18:28:18,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483936.1666666667, ans=0.1
2024-09-16 18:28:24,808 INFO [train.py:1198] (0/2) Epoch 27, batch 4650, loss[loss=0.1994, ctc_loss=0.1298, cr_loss=0.3481, over 20987.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1535, cr_loss=0.3772, over 4107951.60 frames. ], batch size: 52, lr: 3.06e-03, grad_scale: 32.0
2024-09-16 18:29:19,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=484049.5, ans=0.125
2024-09-16 18:29:30,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=484077.8333333333, ans=0.125
2024-09-16 18:29:40,977 INFO [train.py:1198] (0/2) Epoch 27, batch 4700, loss[loss=0.1967, ctc_loss=0.1309, cr_loss=0.3291, over 20964.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1531, cr_loss=0.3754, over 4107222.04 frames.
], batch size: 51, lr: 3.06e-03, grad_scale: 32.0
2024-09-16 18:30:03,539 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.157e+02 2.324e+02 2.567e+02 3.307e+02, threshold=4.649e+02, percent-clipped=0.0
2024-09-16 18:30:05,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=484134.5, ans=0.125
2024-09-16 18:30:24,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=484191.1666666667, ans=0.025
2024-09-16 18:30:36,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=484191.1666666667, ans=0.0
2024-09-16 18:30:42,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=484219.5, ans=0.05
2024-09-16 18:30:56,636 INFO [train.py:1198] (0/2) Epoch 27, batch 4750, loss[loss=0.1869, ctc_loss=0.1224, cr_loss=0.3226, over 20346.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1523, cr_loss=0.374, over 4110323.99 frames. ], batch size: 45, lr: 3.06e-03, grad_scale: 32.0
2024-09-16 18:31:49,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=484332.8333333333, ans=0.2
2024-09-16 18:32:11,976 INFO [train.py:1198] (0/2) Epoch 27, batch 4800, loss[loss=0.2072, ctc_loss=0.1377, cr_loss=0.3473, over 21000.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1521, cr_loss=0.3742, over 4111749.64 frames. ], batch size: 52, lr: 3.06e-03, grad_scale: 32.0
2024-09-16 18:32:30,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.35 vs.
limit=15.0
2024-09-16 18:32:40,119 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.084e+02 2.249e+02 2.422e+02 7.487e+02, threshold=4.497e+02, percent-clipped=2.0
2024-09-16 18:32:44,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=484417.8333333333, ans=15.0
2024-09-16 18:33:00,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0
2024-09-16 18:33:33,382 INFO [train.py:1198] (0/2) Epoch 27, batch 4850, loss[loss=0.2209, ctc_loss=0.146, cr_loss=0.3743, over 21048.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1512, cr_loss=0.3736, over 4118799.32 frames. ], batch size: 53, lr: 3.06e-03, grad_scale: 32.0
2024-09-16 18:33:35,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=484531.1666666667, ans=0.0
2024-09-16 18:33:40,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.90 vs.
limit=15.0
2024-09-16 18:34:00,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=484559.5, ans=0.95
2024-09-16 18:34:11,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=484587.8333333333, ans=0.2
2024-09-16 18:34:28,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=484616.1666666667, ans=0.0
2024-09-16 18:34:32,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=484644.5, ans=0.125
2024-09-16 18:34:46,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0
2024-09-16 18:34:49,183 INFO [train.py:1198] (0/2) Epoch 27, batch 4900, loss[loss=0.2349, ctc_loss=0.1565, cr_loss=0.3916, over 21027.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1514, cr_loss=0.3738, over 4117926.19 frames.
], batch size: 62, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:34:54,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=484672.8333333333, ans=0.125 2024-09-16 18:35:11,570 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.096e+02 2.248e+02 2.383e+02 4.576e+02, threshold=4.495e+02, percent-clipped=1.0 2024-09-16 18:35:19,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=484729.5, ans=0.0 2024-09-16 18:35:53,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=484786.1666666667, ans=0.125 2024-09-16 18:36:02,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=484814.5, ans=0.125 2024-09-16 18:36:03,196 INFO [train.py:1198] (0/2) Epoch 27, batch 4950, loss[loss=0.2418, ctc_loss=0.1611, cr_loss=0.4034, over 21000.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1524, cr_loss=0.3751, over 4117516.45 frames. 
], batch size: 55, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:36:04,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=484814.5, ans=0.125 2024-09-16 18:36:24,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=484842.8333333333, ans=0.0 2024-09-16 18:36:31,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=484871.1666666667, ans=0.2 2024-09-16 18:37:14,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=484927.8333333333, ans=0.0 2024-09-16 18:37:17,567 INFO [train.py:1198] (0/2) Epoch 27, batch 5000, loss[loss=0.2227, ctc_loss=0.1498, cr_loss=0.3641, over 20780.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1522, cr_loss=0.374, over 4116447.20 frames. ], batch size: 53, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:37:27,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0 2024-09-16 18:37:39,892 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.098e+02 2.266e+02 2.407e+02 4.241e+02, threshold=4.532e+02, percent-clipped=0.0 2024-09-16 18:37:40,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=484984.5, ans=0.125 2024-09-16 18:37:43,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2024-09-16 18:38:31,951 INFO [train.py:1198] (0/2) Epoch 27, batch 5050, loss[loss=0.2145, ctc_loss=0.1415, cr_loss=0.3652, over 20976.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1523, cr_loss=0.3746, over 4116600.60 frames. 
], batch size: 64, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:39:18,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=485182.8333333333, ans=0.125 2024-09-16 18:39:18,596 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2024-09-16 18:39:25,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=485182.8333333333, ans=0.0 2024-09-16 18:39:45,975 INFO [train.py:1198] (0/2) Epoch 27, batch 5100, loss[loss=0.2218, ctc_loss=0.1457, cr_loss=0.3803, over 20987.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.152, cr_loss=0.3741, over 4119049.89 frames. ], batch size: 55, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:39:55,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=485239.5, ans=0.0 2024-09-16 18:39:55,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=22.5 2024-09-16 18:40:05,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=485267.8333333333, ans=0.2 2024-09-16 18:40:08,339 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.101e+02 2.228e+02 2.381e+02 5.913e+02, threshold=4.457e+02, percent-clipped=1.0 2024-09-16 18:40:30,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.38 vs. limit=15.0 2024-09-16 18:41:05,681 INFO [train.py:1198] (0/2) Epoch 27, batch 5150, loss[loss=0.2133, ctc_loss=0.1403, cr_loss=0.3647, over 20971.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.152, cr_loss=0.3735, over 4112827.36 frames. 
], batch size: 52, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:41:16,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=485381.1666666667, ans=0.0 2024-09-16 18:41:21,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=485409.5, ans=22.5 2024-09-16 18:41:25,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=485409.5, ans=10.0 2024-09-16 18:41:40,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485437.8333333333, ans=0.1 2024-09-16 18:42:19,973 INFO [train.py:1198] (0/2) Epoch 27, batch 5200, loss[loss=0.2032, ctc_loss=0.1347, cr_loss=0.3423, over 21027.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1523, cr_loss=0.3744, over 4112847.98 frames. ], batch size: 55, lr: 3.06e-03, grad_scale: 32.0 2024-09-16 18:42:22,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2024-09-16 18:42:39,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=485551.1666666667, ans=0.1 2024-09-16 18:42:41,666 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.075e+02 2.213e+02 2.363e+02 2.743e+02, threshold=4.426e+02, percent-clipped=0.0 2024-09-16 18:42:49,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485579.5, ans=0.1 2024-09-16 18:42:51,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. 
limit=12.0 2024-09-16 18:43:03,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=485607.8333333333, ans=0.125 2024-09-16 18:43:11,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=485607.8333333333, ans=0.2 2024-09-16 18:43:34,237 INFO [train.py:1198] (0/2) Epoch 27, batch 5250, loss[loss=0.2871, ctc_loss=0.2088, cr_loss=0.3914, over 14654.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1521, cr_loss=0.374, over 4096620.76 frames. ], batch size: 149, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:43:52,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=485692.8333333333, ans=0.125 2024-09-16 18:43:52,598 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:44:12,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=485721.1666666667, ans=15.0 2024-09-16 18:44:25,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=22.5 2024-09-16 18:44:48,604 INFO [train.py:1198] (0/2) Epoch 27, batch 5300, loss[loss=0.2569, ctc_loss=0.1745, cr_loss=0.4118, over 20955.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1524, cr_loss=0.3746, over 4096572.91 frames. 
], batch size: 64, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:45:10,860 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.129e+02 2.269e+02 2.416e+02 3.705e+02, threshold=4.537e+02, percent-clipped=0.0 2024-09-16 18:45:12,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=485834.5, ans=0.125 2024-09-16 18:45:56,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2024-09-16 18:45:59,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=485919.5, ans=0.125 2024-09-16 18:46:02,484 INFO [train.py:1198] (0/2) Epoch 27, batch 5350, loss[loss=0.2015, ctc_loss=0.1348, cr_loss=0.3335, over 20892.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1523, cr_loss=0.3743, over 4091440.99 frames. ], batch size: 54, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:46:25,218 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:46:29,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=485976.1666666667, ans=0.1 2024-09-16 18:46:47,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=486032.8333333333, ans=0.125 2024-09-16 18:46:56,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=486032.8333333333, ans=0.2 2024-09-16 18:46:59,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=486032.8333333333, ans=0.2 2024-09-16 18:47:17,258 INFO [train.py:1198] (0/2) Epoch 27, batch 5400, loss[loss=0.2221, ctc_loss=0.1461, cr_loss=0.3802, over 20796.00 frames. 
], tot_loss[loss=0.2267, ctc_loss=0.152, cr_loss=0.3737, over 4096476.30 frames. ], batch size: 53, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:47:29,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2024-09-16 18:47:32,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=486117.8333333333, ans=0.125 2024-09-16 18:47:39,765 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.822e+02 2.139e+02 2.231e+02 2.415e+02 3.061e+02, threshold=4.463e+02, percent-clipped=0.0 2024-09-16 18:47:50,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=486146.1666666667, ans=0.2 2024-09-16 18:48:24,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=15.0 2024-09-16 18:48:31,478 INFO [train.py:1198] (0/2) Epoch 27, batch 5450, loss[loss=0.2627, ctc_loss=0.1752, cr_loss=0.4377, over 20392.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1524, cr_loss=0.3743, over 4105082.46 frames. ], batch size: 74, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:49:07,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=486287.8333333333, ans=0.125 2024-09-16 18:49:21,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=486316.1666666667, ans=0.2 2024-09-16 18:49:46,054 INFO [train.py:1198] (0/2) Epoch 27, batch 5500, loss[loss=0.205, ctc_loss=0.1374, cr_loss=0.3378, over 21086.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1531, cr_loss=0.3752, over 4092913.99 frames. 
], batch size: 56, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:50:11,137 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.091e+02 2.230e+02 2.356e+02 4.028e+02, threshold=4.460e+02, percent-clipped=0.0 2024-09-16 18:50:11,944 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2024-09-16 18:50:31,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=486429.5, ans=0.2 2024-09-16 18:50:33,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=486429.5, ans=0.125 2024-09-16 18:50:40,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486457.8333333333, ans=0.1 2024-09-16 18:50:41,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=486457.8333333333, ans=0.0 2024-09-16 18:51:05,458 INFO [train.py:1198] (0/2) Epoch 27, batch 5550, loss[loss=0.2571, ctc_loss=0.1758, cr_loss=0.4066, over 20846.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1538, cr_loss=0.3767, over 4093153.06 frames. ], batch size: 59, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:51:47,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=486571.1666666667, ans=0.125 2024-09-16 18:51:52,220 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.55 vs. 
limit=12.0 2024-09-16 18:52:08,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=486627.8333333333, ans=0.0 2024-09-16 18:52:19,808 INFO [train.py:1198] (0/2) Epoch 27, batch 5600, loss[loss=0.2222, ctc_loss=0.149, cr_loss=0.3663, over 20900.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1533, cr_loss=0.3759, over 4095407.02 frames. ], batch size: 54, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:52:41,966 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.128e+02 2.236e+02 2.434e+02 4.346e+02, threshold=4.473e+02, percent-clipped=0.0 2024-09-16 18:52:51,672 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2024-09-16 18:53:01,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=486712.8333333333, ans=10.0 2024-09-16 18:53:10,969 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2024-09-16 18:53:19,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=486769.5, ans=0.125 2024-09-16 18:53:31,673 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:53:34,439 INFO [train.py:1198] (0/2) Epoch 27, batch 5650, loss[loss=0.2238, ctc_loss=0.1489, cr_loss=0.3744, over 20819.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1544, cr_loss=0.3768, over 4074233.88 frames. 
], batch size: 59, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:53:39,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=486797.8333333333, ans=0.125 2024-09-16 18:53:39,440 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:53:39,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=486797.8333333333, ans=0.125 2024-09-16 18:53:46,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=486797.8333333333, ans=0.0 2024-09-16 18:53:49,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=486826.1666666667, ans=0.2 2024-09-16 18:54:15,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2024-09-16 18:54:22,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=486882.8333333333, ans=0.1 2024-09-16 18:54:36,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=486911.1666666667, ans=0.0 2024-09-16 18:54:44,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=486911.1666666667, ans=0.1 2024-09-16 18:54:44,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=486911.1666666667, ans=0.125 2024-09-16 18:54:49,211 INFO [train.py:1198] (0/2) Epoch 27, batch 5700, loss[loss=0.2248, ctc_loss=0.1494, cr_loss=0.3769, over 21019.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1547, cr_loss=0.3784, over 4080887.03 frames. 
], batch size: 63, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:55:08,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486967.8333333333, ans=0.1 2024-09-16 18:55:09,170 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:55:11,756 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.191e+02 2.338e+02 2.537e+02 5.088e+02, threshold=4.676e+02, percent-clipped=1.0 2024-09-16 18:55:13,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=486967.8333333333, ans=0.125 2024-09-16 18:55:37,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.21 vs. limit=15.0 2024-09-16 18:55:41,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=487024.5, ans=0.0 2024-09-16 18:55:52,528 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=12.0 2024-09-16 18:56:04,087 INFO [train.py:1198] (0/2) Epoch 27, batch 5750, loss[loss=0.2317, ctc_loss=0.1526, cr_loss=0.3956, over 21054.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1556, cr_loss=0.3795, over 4074126.75 frames. 
], batch size: 56, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:56:45,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=487137.8333333333, ans=0.025 2024-09-16 18:56:53,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487166.1666666667, ans=0.1 2024-09-16 18:57:19,112 INFO [train.py:1198] (0/2) Epoch 27, batch 5800, loss[loss=0.2441, ctc_loss=0.164, cr_loss=0.4007, over 20958.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1544, cr_loss=0.3774, over 4084761.44 frames. ], batch size: 64, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:57:29,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=487222.8333333333, ans=0.125 2024-09-16 18:57:41,102 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.107e+02 2.242e+02 2.438e+02 6.579e+02, threshold=4.484e+02, percent-clipped=1.0 2024-09-16 18:57:57,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=487279.5, ans=0.2 2024-09-16 18:58:03,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=487307.8333333333, ans=0.0 2024-09-16 18:58:13,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=487307.8333333333, ans=0.125 2024-09-16 18:58:15,140 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-172000.pt 2024-09-16 18:58:33,848 INFO [train.py:1198] (0/2) Epoch 27, batch 5850, loss[loss=0.2547, ctc_loss=0.1705, cr_loss=0.4211, over 20869.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.154, cr_loss=0.3766, over 4072217.13 frames. 
], batch size: 57, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:59:02,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=487392.8333333333, ans=0.025 2024-09-16 18:59:12,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=487421.1666666667, ans=0.015 2024-09-16 18:59:16,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=487421.1666666667, ans=0.125 2024-09-16 18:59:29,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487449.5, ans=0.1 2024-09-16 18:59:53,445 INFO [train.py:1198] (0/2) Epoch 27, batch 5900, loss[loss=0.2294, ctc_loss=0.1528, cr_loss=0.3834, over 21060.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1536, cr_loss=0.3757, over 4059468.63 frames. ], batch size: 62, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 18:59:58,686 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.64 vs. limit=15.0 2024-09-16 19:00:16,056 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.096e+02 2.262e+02 2.458e+02 4.394e+02, threshold=4.524e+02, percent-clipped=0.0 2024-09-16 19:00:29,934 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.74 vs. limit=6.0 2024-09-16 19:00:42,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=487591.1666666667, ans=0.2 2024-09-16 19:00:57,117 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.17 vs. 
limit=15.0 2024-09-16 19:01:01,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=487619.5, ans=0.025 2024-09-16 19:01:01,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=487619.5, ans=0.2 2024-09-16 19:01:02,961 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2024-09-16 19:01:08,230 INFO [train.py:1198] (0/2) Epoch 27, batch 5950, loss[loss=0.2343, ctc_loss=0.1583, cr_loss=0.3799, over 21024.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1539, cr_loss=0.3764, over 4064087.37 frames. ], batch size: 62, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 19:01:19,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=487647.8333333333, ans=0.125 2024-09-16 19:01:25,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=487676.1666666667, ans=0.05 2024-09-16 19:01:33,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=487676.1666666667, ans=0.125 2024-09-16 19:01:40,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=487704.5, ans=0.0 2024-09-16 19:01:41,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=22.5 2024-09-16 19:02:22,683 INFO [train.py:1198] (0/2) Epoch 27, batch 6000, loss[loss=0.2469, ctc_loss=0.1672, cr_loss=0.3982, over 21018.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1543, cr_loss=0.3777, over 4080198.28 frames. 
], batch size: 63, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 19:02:22,684 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 19:02:44,552 INFO [train.py:1230] (0/2) Epoch 27, validation: loss=0.04154, ctc_loss=0.04154, cr_loss=1.252e-14, over 944034.00 frames. 2024-09-16 19:02:44,552 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 19:03:07,050 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.126e+02 2.233e+02 2.350e+02 2.661e+02, threshold=4.466e+02, percent-clipped=0.0 2024-09-16 19:03:35,239 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.29 vs. limit=22.5 2024-09-16 19:03:59,395 INFO [train.py:1198] (0/2) Epoch 27, batch 6050, loss[loss=0.2034, ctc_loss=0.1349, cr_loss=0.3423, over 20878.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1549, cr_loss=0.3791, over 4071546.55 frames. ], batch size: 54, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 19:04:36,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=487987.8333333333, ans=0.125 2024-09-16 19:04:36,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=12.0 2024-09-16 19:05:11,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=488044.5, ans=0.125 2024-09-16 19:05:14,298 INFO [train.py:1198] (0/2) Epoch 27, batch 6100, loss[loss=0.2242, ctc_loss=0.1507, cr_loss=0.3673, over 21079.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1548, cr_loss=0.3786, over 4076859.15 frames. 
], batch size: 59, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 19:05:31,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=488101.1666666667, ans=0.125 2024-09-16 19:05:38,037 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.139e+02 2.338e+02 2.495e+02 3.055e+02, threshold=4.677e+02, percent-clipped=0.0 2024-09-16 19:05:55,032 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=15.0 2024-09-16 19:06:28,648 INFO [train.py:1198] (0/2) Epoch 27, batch 6150, loss[loss=0.186, ctc_loss=0.1212, cr_loss=0.3242, over 20941.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1543, cr_loss=0.378, over 4083740.49 frames. ], batch size: 48, lr: 3.05e-03, grad_scale: 32.0 2024-09-16 19:06:31,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.52 vs. limit=15.0 2024-09-16 19:07:17,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=12.0 2024-09-16 19:07:22,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=488299.5, ans=0.09899494936611666 2024-09-16 19:07:36,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=488327.8333333333, ans=0.04949747468305833 2024-09-16 19:07:42,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=488356.1666666667, ans=0.125 2024-09-16 19:07:43,593 INFO [train.py:1198] (0/2) Epoch 27, batch 6200, loss[loss=0.2607, ctc_loss=0.1785, cr_loss=0.4109, over 19423.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1564, cr_loss=0.3799, over 4045367.91 frames. 
], batch size: 90, lr: 3.05e-03, grad_scale: 32.0
2024-09-16 19:08:06,833 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.164e+02 2.279e+02 2.489e+02 3.203e+02, threshold=4.557e+02, percent-clipped=0.0
2024-09-16 19:08:10,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0
2024-09-16 19:08:44,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0
2024-09-16 19:08:57,344 INFO [train.py:1198] (0/2) Epoch 27, batch 6250, loss[loss=0.2318, ctc_loss=0.1565, cr_loss=0.3764, over 20738.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1564, cr_loss=0.3787, over 4013308.65 frames. ], batch size: 71, lr: 3.05e-03, grad_scale: 32.0
2024-09-16 19:09:18,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=488526.1666666667, ans=0.2
2024-09-16 19:09:22,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=488526.1666666667, ans=0.125
2024-09-16 19:09:29,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=488554.5, ans=0.125
2024-09-16 19:09:54,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=488611.1666666667, ans=0.2
2024-09-16 19:10:10,366 INFO [train.py:1198] (0/2) Epoch 27, batch 6300, loss[loss=0.2718, ctc_loss=0.1863, cr_loss=0.4275, over 18447.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1583, cr_loss=0.3813, over 3960852.72 frames. ], batch size: 108, lr: 3.05e-03, grad_scale: 32.0
2024-09-16 19:10:33,817 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.246e+02 2.457e+02 2.703e+02 3.753e+02, threshold=4.914e+02, percent-clipped=0.0
2024-09-16 19:11:09,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=488752.8333333333, ans=0.125
2024-09-16 19:11:22,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.37 vs. limit=15.0
2024-09-16 19:11:23,013 INFO [train.py:1198] (0/2) Epoch 27, batch 6350, loss[loss=0.2339, ctc_loss=0.1668, cr_loss=0.3355, over 14332.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1626, cr_loss=0.3845, over 3815005.54 frames. ], batch size: 149, lr: 3.04e-03, grad_scale: 32.0
2024-09-16 19:11:34,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=488781.1666666667, ans=0.2
2024-09-16 19:11:35,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=22.5
2024-09-16 19:12:21,556 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-27.pt
2024-09-16 19:13:12,535 INFO [train.py:1198] (0/2) Epoch 28, batch 0, loss[loss=0.2108, ctc_loss=0.1395, cr_loss=0.3563, over 20948.00 frames. ], tot_loss[loss=0.2108, ctc_loss=0.1395, cr_loss=0.3563, over 20948.00 frames. ], batch size: 55, lr: 2.99e-03, grad_scale: 32.0
2024-09-16 19:13:12,536 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 19:13:30,794 INFO [train.py:1230] (0/2) Epoch 28, validation: loss=0.04112, ctc_loss=0.04112, cr_loss=1.192e-14, over 944034.00 frames.
2024-09-16 19:13:30,794 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-16 19:14:07,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=488954.0, ans=0.125
2024-09-16 19:14:08,364 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.339e+02 2.616e+02 2.805e+02 4.235e+02, threshold=5.232e+02, percent-clipped=0.0
2024-09-16 19:14:32,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=489010.6666666667, ans=0.125
2024-09-16 19:14:46,359 INFO [train.py:1198] (0/2) Epoch 28, batch 50, loss[loss=0.2386, ctc_loss=0.1596, cr_loss=0.3946, over 21087.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1554, cr_loss=0.3808, over 916138.30 frames. ], batch size: 59, lr: 2.99e-03, grad_scale: 32.0
2024-09-16 19:14:49,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=489039.0, ans=0.125
2024-09-16 19:15:19,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=489095.6666666667, ans=0.0
2024-09-16 19:15:19,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=489095.6666666667, ans=0.0
2024-09-16 19:15:24,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0
2024-09-16 19:15:37,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489124.0, ans=0.1
2024-09-16 19:15:57,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=489152.3333333333, ans=0.125
2024-09-16 19:15:57,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=489152.3333333333, ans=0.0
2024-09-16 19:15:57,468 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.26 vs. limit=6.0
2024-09-16 19:15:58,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=489152.3333333333, ans=0.125
2024-09-16 19:16:01,213 INFO [train.py:1198] (0/2) Epoch 28, batch 100, loss[loss=0.2463, ctc_loss=0.1669, cr_loss=0.3973, over 20650.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1563, cr_loss=0.3812, over 1610708.98 frames. ], batch size: 66, lr: 2.99e-03, grad_scale: 32.0
2024-09-16 19:16:10,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=489180.6666666667, ans=0.0
2024-09-16 19:16:21,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=489209.0, ans=0.125
2024-09-16 19:16:38,665 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.049e+02 2.186e+02 2.346e+02 2.862e+02, threshold=4.372e+02, percent-clipped=0.0
2024-09-16 19:16:46,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=489265.6666666667, ans=0.2
2024-09-16 19:16:55,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=489265.6666666667, ans=0.0
2024-09-16 19:17:16,253 INFO [train.py:1198] (0/2) Epoch 28, batch 150, loss[loss=0.2364, ctc_loss=0.16, cr_loss=0.3821, over 19497.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.156, cr_loss=0.3797, over 2160988.63 frames. ], batch size: 90, lr: 2.99e-03, grad_scale: 32.0
2024-09-16 19:17:20,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0
2024-09-16 19:18:13,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.20 vs. limit=22.5
2024-09-16 19:18:35,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=489435.6666666667, ans=0.025
2024-09-16 19:18:38,074 INFO [train.py:1198] (0/2) Epoch 28, batch 200, loss[loss=0.1916, ctc_loss=0.1283, cr_loss=0.3162, over 20972.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1543, cr_loss=0.3772, over 2595039.32 frames. ], batch size: 50, lr: 2.99e-03, grad_scale: 32.0
2024-09-16 19:18:48,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=489464.0, ans=0.0
2024-09-16 19:19:04,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=489492.3333333333, ans=0.0
2024-09-16 19:19:15,604 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.113e+02 2.202e+02 2.382e+02 3.605e+02, threshold=4.405e+02, percent-clipped=0.0
2024-09-16 19:19:52,517 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0
2024-09-16 19:19:53,340 INFO [train.py:1198] (0/2) Epoch 28, batch 250, loss[loss=0.2275, ctc_loss=0.1526, cr_loss=0.3747, over 20945.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1543, cr_loss=0.3775, over 2928806.18 frames. ], batch size: 60, lr: 2.99e-03, grad_scale: 32.0
2024-09-16 19:20:01,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=489605.6666666667, ans=0.125
2024-09-16 19:20:11,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=489634.0, ans=0.125
2024-09-16 19:20:20,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=489634.0, ans=0.1
2024-09-16 19:20:23,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=489662.3333333333, ans=0.125
2024-09-16 19:20:32,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=489662.3333333333, ans=0.125
2024-09-16 19:21:08,511 INFO [train.py:1198] (0/2) Epoch 28, batch 300, loss[loss=0.2302, ctc_loss=0.1539, cr_loss=0.3816, over 20883.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1537, cr_loss=0.3771, over 3172111.77 frames. ], batch size: 57, lr: 2.99e-03, grad_scale: 32.0
2024-09-16 19:21:08,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=489747.3333333333, ans=0.0
2024-09-16 19:21:22,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=489775.6666666667, ans=0.0
2024-09-16 19:21:46,342 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.146e+02 2.308e+02 2.546e+02 4.169e+02, threshold=4.616e+02, percent-clipped=0.0
2024-09-16 19:22:24,322 INFO [train.py:1198] (0/2) Epoch 28, batch 350, loss[loss=0.2382, ctc_loss=0.1562, cr_loss=0.4097, over 20774.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1529, cr_loss=0.3762, over 3379672.74 frames. ], batch size: 53, lr: 2.99e-03, grad_scale: 32.0
2024-09-16 19:22:36,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=489889.0, ans=0.0
2024-09-16 19:22:50,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=489917.3333333333, ans=0.125
2024-09-16 19:22:59,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=489945.6666666667, ans=0.125
2024-09-16 19:23:25,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=490002.3333333333, ans=0.1
2024-09-16 19:23:40,125 INFO [train.py:1198] (0/2) Epoch 28, batch 400, loss[loss=0.2581, ctc_loss=0.1768, cr_loss=0.4069, over 19991.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1521, cr_loss=0.3755, over 3543889.15 frames. ], batch size: 80, lr: 2.99e-03, grad_scale: 32.0
2024-09-16 19:23:52,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=490030.6666666667, ans=0.125
2024-09-16 19:24:06,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=490059.0, ans=0.125
2024-09-16 19:24:22,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=490087.3333333333, ans=0.125
2024-09-16 19:24:24,053 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.058e+02 2.177e+02 2.307e+02 4.460e+02, threshold=4.355e+02, percent-clipped=0.0
2024-09-16 19:24:27,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=490087.3333333333, ans=0.09899494936611666
2024-09-16 19:25:01,599 INFO [train.py:1198] (0/2) Epoch 28, batch 450, loss[loss=0.1839, ctc_loss=0.1194, cr_loss=0.3224, over 20971.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1532, cr_loss=0.3768, over 3670570.59 frames. ], batch size: 48, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:25:07,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=490172.3333333333, ans=0.125
2024-09-16 19:25:15,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=490200.6666666667, ans=0.1
2024-09-16 19:25:35,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=490229.0, ans=0.125
2024-09-16 19:25:36,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=490229.0, ans=0.0
2024-09-16 19:25:50,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=490257.3333333333, ans=0.125
2024-09-16 19:25:52,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=490257.3333333333, ans=0.125
2024-09-16 19:25:58,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=490257.3333333333, ans=0.0
2024-09-16 19:26:06,034 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.92 vs. limit=15.0
2024-09-16 19:26:17,333 INFO [train.py:1198] (0/2) Epoch 28, batch 500, loss[loss=0.2059, ctc_loss=0.138, cr_loss=0.3398, over 20773.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1536, cr_loss=0.3767, over 3760004.07 frames. ], batch size: 56, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:26:54,825 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.113e+02 2.221e+02 2.416e+02 4.040e+02, threshold=4.442e+02, percent-clipped=0.0
2024-09-16 19:27:32,675 INFO [train.py:1198] (0/2) Epoch 28, batch 550, loss[loss=0.1972, ctc_loss=0.1305, cr_loss=0.3338, over 20785.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1537, cr_loss=0.3769, over 3827729.34 frames. ], batch size: 53, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:28:24,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=490540.6666666667, ans=0.2
2024-09-16 19:28:48,513 INFO [train.py:1198] (0/2) Epoch 28, batch 600, loss[loss=0.2077, ctc_loss=0.1371, cr_loss=0.3533, over 20953.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1525, cr_loss=0.3755, over 3902282.28 frames. ], batch size: 50, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:28:56,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=490597.3333333333, ans=0.1
2024-09-16 19:29:18,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=490654.0, ans=0.0
2024-09-16 19:29:25,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=490654.0, ans=0.0
2024-09-16 19:29:26,165 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.094e+02 2.223e+02 2.358e+02 3.482e+02, threshold=4.446e+02, percent-clipped=0.0
2024-09-16 19:29:44,012 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.03 vs. limit=10.0
2024-09-16 19:29:49,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=490682.3333333333, ans=0.2
2024-09-16 19:30:00,086 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 19:30:10,172 INFO [train.py:1198] (0/2) Epoch 28, batch 650, loss[loss=0.1865, ctc_loss=0.1221, cr_loss=0.3222, over 20994.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1534, cr_loss=0.3765, over 3939728.18 frames. ], batch size: 52, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:30:24,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=490767.3333333333, ans=0.1
2024-09-16 19:30:35,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.39 vs. limit=15.0
2024-09-16 19:31:05,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=490824.0, ans=0.0
2024-09-16 19:31:23,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=490852.3333333333, ans=0.125
2024-09-16 19:31:26,263 INFO [train.py:1198] (0/2) Epoch 28, batch 700, loss[loss=0.2134, ctc_loss=0.1405, cr_loss=0.3643, over 20790.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1534, cr_loss=0.3764, over 3959192.63 frames. ], batch size: 53, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:32:03,890 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.093e+02 2.264e+02 2.478e+02 2.916e+02, threshold=4.528e+02, percent-clipped=0.0
2024-09-16 19:32:41,688 INFO [train.py:1198] (0/2) Epoch 28, batch 750, loss[loss=0.2504, ctc_loss=0.1679, cr_loss=0.4121, over 21038.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.376, over 4001982.73 frames. ], batch size: 62, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:33:09,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2024-09-16 19:33:21,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=491079.0, ans=0.2
2024-09-16 19:33:33,535 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 19:33:34,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=491107.3333333333, ans=0.125
2024-09-16 19:33:35,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=491107.3333333333, ans=0.125
2024-09-16 19:33:47,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=491135.6666666667, ans=0.125
2024-09-16 19:33:57,451 INFO [train.py:1198] (0/2) Epoch 28, batch 800, loss[loss=0.2112, ctc_loss=0.1407, cr_loss=0.3525, over 20823.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1524, cr_loss=0.3752, over 4028334.05 frames. ], batch size: 59, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:34:13,265 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0
2024-09-16 19:34:35,373 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.101e+02 2.245e+02 2.423e+02 3.043e+02, threshold=4.490e+02, percent-clipped=0.0
2024-09-16 19:34:50,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=491249.0, ans=0.125
2024-09-16 19:34:53,133 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.50 vs. limit=5.0
2024-09-16 19:35:01,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=491277.3333333333, ans=0.125
2024-09-16 19:35:13,316 INFO [train.py:1198] (0/2) Epoch 28, batch 850, loss[loss=0.2243, ctc_loss=0.149, cr_loss=0.3764, over 20862.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1527, cr_loss=0.3756, over 4041523.64 frames. ], batch size: 54, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:35:26,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=491305.6666666667, ans=0.0
2024-09-16 19:35:50,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=491362.3333333333, ans=0.0
2024-09-16 19:36:16,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=491390.6666666667, ans=0.0
2024-09-16 19:36:30,797 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=15.0
2024-09-16 19:36:35,792 INFO [train.py:1198] (0/2) Epoch 28, batch 900, loss[loss=0.2162, ctc_loss=0.1451, cr_loss=0.3554, over 20965.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1532, cr_loss=0.3763, over 4052815.50 frames. ], batch size: 58, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:36:37,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=491447.3333333333, ans=0.125
2024-09-16 19:36:45,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.97 vs. limit=10.0
2024-09-16 19:37:05,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491504.0, ans=0.1
2024-09-16 19:37:13,913 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.131e+02 2.254e+02 2.374e+02 2.915e+02, threshold=4.507e+02, percent-clipped=0.0
2024-09-16 19:37:27,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=491532.3333333333, ans=0.025
2024-09-16 19:37:51,807 INFO [train.py:1198] (0/2) Epoch 28, batch 950, loss[loss=0.1815, ctc_loss=0.1184, cr_loss=0.3156, over 20960.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1528, cr_loss=0.3763, over 4079310.05 frames. ], batch size: 49, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:38:23,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491645.6666666667, ans=0.1
2024-09-16 19:38:26,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=491645.6666666667, ans=10.0
2024-09-16 19:38:42,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=491674.0, ans=0.125
2024-09-16 19:38:53,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.27 vs. limit=15.0
2024-09-16 19:39:04,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=491702.3333333333, ans=0.125
2024-09-16 19:39:07,121 INFO [train.py:1198] (0/2) Epoch 28, batch 1000, loss[loss=0.2285, ctc_loss=0.1517, cr_loss=0.384, over 21013.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.153, cr_loss=0.3769, over 4080213.58 frames. ], batch size: 61, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:39:10,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=491730.6666666667, ans=0.04949747468305833
2024-09-16 19:39:16,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=491730.6666666667, ans=0.125
2024-09-16 19:39:45,067 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.136e+02 2.281e+02 2.453e+02 5.009e+02, threshold=4.561e+02, percent-clipped=2.0
2024-09-16 19:40:03,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491815.6666666667, ans=0.1
2024-09-16 19:40:20,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=491844.0, ans=0.0
2024-09-16 19:40:23,348 INFO [train.py:1198] (0/2) Epoch 28, batch 1050, loss[loss=0.2243, ctc_loss=0.1498, cr_loss=0.3725, over 20784.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1529, cr_loss=0.3764, over 4080981.27 frames. ], batch size: 56, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:40:28,753 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0
2024-09-16 19:40:43,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=491900.6666666667, ans=0.125
2024-09-16 19:40:44,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=491900.6666666667, ans=0.2
2024-09-16 19:40:54,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=491929.0, ans=10.0
2024-09-16 19:41:01,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=491929.0, ans=0.0
2024-09-16 19:41:07,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=491957.3333333333, ans=15.0
2024-09-16 19:41:34,742 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0
2024-09-16 19:41:44,451 INFO [train.py:1198] (0/2) Epoch 28, batch 1100, loss[loss=0.2584, ctc_loss=0.1729, cr_loss=0.4275, over 19413.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1531, cr_loss=0.3759, over 4076112.23 frames. ], batch size: 91, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:42:22,246 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.136e+02 2.231e+02 2.490e+02 3.661e+02, threshold=4.462e+02, percent-clipped=0.0
2024-09-16 19:42:38,822 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 19:42:48,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=492127.3333333333, ans=0.125
2024-09-16 19:42:59,863 INFO [train.py:1198] (0/2) Epoch 28, batch 1150, loss[loss=0.2287, ctc_loss=0.1543, cr_loss=0.3721, over 21016.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1543, cr_loss=0.378, over 4071358.55 frames. ], batch size: 61, lr: 2.98e-03, grad_scale: 16.0
2024-09-16 19:43:25,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=492184.0, ans=0.1
2024-09-16 19:43:30,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=492212.3333333333, ans=0.0
2024-09-16 19:43:38,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=15.0
2024-09-16 19:43:42,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.91 vs. limit=10.0
2024-09-16 19:44:15,287 INFO [train.py:1198] (0/2) Epoch 28, batch 1200, loss[loss=0.1939, ctc_loss=0.1286, cr_loss=0.3263, over 20951.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1544, cr_loss=0.3777, over 4072418.48 frames. ], batch size: 49, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:44:50,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=492354.0, ans=0.0
2024-09-16 19:44:54,504 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.207e+02 2.302e+02 2.494e+02 3.514e+02, threshold=4.604e+02, percent-clipped=0.0
2024-09-16 19:45:29,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=492439.0, ans=0.125
2024-09-16 19:45:30,918 INFO [train.py:1198] (0/2) Epoch 28, batch 1250, loss[loss=0.2539, ctc_loss=0.174, cr_loss=0.3995, over 21024.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1529, cr_loss=0.3751, over 4069869.81 frames. ], batch size: 63, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:45:36,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.76 vs. limit=15.0
2024-09-16 19:45:37,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=492439.0, ans=0.0
2024-09-16 19:45:38,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=492439.0, ans=0.125
2024-09-16 19:46:18,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=492524.0, ans=0.2
2024-09-16 19:46:46,588 INFO [train.py:1198] (0/2) Epoch 28, batch 1300, loss[loss=0.2492, ctc_loss=0.1665, cr_loss=0.4134, over 19315.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1524, cr_loss=0.375, over 4067119.75 frames. ], batch size: 90, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:46:51,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0
2024-09-16 19:47:28,837 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.102e+02 2.236e+02 2.427e+02 7.756e+02, threshold=4.473e+02, percent-clipped=1.0
2024-09-16 19:47:36,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=492665.6666666667, ans=0.125
2024-09-16 19:48:05,276 INFO [train.py:1198] (0/2) Epoch 28, batch 1350, loss[loss=0.2285, ctc_loss=0.1542, cr_loss=0.3717, over 20774.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1531, cr_loss=0.3758, over 4078503.92 frames. ], batch size: 56, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:48:17,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=492722.3333333333, ans=0.015
2024-09-16 19:48:36,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=492779.0, ans=0.2
2024-09-16 19:48:42,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=492779.0, ans=0.125
2024-09-16 19:48:42,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=492779.0, ans=0.125
2024-09-16 19:48:54,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=492807.3333333333, ans=0.0
2024-09-16 19:49:05,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=492835.6666666667, ans=0.0
2024-09-16 19:49:21,316 INFO [train.py:1198] (0/2) Epoch 28, batch 1400, loss[loss=0.2114, ctc_loss=0.1407, cr_loss=0.3533, over 20778.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1532, cr_loss=0.3751, over 4067490.56 frames. ], batch size: 56, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:49:21,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=492864.0, ans=0.125
2024-09-16 19:50:01,212 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.130e+02 2.235e+02 2.387e+02 3.813e+02, threshold=4.471e+02, percent-clipped=0.0
2024-09-16 19:50:01,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=492920.6666666667, ans=0.05
2024-09-16 19:50:24,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492977.3333333333, ans=0.1
2024-09-16 19:50:37,460 INFO [train.py:1198] (0/2) Epoch 28, batch 1450, loss[loss=0.2215, ctc_loss=0.1439, cr_loss=0.3881, over 20984.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1526, cr_loss=0.375, over 4078142.18 frames. ], batch size: 48, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:50:50,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=493005.6666666667, ans=0.125
2024-09-16 19:50:57,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=493034.0, ans=15.0
2024-09-16 19:51:31,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=493090.6666666667, ans=0.0
2024-09-16 19:51:52,877 INFO [train.py:1198] (0/2) Epoch 28, batch 1500, loss[loss=0.2047, ctc_loss=0.136, cr_loss=0.3435, over 20932.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1523, cr_loss=0.3747, over 4095357.13 frames. ], batch size: 50, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:52:06,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=493175.6666666667, ans=0.025
2024-09-16 19:52:32,104 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.152e+02 2.248e+02 2.416e+02 3.527e+02, threshold=4.497e+02, percent-clipped=0.0
2024-09-16 19:53:04,718 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0
2024-09-16 19:53:14,708 INFO [train.py:1198] (0/2) Epoch 28, batch 1550, loss[loss=0.2371, ctc_loss=0.1581, cr_loss=0.3948, over 20884.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1521, cr_loss=0.3744, over 4095516.12 frames. ], batch size: 57, lr: 2.98e-03, grad_scale: 32.0
2024-09-16 19:53:23,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=493289.0, ans=0.125
2024-09-16 19:53:48,521 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0
2024-09-16 19:53:51,720 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0
2024-09-16 19:54:27,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=493402.3333333333, ans=0.125
2024-09-16 19:54:30,055 INFO [train.py:1198] (0/2) Epoch 28, batch 1600, loss[loss=0.2079, ctc_loss=0.139, cr_loss=0.3443, over 19421.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1508, cr_loss=0.3723, over 4089577.81 frames. ], batch size: 43, lr: 2.97e-03, grad_scale: 32.0
2024-09-16 19:55:09,177 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.057e+02 2.231e+02 2.372e+02 4.901e+02, threshold=4.462e+02, percent-clipped=0.0
2024-09-16 19:55:27,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=493515.6666666667, ans=0.125
2024-09-16 19:55:40,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.27 vs. limit=10.0
2024-09-16 19:55:44,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=493572.3333333333, ans=0.1
2024-09-16 19:55:45,738 INFO [train.py:1198] (0/2) Epoch 28, batch 1650, loss[loss=0.2294, ctc_loss=0.1523, cr_loss=0.3857, over 20795.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1513, cr_loss=0.3729, over 4094079.73 frames. ], batch size: 53, lr: 2.97e-03, grad_scale: 32.0
2024-09-16 19:56:04,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=493600.6666666667, ans=0.0
2024-09-16 19:56:24,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=493629.0, ans=0.125
2024-09-16 19:56:59,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=12.0
2024-09-16 19:57:01,426 INFO [train.py:1198] (0/2) Epoch 28, batch 1700, loss[loss=0.239, ctc_loss=0.1578, cr_loss=0.406, over 20850.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1514, cr_loss=0.3732, over 4098236.04 frames. ], batch size: 57, lr: 2.97e-03, grad_scale: 32.0
2024-09-16 19:57:39,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=493770.6666666667, ans=0.125
2024-09-16 19:57:40,168 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.190e+02 2.320e+02 2.535e+02 3.922e+02, threshold=4.640e+02, percent-clipped=1.0
2024-09-16 19:57:43,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=493770.6666666667, ans=0.125
2024-09-16 19:57:51,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=493799.0, ans=0.125
2024-09-16 19:58:04,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=493827.3333333333, ans=0.0
2024-09-16 19:58:07,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=493827.3333333333, ans=0.05
2024-09-16 19:58:16,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=493855.6666666667, ans=0.1
2024-09-16 19:58:55,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.55 vs.
limit=15.0 2024-09-16 19:59:08,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=493940.6666666667, ans=0.2 2024-09-16 19:59:09,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=493940.6666666667, ans=0.025 2024-09-16 19:59:36,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.54 vs. limit=15.0 2024-09-16 19:59:38,643 INFO [train.py:1198] (0/2) Epoch 28, batch 1800, loss[loss=0.2432, ctc_loss=0.1647, cr_loss=0.3926, over 20777.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1523, cr_loss=0.375, over 4110465.48 frames. ], batch size: 56, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 19:59:40,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=493997.3333333333, ans=0.0 2024-09-16 19:59:53,822 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:00:17,669 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.135e+02 2.255e+02 2.429e+02 3.215e+02, threshold=4.511e+02, percent-clipped=0.0 2024-09-16 20:00:54,188 INFO [train.py:1198] (0/2) Epoch 28, batch 1850, loss[loss=0.2322, ctc_loss=0.1538, cr_loss=0.3922, over 20648.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.153, cr_loss=0.3759, over 4100205.40 frames. 
], batch size: 66, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:00:56,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=494139.0, ans=6.0 2024-09-16 20:01:23,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=494195.6666666667, ans=0.1 2024-09-16 20:01:24,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2024-09-16 20:01:26,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2024-09-16 20:01:32,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=494195.6666666667, ans=0.125 2024-09-16 20:01:48,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=494224.0, ans=0.04949747468305833 2024-09-16 20:01:52,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=494252.3333333333, ans=0.125 2024-09-16 20:02:09,253 INFO [train.py:1198] (0/2) Epoch 28, batch 1900, loss[loss=0.2007, ctc_loss=0.1316, cr_loss=0.3452, over 20972.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1527, cr_loss=0.3756, over 4107657.77 frames. ], batch size: 49, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:02:19,411 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.76 vs. 
limit=15.0 2024-09-16 20:02:41,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=494337.3333333333, ans=0.0 2024-09-16 20:02:46,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2024-09-16 20:02:48,715 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.108e+02 2.247e+02 2.390e+02 3.525e+02, threshold=4.494e+02, percent-clipped=0.0 2024-09-16 20:03:25,339 INFO [train.py:1198] (0/2) Epoch 28, batch 1950, loss[loss=0.2487, ctc_loss=0.1657, cr_loss=0.4148, over 20977.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1522, cr_loss=0.3754, over 4115588.06 frames. ], batch size: 58, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:03:49,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0 2024-09-16 20:03:57,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=494479.0, ans=0.0 2024-09-16 20:03:57,917 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:04:05,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494479.0, ans=0.1 2024-09-16 20:04:14,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0 2024-09-16 20:04:27,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=494507.3333333333, ans=0.125 2024-09-16 20:04:46,761 INFO [train.py:1198] (0/2) Epoch 28, batch 2000, loss[loss=0.2184, ctc_loss=0.146, cr_loss=0.362, over 21072.00 frames. 
], tot_loss[loss=0.2272, ctc_loss=0.1521, cr_loss=0.3753, over 4110940.61 frames. ], batch size: 59, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:04:56,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=494564.0, ans=0.125 2024-09-16 20:05:00,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=494592.3333333333, ans=0.2 2024-09-16 20:05:17,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=494620.6666666667, ans=0.025 2024-09-16 20:05:22,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=494620.6666666667, ans=0.0 2024-09-16 20:05:26,487 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.133e+02 2.302e+02 2.529e+02 5.901e+02, threshold=4.605e+02, percent-clipped=3.0 2024-09-16 20:05:49,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494677.3333333333, ans=0.1 2024-09-16 20:06:03,214 INFO [train.py:1198] (0/2) Epoch 28, batch 2050, loss[loss=0.2224, ctc_loss=0.148, cr_loss=0.3721, over 20985.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1522, cr_loss=0.3743, over 4089476.82 frames. 
], batch size: 55, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:06:18,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494734.0, ans=0.1 2024-09-16 20:06:22,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=494734.0, ans=0.1 2024-09-16 20:06:44,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=494762.3333333333, ans=8.0 2024-09-16 20:06:47,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=494790.6666666667, ans=0.125 2024-09-16 20:07:08,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=494819.0, ans=0.125 2024-09-16 20:07:18,431 INFO [train.py:1198] (0/2) Epoch 28, batch 2100, loss[loss=0.2125, ctc_loss=0.1421, cr_loss=0.3521, over 20971.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1522, cr_loss=0.3747, over 4100531.66 frames. ], batch size: 55, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:07:30,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=494847.3333333333, ans=0.125 2024-09-16 20:07:30,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=494847.3333333333, ans=0.2 2024-09-16 20:07:31,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.80 vs. 
limit=10.0 2024-09-16 20:07:41,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=494875.6666666667, ans=0.0 2024-09-16 20:07:49,171 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0 2024-09-16 20:07:56,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=494904.0, ans=0.125 2024-09-16 20:07:57,534 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.122e+02 2.272e+02 2.437e+02 5.450e+02, threshold=4.543e+02, percent-clipped=1.0 2024-09-16 20:08:03,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=494932.3333333333, ans=10.0 2024-09-16 20:08:15,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=494932.3333333333, ans=0.125 2024-09-16 20:08:16,288 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0 2024-09-16 20:08:33,323 INFO [train.py:1198] (0/2) Epoch 28, batch 2150, loss[loss=0.2258, ctc_loss=0.1504, cr_loss=0.3771, over 20976.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1528, cr_loss=0.3756, over 4085873.12 frames. ], batch size: 58, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:09:12,030 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0 2024-09-16 20:09:12,368 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.57 vs. 
limit=15.0 2024-09-16 20:09:27,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=15.0 2024-09-16 20:09:47,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.86 vs. limit=15.0 2024-09-16 20:09:48,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495130.6666666667, ans=0.1 2024-09-16 20:09:52,277 INFO [train.py:1198] (0/2) Epoch 28, batch 2200, loss[loss=0.2625, ctc_loss=0.1791, cr_loss=0.4168, over 19694.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1529, cr_loss=0.3757, over 4091748.81 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:09:52,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=495130.6666666667, ans=0.125 2024-09-16 20:10:06,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495130.6666666667, ans=0.1 2024-09-16 20:10:34,731 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.141e+02 2.278e+02 2.440e+02 4.602e+02, threshold=4.557e+02, percent-clipped=1.0 2024-09-16 20:11:11,561 INFO [train.py:1198] (0/2) Epoch 28, batch 2250, loss[loss=0.213, ctc_loss=0.1408, cr_loss=0.3613, over 20880.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1531, cr_loss=0.3757, over 4105743.94 frames. 
], batch size: 54, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:11:22,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=495272.3333333333, ans=0.0 2024-09-16 20:11:25,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.14 vs. limit=15.0 2024-09-16 20:11:30,232 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=22.5 2024-09-16 20:11:36,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=495300.6666666667, ans=0.125 2024-09-16 20:11:44,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=495329.0, ans=0.025 2024-09-16 20:12:20,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=495385.6666666667, ans=0.0 2024-09-16 20:12:26,593 INFO [train.py:1198] (0/2) Epoch 28, batch 2300, loss[loss=0.2401, ctc_loss=0.1616, cr_loss=0.3925, over 20987.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1543, cr_loss=0.378, over 4100892.14 frames. 
], batch size: 67, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:12:28,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=495414.0, ans=0.125 2024-09-16 20:12:45,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=495442.3333333333, ans=0.0 2024-09-16 20:12:52,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=495442.3333333333, ans=0.025 2024-09-16 20:13:06,135 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.112e+02 2.326e+02 2.444e+02 2.935e+02, threshold=4.652e+02, percent-clipped=0.0 2024-09-16 20:13:15,907 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0 2024-09-16 20:13:42,051 INFO [train.py:1198] (0/2) Epoch 28, batch 2350, loss[loss=0.2403, ctc_loss=0.1609, cr_loss=0.3966, over 20085.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.153, cr_loss=0.3755, over 4100259.76 frames. ], batch size: 80, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:14:09,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495584.0, ans=0.1 2024-09-16 20:14:44,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=495669.0, ans=0.04949747468305833 2024-09-16 20:14:57,524 INFO [train.py:1198] (0/2) Epoch 28, batch 2400, loss[loss=0.2318, ctc_loss=0.1529, cr_loss=0.3947, over 21030.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1529, cr_loss=0.3752, over 4094374.13 frames. 
], batch size: 63, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:15:00,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=495697.3333333333, ans=0.125 2024-09-16 20:15:03,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495697.3333333333, ans=0.1 2024-09-16 20:15:30,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=495754.0, ans=0.125 2024-09-16 20:15:41,847 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.113e+02 2.271e+02 2.494e+02 3.632e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-16 20:16:15,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=495810.6666666667, ans=0.0 2024-09-16 20:16:17,681 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2024-09-16 20:16:18,515 INFO [train.py:1198] (0/2) Epoch 28, batch 2450, loss[loss=0.2108, ctc_loss=0.1399, cr_loss=0.3545, over 21061.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1523, cr_loss=0.3749, over 4093211.18 frames. ], batch size: 56, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:16:30,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=495839.0, ans=0.125 2024-09-16 20:16:35,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=495867.3333333333, ans=0.125 2024-09-16 20:16:40,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.49 vs. 
limit=15.0 2024-09-16 20:16:44,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=495867.3333333333, ans=0.0 2024-09-16 20:16:51,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.23 vs. limit=15.0 2024-09-16 20:17:20,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495952.3333333333, ans=0.1 2024-09-16 20:17:34,050 INFO [train.py:1198] (0/2) Epoch 28, batch 2500, loss[loss=0.2441, ctc_loss=0.1626, cr_loss=0.4075, over 20711.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1518, cr_loss=0.3738, over 4085984.45 frames. ], batch size: 71, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:17:50,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=496009.0, ans=0.2 2024-09-16 20:18:13,358 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.187e+02 2.313e+02 2.428e+02 3.220e+02, threshold=4.626e+02, percent-clipped=0.0 2024-09-16 20:18:15,378 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:18:18,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=496065.6666666667, ans=0.2 2024-09-16 20:18:19,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=496065.6666666667, ans=0.0 2024-09-16 20:18:25,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=496065.6666666667, ans=0.0 2024-09-16 20:18:28,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496065.6666666667, ans=0.1 2024-09-16 20:18:34,420 
INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=496094.0, ans=0.025 2024-09-16 20:18:35,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=496094.0, ans=0.025 2024-09-16 20:18:48,958 INFO [train.py:1198] (0/2) Epoch 28, batch 2550, loss[loss=0.1944, ctc_loss=0.1267, cr_loss=0.3386, over 19944.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1524, cr_loss=0.3747, over 4073196.48 frames. ], batch size: 44, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:18:55,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=496122.3333333333, ans=0.0 2024-09-16 20:19:07,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496150.6666666667, ans=0.1 2024-09-16 20:19:13,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=496150.6666666667, ans=0.0 2024-09-16 20:19:19,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=496179.0, ans=0.125 2024-09-16 20:19:36,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=496207.3333333333, ans=0.125 2024-09-16 20:19:40,752 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.71 vs. limit=12.0 2024-09-16 20:19:47,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=496207.3333333333, ans=0.125 2024-09-16 20:20:05,433 INFO [train.py:1198] (0/2) Epoch 28, batch 2600, loss[loss=0.2328, ctc_loss=0.1551, cr_loss=0.3888, over 20867.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1528, cr_loss=0.375, over 4064327.48 frames. 
], batch size: 65, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:20:41,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=496320.6666666667, ans=0.0 2024-09-16 20:20:44,478 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.141e+02 2.289e+02 2.459e+02 4.311e+02, threshold=4.578e+02, percent-clipped=0.0 2024-09-16 20:21:26,807 INFO [train.py:1198] (0/2) Epoch 28, batch 2650, loss[loss=0.2032, ctc_loss=0.1354, cr_loss=0.3393, over 20218.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1523, cr_loss=0.3754, over 4072171.61 frames. ], batch size: 45, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:21:27,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=496405.6666666667, ans=0.0 2024-09-16 20:22:42,063 INFO [train.py:1198] (0/2) Epoch 28, batch 2700, loss[loss=0.1876, ctc_loss=0.1224, cr_loss=0.3259, over 20980.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1522, cr_loss=0.3746, over 4067714.47 frames. ], batch size: 48, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:23:21,631 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.128e+02 2.256e+02 2.425e+02 3.621e+02, threshold=4.512e+02, percent-clipped=0.0 2024-09-16 20:23:32,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=496632.3333333333, ans=0.0 2024-09-16 20:23:39,117 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2024-09-16 20:23:57,848 INFO [train.py:1198] (0/2) Epoch 28, batch 2750, loss[loss=0.2364, ctc_loss=0.1623, cr_loss=0.3702, over 18342.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1526, cr_loss=0.3761, over 4077253.17 frames. 
], batch size: 108, lr: 2.97e-03, grad_scale: 32.0 2024-09-16 20:24:00,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.25 vs. limit=15.0 2024-09-16 20:24:05,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=496689.0, ans=0.1 2024-09-16 20:24:24,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.20 vs. limit=15.0 2024-09-16 20:24:26,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=496745.6666666667, ans=0.125 2024-09-16 20:24:38,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=496745.6666666667, ans=0.2 2024-09-16 20:24:48,580 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=22.5 2024-09-16 20:24:55,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=496774.0, ans=0.125 2024-09-16 20:25:09,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=496802.3333333333, ans=0.07 2024-09-16 20:25:13,430 INFO [train.py:1198] (0/2) Epoch 28, batch 2800, loss[loss=0.2089, ctc_loss=0.1385, cr_loss=0.3521, over 19580.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1512, cr_loss=0.3736, over 4087852.84 frames. ], batch size: 43, lr: 2.96e-03, grad_scale: 32.0 2024-09-16 20:25:43,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.14 vs. 
limit=15.0
2024-09-16 20:25:52,566 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.109e+02 2.232e+02 2.388e+02 2.954e+02, threshold=4.464e+02, percent-clipped=0.0
2024-09-16 20:25:54,929 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=22.5
2024-09-16 20:26:11,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=496915.6666666667, ans=0.125
2024-09-16 20:26:20,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=496944.0, ans=0.5
2024-09-16 20:26:29,520 INFO [train.py:1198] (0/2) Epoch 28, batch 2850, loss[loss=0.2526, ctc_loss=0.1701, cr_loss=0.4128, over 20865.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1508, cr_loss=0.3723, over 4089501.33 frames. ], batch size: 65, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:27:01,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=497029.0, ans=0.025
2024-09-16 20:27:03,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=497029.0, ans=0.0
2024-09-16 20:27:06,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=497029.0, ans=0.125
2024-09-16 20:27:25,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=497057.3333333333, ans=0.125
2024-09-16 20:27:51,526 INFO [train.py:1198] (0/2) Epoch 28, batch 2900, loss[loss=0.252, ctc_loss=0.1723, cr_loss=0.3985, over 19998.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1513, cr_loss=0.3737, over 4098567.98 frames. ], batch size: 80, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:28:13,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=497142.3333333333, ans=0.125
2024-09-16 20:28:31,407 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.125e+02 2.271e+02 2.398e+02 2.995e+02, threshold=4.542e+02, percent-clipped=0.0
2024-09-16 20:28:37,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=497199.0, ans=0.0
2024-09-16 20:29:07,760 INFO [train.py:1198] (0/2) Epoch 28, batch 2950, loss[loss=0.214, ctc_loss=0.1443, cr_loss=0.3487, over 20996.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.151, cr_loss=0.3733, over 4111361.70 frames. ], batch size: 48, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:29:14,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=497255.6666666667, ans=0.5
2024-09-16 20:29:26,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=497284.0, ans=0.125
2024-09-16 20:30:04,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=497340.6666666667, ans=0.1
2024-09-16 20:30:23,427 INFO [train.py:1198] (0/2) Epoch 28, batch 3000, loss[loss=0.2001, ctc_loss=0.1312, cr_loss=0.3447, over 20965.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1505, cr_loss=0.3729, over 4116930.53 frames. ], batch size: 51, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:30:23,428 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 20:30:48,313 INFO [train.py:1230] (0/2) Epoch 28, validation: loss=0.04093, ctc_loss=0.04093, cr_loss=1.238e-14, over 944034.00 frames.
2024-09-16 20:30:48,313 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-16 20:31:27,834 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.138e+02 2.291e+02 2.531e+02 4.310e+02, threshold=4.583e+02, percent-clipped=0.0
2024-09-16 20:31:42,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=497482.3333333333, ans=0.0
2024-09-16 20:31:58,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=497510.6666666667, ans=0.1
2024-09-16 20:32:04,176 INFO [train.py:1198] (0/2) Epoch 28, batch 3050, loss[loss=0.273, ctc_loss=0.1942, cr_loss=0.394, over 14748.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1511, cr_loss=0.3737, over 4095301.38 frames. ], batch size: 149, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:32:22,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=497567.3333333333, ans=0.125
2024-09-16 20:32:27,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=497567.3333333333, ans=0.125
2024-09-16 20:32:27,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0
2024-09-16 20:32:49,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=497595.6666666667, ans=0.125
2024-09-16 20:33:25,902 INFO [train.py:1198] (0/2) Epoch 28, batch 3100, loss[loss=0.2436, ctc_loss=0.1627, cr_loss=0.4047, over 20328.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1514, cr_loss=0.3743, over 4102489.74 frames. ], batch size: 74, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:33:28,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=12.0
2024-09-16 20:33:48,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=497709.0, ans=0.125
2024-09-16 20:33:58,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=497737.3333333333, ans=0.1
2024-09-16 20:34:04,465 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.109e+02 2.242e+02 2.400e+02 5.396e+02, threshold=4.483e+02, percent-clipped=1.0
2024-09-16 20:34:04,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=497737.3333333333, ans=0.125
2024-09-16 20:34:27,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.06 vs. limit=15.0
2024-09-16 20:34:37,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=497794.0, ans=0.125
2024-09-16 20:34:40,641 INFO [train.py:1198] (0/2) Epoch 28, batch 3150, loss[loss=0.2221, ctc_loss=0.1494, cr_loss=0.3636, over 20883.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1512, cr_loss=0.3738, over 4107416.56 frames. ], batch size: 54, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:34:46,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=497822.3333333333, ans=0.125
2024-09-16 20:35:02,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0
2024-09-16 20:35:30,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.76 vs. limit=12.0
2024-09-16 20:35:51,508 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.22 vs. limit=12.0
2024-09-16 20:35:56,670 INFO [train.py:1198] (0/2) Epoch 28, batch 3200, loss[loss=0.2032, ctc_loss=0.1354, cr_loss=0.3393, over 20978.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.152, cr_loss=0.3746, over 4093699.04 frames. ], batch size: 48, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:36:12,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.52 vs. limit=22.5
2024-09-16 20:36:31,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=498020.6666666667, ans=0.0
2024-09-16 20:36:33,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=498020.6666666667, ans=0.125
2024-09-16 20:36:34,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=498020.6666666667, ans=0.125
2024-09-16 20:36:37,057 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.199e+02 2.317e+02 2.478e+02 3.979e+02, threshold=4.634e+02, percent-clipped=0.0
2024-09-16 20:36:58,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=498077.3333333333, ans=0.0
2024-09-16 20:37:07,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=498077.3333333333, ans=0.2
2024-09-16 20:37:11,796 INFO [train.py:1198] (0/2) Epoch 28, batch 3250, loss[loss=0.2046, ctc_loss=0.1357, cr_loss=0.3449, over 20899.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1532, cr_loss=0.3752, over 4065228.72 frames. ], batch size: 54, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:37:46,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=498162.3333333333, ans=0.125
2024-09-16 20:37:50,011 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0
2024-09-16 20:38:01,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=498190.6666666667, ans=0.125
2024-09-16 20:38:24,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0
2024-09-16 20:38:29,967 INFO [train.py:1198] (0/2) Epoch 28, batch 3300, loss[loss=0.2115, ctc_loss=0.1404, cr_loss=0.3557, over 20975.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1535, cr_loss=0.3768, over 4080923.94 frames. ], batch size: 50, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:38:30,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=498247.3333333333, ans=0.125
2024-09-16 20:39:13,732 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.130e+02 2.280e+02 2.472e+02 3.865e+02, threshold=4.559e+02, percent-clipped=0.0
2024-09-16 20:39:34,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=498360.6666666667, ans=0.125
2024-09-16 20:39:37,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=498360.6666666667, ans=0.2
2024-09-16 20:39:41,673 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.84 vs. limit=15.0
2024-09-16 20:39:48,679 INFO [train.py:1198] (0/2) Epoch 28, batch 3350, loss[loss=0.275, ctc_loss=0.198, cr_loss=0.3853, over 13568.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1541, cr_loss=0.3776, over 4078398.19 frames. ], batch size: 149, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:39:57,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=498389.0, ans=0.125
2024-09-16 20:40:05,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=498417.3333333333, ans=0.1
2024-09-16 20:40:22,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=498445.6666666667, ans=0.125
2024-09-16 20:40:48,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=498502.3333333333, ans=0.0
2024-09-16 20:41:04,428 INFO [train.py:1198] (0/2) Epoch 28, batch 3400, loss[loss=0.2371, ctc_loss=0.1602, cr_loss=0.3848, over 21039.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1535, cr_loss=0.3765, over 4071920.33 frames. ], batch size: 62, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:41:17,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.97 vs. limit=22.5
2024-09-16 20:41:33,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=498587.3333333333, ans=0.1
2024-09-16 20:41:45,209 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.146e+02 2.277e+02 2.460e+02 4.514e+02, threshold=4.555e+02, percent-clipped=0.0
2024-09-16 20:42:15,571 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-176000.pt
2024-09-16 20:42:21,017 INFO [train.py:1198] (0/2) Epoch 28, batch 3450, loss[loss=0.2239, ctc_loss=0.145, cr_loss=0.3946, over 20867.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1531, cr_loss=0.3767, over 4081714.13 frames. ], batch size: 57, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:42:24,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0
2024-09-16 20:42:36,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=498700.6666666667, ans=0.125
2024-09-16 20:42:47,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=498700.6666666667, ans=0.125
2024-09-16 20:42:51,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.96 vs. limit=22.5
2024-09-16 20:42:59,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=498729.0, ans=0.1
2024-09-16 20:43:00,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=498729.0, ans=0.0
2024-09-16 20:43:02,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.57 vs. limit=15.0
2024-09-16 20:43:03,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=498729.0, ans=0.2
2024-09-16 20:43:20,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=498785.6666666667, ans=0.125
2024-09-16 20:43:33,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=498785.6666666667, ans=0.1
2024-09-16 20:43:36,475 INFO [train.py:1198] (0/2) Epoch 28, batch 3500, loss[loss=0.2259, ctc_loss=0.1532, cr_loss=0.3637, over 20640.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.376, over 4086547.14 frames. ], batch size: 68, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:44:22,559 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.183e+02 2.312e+02 2.447e+02 5.682e+02, threshold=4.625e+02, percent-clipped=1.0
2024-09-16 20:44:26,573 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=15.0
2024-09-16 20:44:38,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=498899.0, ans=0.0
2024-09-16 20:44:44,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=498927.3333333333, ans=0.125
2024-09-16 20:44:57,457 INFO [train.py:1198] (0/2) Epoch 28, batch 3550, loss[loss=0.2401, ctc_loss=0.1622, cr_loss=0.3894, over 20298.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1524, cr_loss=0.3753, over 4087953.07 frames. ], batch size: 74, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:45:12,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=498984.0, ans=0.125
2024-09-16 20:45:14,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=498984.0, ans=0.125
2024-09-16 20:45:29,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=499012.3333333333, ans=0.0
2024-09-16 20:45:32,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=499012.3333333333, ans=0.125
2024-09-16 20:45:37,215 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0
2024-09-16 20:45:39,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=499012.3333333333, ans=0.0
2024-09-16 20:45:58,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499069.0, ans=0.1
2024-09-16 20:46:12,813 INFO [train.py:1198] (0/2) Epoch 28, batch 3600, loss[loss=0.2911, ctc_loss=0.2064, cr_loss=0.4236, over 14367.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1526, cr_loss=0.3756, over 4088020.14 frames. ], batch size: 151, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:46:28,605 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=22.5
2024-09-16 20:46:37,695 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0
2024-09-16 20:46:53,206 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.146e+02 2.288e+02 2.475e+02 4.845e+02, threshold=4.576e+02, percent-clipped=1.0
2024-09-16 20:47:28,405 INFO [train.py:1198] (0/2) Epoch 28, batch 3650, loss[loss=0.2218, ctc_loss=0.1515, cr_loss=0.3514, over 20360.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1524, cr_loss=0.3747, over 4096740.84 frames. ], batch size: 74, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:47:39,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=499239.0, ans=0.125
2024-09-16 20:48:08,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0
2024-09-16 20:48:14,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=499324.0, ans=0.025
2024-09-16 20:48:24,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=499324.0, ans=0.125
2024-09-16 20:48:29,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=499352.3333333333, ans=0.125
2024-09-16 20:48:35,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=499352.3333333333, ans=0.125
2024-09-16 20:48:41,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=499352.3333333333, ans=0.125
2024-09-16 20:48:44,197 INFO [train.py:1198] (0/2) Epoch 28, batch 3700, loss[loss=0.2747, ctc_loss=0.1866, cr_loss=0.4403, over 20947.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1516, cr_loss=0.3736, over 4103375.46 frames. ], batch size: 67, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:49:25,202 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.098e+02 2.247e+02 2.386e+02 3.394e+02, threshold=4.494e+02, percent-clipped=0.0
2024-09-16 20:49:37,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0
2024-09-16 20:49:39,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=499465.6666666667, ans=0.125
2024-09-16 20:49:45,504 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=22.5
2024-09-16 20:50:06,017 INFO [train.py:1198] (0/2) Epoch 28, batch 3750, loss[loss=0.2356, ctc_loss=0.1579, cr_loss=0.3885, over 20693.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1521, cr_loss=0.3742, over 4100704.82 frames. ], batch size: 66, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:50:39,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=499579.0, ans=0.125
2024-09-16 20:51:14,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=499635.6666666667, ans=0.0
2024-09-16 20:51:21,715 INFO [train.py:1198] (0/2) Epoch 28, batch 3800, loss[loss=0.2041, ctc_loss=0.1351, cr_loss=0.345, over 20985.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1514, cr_loss=0.3734, over 4092873.55 frames. ], batch size: 48, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:51:41,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=499692.3333333333, ans=0.2
2024-09-16 20:51:46,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=499692.3333333333, ans=0.025
2024-09-16 20:52:02,249 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.126e+02 2.237e+02 2.383e+02 2.888e+02, threshold=4.474e+02, percent-clipped=0.0
2024-09-16 20:52:02,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=499720.6666666667, ans=0.0
2024-09-16 20:52:07,435 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=12.0
2024-09-16 20:52:28,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=499777.3333333333, ans=0.125
2024-09-16 20:52:36,012 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 20:52:37,233 INFO [train.py:1198] (0/2) Epoch 28, batch 3850, loss[loss=0.2082, ctc_loss=0.1398, cr_loss=0.3421, over 21048.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.151, cr_loss=0.3729, over 4096630.61 frames. ], batch size: 62, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:52:48,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0
2024-09-16 20:53:24,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=499890.6666666667, ans=0.1
2024-09-16 20:53:32,678 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=12.0
2024-09-16 20:53:52,736 INFO [train.py:1198] (0/2) Epoch 28, batch 3900, loss[loss=0.2267, ctc_loss=0.1522, cr_loss=0.3727, over 21057.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.152, cr_loss=0.3749, over 4088421.04 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:54:07,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=499975.6666666667, ans=0.125
2024-09-16 20:54:33,132 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.115e+02 2.236e+02 2.375e+02 3.783e+02, threshold=4.472e+02, percent-clipped=0.0
2024-09-16 20:55:08,581 INFO [train.py:1198] (0/2) Epoch 28, batch 3950, loss[loss=0.2255, ctc_loss=0.1512, cr_loss=0.3715, over 20962.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1514, cr_loss=0.3741, over 4101618.34 frames. ], batch size: 58, lr: 2.96e-03, grad_scale: 32.0
2024-09-16 20:55:21,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=500089.0, ans=0.2
2024-09-16 20:55:24,561 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.56 vs. limit=22.5
2024-09-16 20:55:44,340 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=12.0
2024-09-16 20:56:30,378 INFO [train.py:1198] (0/2) Epoch 28, batch 4000, loss[loss=0.2669, ctc_loss=0.1819, cr_loss=0.4253, over 20640.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1522, cr_loss=0.3753, over 4100587.73 frames. ], batch size: 66, lr: 2.95e-03, grad_scale: 32.0
2024-09-16 20:56:30,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=500230.6666666667, ans=0.0
2024-09-16 20:57:03,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=500287.3333333333, ans=0.125
2024-09-16 20:57:11,145 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.144e+02 2.276e+02 2.484e+02 3.087e+02, threshold=4.552e+02, percent-clipped=0.0
2024-09-16 20:57:46,278 INFO [train.py:1198] (0/2) Epoch 28, batch 4050, loss[loss=0.2313, ctc_loss=0.1548, cr_loss=0.3828, over 21020.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1515, cr_loss=0.3746, over 4103754.87 frames. ], batch size: 62, lr: 2.95e-03, grad_scale: 32.0
2024-09-16 20:57:49,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=500372.3333333333, ans=0.0
2024-09-16 20:57:55,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=500372.3333333333, ans=0.015
2024-09-16 20:58:21,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=500429.0, ans=0.125
2024-09-16 20:58:24,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=500429.0, ans=0.04949747468305833
2024-09-16 20:58:29,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=500429.0, ans=0.125
2024-09-16 20:58:42,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=500457.3333333333, ans=0.1
2024-09-16 20:58:55,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=500485.6666666667, ans=0.1
2024-09-16 20:59:01,992 INFO [train.py:1198] (0/2) Epoch 28, batch 4100, loss[loss=0.2143, ctc_loss=0.1409, cr_loss=0.3666, over 20969.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1522, cr_loss=0.3762, over 4100519.09 frames. ], batch size: 49, lr: 2.95e-03, grad_scale: 16.0
2024-09-16 20:59:03,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=500514.0, ans=0.125
2024-09-16 20:59:18,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=500542.3333333333, ans=0.1
2024-09-16 20:59:42,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=22.5
2024-09-16 20:59:44,567 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.085e+02 2.201e+02 2.340e+02 3.271e+02, threshold=4.401e+02, percent-clipped=0.0
2024-09-16 20:59:57,396 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=12.0
2024-09-16 21:00:13,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=500627.3333333333, ans=10.0
2024-09-16 21:00:17,568 INFO [train.py:1198] (0/2) Epoch 28, batch 4150, loss[loss=0.2038, ctc_loss=0.1358, cr_loss=0.3401, over 20974.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1519, cr_loss=0.3752, over 4102241.89 frames. ], batch size: 52, lr: 2.95e-03, grad_scale: 16.0
2024-09-16 21:00:23,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=500655.6666666667, ans=10.0
2024-09-16 21:00:40,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=500684.0, ans=0.5
2024-09-16 21:00:43,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=500684.0, ans=0.0
2024-09-16 21:00:43,989 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.54 vs. limit=15.0
2024-09-16 21:01:35,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=500797.3333333333, ans=15.0
2024-09-16 21:01:35,876 INFO [train.py:1198] (0/2) Epoch 28, batch 4200, loss[loss=0.1924, ctc_loss=0.1263, cr_loss=0.3305, over 20939.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1513, cr_loss=0.3744, over 4107522.88 frames. ], batch size: 50, lr: 2.95e-03, grad_scale: 16.0
2024-09-16 21:02:20,475 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.165e+02 2.284e+02 2.410e+02 6.079e+02, threshold=4.568e+02, percent-clipped=1.0
2024-09-16 21:02:20,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=500854.0, ans=0.04949747468305833
2024-09-16 21:02:27,430 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.29 vs. limit=10.0
2024-09-16 21:02:27,504 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0
2024-09-16 21:02:54,296 INFO [train.py:1198] (0/2) Epoch 28, batch 4250, loss[loss=0.2137, ctc_loss=0.143, cr_loss=0.3535, over 20882.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1524, cr_loss=0.3757, over 4103980.36 frames. ], batch size: 54, lr: 2.95e-03, grad_scale: 16.0
2024-09-16 21:02:54,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=500939.0, ans=0.0
2024-09-16 21:03:08,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500967.3333333333, ans=0.1
2024-09-16 21:03:11,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=500967.3333333333, ans=0.025
2024-09-16 21:03:13,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0
2024-09-16 21:03:14,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=500967.3333333333, ans=0.0
2024-09-16 21:03:36,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=500995.6666666667, ans=0.015
2024-09-16 21:03:37,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=500995.6666666667, ans=0.125
2024-09-16 21:03:41,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=501024.0, ans=0.0
2024-09-16 21:03:50,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=501024.0, ans=0.125
2024-09-16 21:03:58,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=501052.3333333333, ans=0.2
2024-09-16 21:04:10,330 INFO [train.py:1198] (0/2) Epoch 28, batch 4300, loss[loss=0.2233, ctc_loss=0.1501, cr_loss=0.3662, over 21004.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1526, cr_loss=0.3757, over 4087328.14 frames. ], batch size: 61, lr: 2.95e-03, grad_scale: 16.0
2024-09-16 21:04:19,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.24 vs. limit=15.0
2024-09-16 21:04:28,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501109.0, ans=0.1
2024-09-16 21:04:33,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.32 vs. limit=22.5
2024-09-16 21:04:34,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=501109.0, ans=0.2
2024-09-16 21:04:35,073 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0
2024-09-16 21:04:46,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=501137.3333333333, ans=0.125
2024-09-16 21:04:52,545 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.142e+02 2.228e+02 2.405e+02 4.426e+02, threshold=4.457e+02, percent-clipped=0.0
2024-09-16 21:05:15,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0
2024-09-16 21:05:19,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=501194.0, ans=0.125
2024-09-16 21:05:25,696 INFO [train.py:1198] (0/2) Epoch 28, batch 4350, loss[loss=0.2248, ctc_loss=0.1519, cr_loss=0.365, over 20780.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1528, cr_loss=0.3766, over 4094242.39 frames. ], batch size: 53, lr: 2.95e-03, grad_scale: 16.0
2024-09-16 21:06:27,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=501335.6666666667, ans=0.125
2024-09-16 21:06:41,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=501335.6666666667, ans=22.5
2024-09-16 21:06:42,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=12.0
2024-09-16 21:06:44,319 INFO [train.py:1198] (0/2) Epoch 28, batch 4400, loss[loss=0.247, ctc_loss=0.1646, cr_loss=0.4117, over 20947.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.3759, over 4093053.03 frames. ], batch size: 64, lr: 2.95e-03, grad_scale: 32.0
2024-09-16 21:07:16,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=501420.6666666667, ans=0.125
2024-09-16 21:07:30,712 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.147e+02 2.251e+02 2.378e+02 5.998e+02, threshold=4.502e+02, percent-clipped=1.0
2024-09-16 21:07:30,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=501420.6666666667, ans=0.0
2024-09-16 21:07:31,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=501420.6666666667, ans=0.025
2024-09-16 21:08:04,217 INFO [train.py:1198] (0/2) Epoch 28, batch 4450, loss[loss=0.2301, ctc_loss=0.152, cr_loss=0.3907, over 20868.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1541, cr_loss=0.3782, over 4094523.25 frames. ], batch size: 54, lr: 2.95e-03, grad_scale: 32.0
2024-09-16 21:08:29,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5
2024-09-16 21:09:19,614 INFO [train.py:1198] (0/2) Epoch 28, batch 4500, loss[loss=0.2046, ctc_loss=0.1354, cr_loss=0.3458, over 20769.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.154, cr_loss=0.3782, over 4094737.46 frames. ], batch size: 56, lr: 2.95e-03, grad_scale: 32.0
2024-09-16 21:09:35,124 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:09:56,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=501704.0, ans=0.1
2024-09-16 21:09:59,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=501704.0, ans=0.2
2024-09-16 21:10:01,879 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.133e+02 2.291e+02 2.453e+02 3.158e+02, threshold=4.581e+02, percent-clipped=0.0
2024-09-16 21:10:35,604 INFO [train.py:1198] (0/2) Epoch 28, batch 4550, loss[loss=0.2398, ctc_loss=0.1613, cr_loss=0.3921, over 21014.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1541, cr_loss=0.3788, over 4098331.62 frames. ], batch size: 61, lr: 2.95e-03, grad_scale: 32.0
2024-09-16 21:10:58,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=501817.3333333333, ans=0.0
2024-09-16 21:11:52,244 INFO [train.py:1198] (0/2) Epoch 28, batch 4600, loss[loss=0.2339, ctc_loss=0.1557, cr_loss=0.3912, over 20993.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1536, cr_loss=0.3777, over 4101537.05 frames.
], batch size: 61, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:12:08,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=501959.0, ans=0.0 2024-09-16 21:12:26,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=501987.3333333333, ans=0.125 2024-09-16 21:12:39,086 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.146e+02 2.237e+02 2.412e+02 3.153e+02, threshold=4.474e+02, percent-clipped=0.0 2024-09-16 21:13:13,547 INFO [train.py:1198] (0/2) Epoch 28, batch 4650, loss[loss=0.2158, ctc_loss=0.1427, cr_loss=0.3656, over 20762.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.154, cr_loss=0.3789, over 4105489.83 frames. ], batch size: 53, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:13:13,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=502072.3333333333, ans=0.0 2024-09-16 21:13:33,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=502100.6666666667, ans=0.2 2024-09-16 21:14:27,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=502214.0, ans=0.0 2024-09-16 21:14:28,674 INFO [train.py:1198] (0/2) Epoch 28, batch 4700, loss[loss=0.2451, ctc_loss=0.1655, cr_loss=0.3982, over 20297.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1535, cr_loss=0.3771, over 4101289.37 frames. 
], batch size: 74, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:14:36,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=502214.0, ans=0.2 2024-09-16 21:14:45,564 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:15:12,595 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.129e+02 2.268e+02 2.494e+02 5.015e+02, threshold=4.537e+02, percent-clipped=1.0 2024-09-16 21:15:44,771 INFO [train.py:1198] (0/2) Epoch 28, batch 4750, loss[loss=0.272, ctc_loss=0.1912, cr_loss=0.4038, over 18208.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1537, cr_loss=0.3779, over 4100163.52 frames. ], batch size: 108, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:15:57,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=502355.6666666667, ans=0.125 2024-09-16 21:16:23,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2024-09-16 21:16:28,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2024-09-16 21:17:01,159 INFO [train.py:1198] (0/2) Epoch 28, batch 4800, loss[loss=0.2014, ctc_loss=0.1332, cr_loss=0.341, over 20935.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1533, cr_loss=0.3769, over 4095783.64 frames. 
], batch size: 49, lr: 2.95e-03, grad_scale: 32.0 2024-09-16 21:17:44,653 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.224e+02 2.306e+02 2.486e+02 3.165e+02, threshold=4.611e+02, percent-clipped=0.0 2024-09-16 21:17:56,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=502582.3333333333, ans=0.125 2024-09-16 21:18:19,046 INFO [train.py:1198] (0/2) Epoch 28, batch 4850, loss[loss=0.2058, ctc_loss=0.1374, cr_loss=0.3422, over 20922.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1529, cr_loss=0.3758, over 4092452.27 frames. ], batch size: 60, lr: 2.95e-03, grad_scale: 32.0 2024-09-16 21:19:18,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=502724.0, ans=0.1 2024-09-16 21:19:37,147 INFO [train.py:1198] (0/2) Epoch 28, batch 4900, loss[loss=0.2454, ctc_loss=0.1658, cr_loss=0.3982, over 20640.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1542, cr_loss=0.3775, over 4068190.80 frames. 
], batch size: 68, lr: 2.95e-03, grad_scale: 32.0 2024-09-16 21:19:49,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=502780.6666666667, ans=0.125 2024-09-16 21:20:08,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=502837.3333333333, ans=0.125 2024-09-16 21:20:20,047 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.110e+02 2.257e+02 2.460e+02 3.997e+02, threshold=4.514e+02, percent-clipped=0.0 2024-09-16 21:20:29,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=502865.6666666667, ans=0.1 2024-09-16 21:20:51,006 INFO [train.py:1198] (0/2) Epoch 28, batch 4950, loss[loss=0.1854, ctc_loss=0.121, cr_loss=0.3216, over 20964.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1537, cr_loss=0.3767, over 4069264.55 frames. 
], batch size: 50, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:21:01,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=502922.3333333333, ans=0.0 2024-09-16 21:21:04,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=502950.6666666667, ans=0.0 2024-09-16 21:21:22,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=502979.0, ans=0.2 2024-09-16 21:21:32,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=502979.0, ans=0.125 2024-09-16 21:21:38,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503007.3333333333, ans=0.1 2024-09-16 21:21:43,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.22 vs. limit=15.0 2024-09-16 21:21:53,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503035.6666666667, ans=0.1 2024-09-16 21:22:04,927 INFO [train.py:1198] (0/2) Epoch 28, batch 5000, loss[loss=0.1818, ctc_loss=0.1208, cr_loss=0.3052, over 20967.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1537, cr_loss=0.3776, over 4076550.94 frames. 
], batch size: 50, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:22:05,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=503064.0, ans=0.125 2024-09-16 21:22:06,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=503064.0, ans=0.015 2024-09-16 21:22:27,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503092.3333333333, ans=0.1 2024-09-16 21:22:40,320 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.88 vs. limit=22.5 2024-09-16 21:22:49,544 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.130e+02 2.261e+02 2.428e+02 2.897e+02, threshold=4.523e+02, percent-clipped=0.0 2024-09-16 21:23:16,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=503177.3333333333, ans=0.0 2024-09-16 21:23:19,581 INFO [train.py:1198] (0/2) Epoch 28, batch 5050, loss[loss=0.2403, ctc_loss=0.1612, cr_loss=0.3953, over 19595.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.3762, over 4095986.14 frames. 
], batch size: 90, lr: 2.95e-03, grad_scale: 16.0 2024-09-16 21:23:19,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=503205.6666666667, ans=0.125 2024-09-16 21:23:27,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=503205.6666666667, ans=0.125 2024-09-16 21:24:05,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=503290.6666666667, ans=0.125 2024-09-16 21:24:08,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=503290.6666666667, ans=0.0 2024-09-16 21:24:19,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=503319.0, ans=0.2 2024-09-16 21:24:33,133 INFO [train.py:1198] (0/2) Epoch 28, batch 5100, loss[loss=0.2316, ctc_loss=0.154, cr_loss=0.388, over 21034.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1523, cr_loss=0.3758, over 4099489.98 frames. ], batch size: 63, lr: 2.95e-03, grad_scale: 8.0 2024-09-16 21:24:45,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=503347.3333333333, ans=0.125 2024-09-16 21:24:45,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.71 vs. 
limit=10.0 2024-09-16 21:24:46,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=503375.6666666667, ans=0.125 2024-09-16 21:25:18,474 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.141e+02 2.268e+02 2.427e+02 3.703e+02, threshold=4.535e+02, percent-clipped=0.0 2024-09-16 21:25:46,627 INFO [train.py:1198] (0/2) Epoch 28, batch 5150, loss[loss=0.2556, ctc_loss=0.1725, cr_loss=0.4153, over 20145.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1523, cr_loss=0.3755, over 4088340.86 frames. ], batch size: 80, lr: 2.95e-03, grad_scale: 8.0 2024-09-16 21:25:52,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=503489.0, ans=0.05 2024-09-16 21:26:09,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.42 vs. limit=12.0 2024-09-16 21:26:38,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=503574.0, ans=0.125 2024-09-16 21:26:52,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=503602.3333333333, ans=0.0 2024-09-16 21:27:03,376 INFO [train.py:1198] (0/2) Epoch 28, batch 5200, loss[loss=0.2024, ctc_loss=0.1345, cr_loss=0.3396, over 21048.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1523, cr_loss=0.3749, over 4096355.60 frames. 
], batch size: 53, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:27:46,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503715.6666666667, ans=0.1 2024-09-16 21:27:49,157 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.200e+02 2.332e+02 2.586e+02 6.274e+02, threshold=4.663e+02, percent-clipped=2.0 2024-09-16 21:28:00,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=503715.6666666667, ans=0.125 2024-09-16 21:28:00,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.63 vs. limit=10.0 2024-09-16 21:28:17,910 INFO [train.py:1198] (0/2) Epoch 28, batch 5250, loss[loss=0.2264, ctc_loss=0.1482, cr_loss=0.3908, over 20660.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1524, cr_loss=0.3754, over 4095629.38 frames. ], batch size: 71, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:28:20,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=503772.3333333333, ans=0.125 2024-09-16 21:29:03,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=503829.0, ans=0.2 2024-09-16 21:29:35,074 INFO [train.py:1198] (0/2) Epoch 28, batch 5300, loss[loss=0.2535, ctc_loss=0.175, cr_loss=0.3925, over 19984.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1522, cr_loss=0.3752, over 4096097.97 frames. 
], batch size: 80, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:29:44,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=503914.0, ans=0.125 2024-09-16 21:29:49,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=503942.3333333333, ans=0.125 2024-09-16 21:30:19,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.22 vs. limit=15.0 2024-09-16 21:30:21,515 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.155e+02 2.302e+02 2.472e+02 3.709e+02, threshold=4.603e+02, percent-clipped=0.0 2024-09-16 21:30:49,535 INFO [train.py:1198] (0/2) Epoch 28, batch 5350, loss[loss=0.2143, ctc_loss=0.1419, cr_loss=0.362, over 20876.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1518, cr_loss=0.3742, over 4108700.44 frames. ], batch size: 54, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:31:28,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=504112.3333333333, ans=0.125 2024-09-16 21:31:33,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=504140.6666666667, ans=12.0 2024-09-16 21:31:50,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=504169.0, ans=0.125 2024-09-16 21:31:53,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504169.0, ans=0.1 2024-09-16 21:31:55,718 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. 
limit=6.0 2024-09-16 21:32:04,142 INFO [train.py:1198] (0/2) Epoch 28, batch 5400, loss[loss=0.1802, ctc_loss=0.1159, cr_loss=0.3219, over 20294.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1524, cr_loss=0.3753, over 4094170.65 frames. ], batch size: 45, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:32:38,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=504254.0, ans=10.0 2024-09-16 21:32:48,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=504282.3333333333, ans=0.0 2024-09-16 21:32:49,832 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.092e+02 2.222e+02 2.408e+02 7.406e+02, threshold=4.444e+02, percent-clipped=2.0 2024-09-16 21:32:54,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=504282.3333333333, ans=0.0 2024-09-16 21:33:18,121 INFO [train.py:1198] (0/2) Epoch 28, batch 5450, loss[loss=0.1839, ctc_loss=0.1222, cr_loss=0.3086, over 20979.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1526, cr_loss=0.3753, over 4082660.56 frames. ], batch size: 51, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:33:27,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=504339.0, ans=0.125 2024-09-16 21:34:32,139 INFO [train.py:1198] (0/2) Epoch 28, batch 5500, loss[loss=0.2747, ctc_loss=0.1875, cr_loss=0.4356, over 20343.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1528, cr_loss=0.3755, over 4084955.91 frames. 
], batch size: 74, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:34:38,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=504480.6666666667, ans=0.02 2024-09-16 21:34:43,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=504480.6666666667, ans=0.0 2024-09-16 21:34:45,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=504509.0, ans=0.0 2024-09-16 21:35:18,224 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.105e+02 2.280e+02 2.480e+02 5.090e+02, threshold=4.561e+02, percent-clipped=1.0 2024-09-16 21:35:24,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=504565.6666666667, ans=0.0 2024-09-16 21:35:43,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=22.5 2024-09-16 21:35:45,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=504622.3333333333, ans=0.0 2024-09-16 21:35:47,202 INFO [train.py:1198] (0/2) Epoch 28, batch 5550, loss[loss=0.2401, ctc_loss=0.1623, cr_loss=0.389, over 21046.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1526, cr_loss=0.3748, over 4095301.98 frames. 
], batch size: 62, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:36:04,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=504650.6666666667, ans=0.1 2024-09-16 21:36:25,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=504679.0, ans=0.0 2024-09-16 21:36:30,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504679.0, ans=0.1 2024-09-16 21:36:57,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=504735.6666666667, ans=0.125 2024-09-16 21:37:04,277 INFO [train.py:1198] (0/2) Epoch 28, batch 5600, loss[loss=0.1989, ctc_loss=0.13, cr_loss=0.3444, over 20967.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1523, cr_loss=0.3756, over 4100207.48 frames. ], batch size: 48, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:37:34,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=504792.3333333333, ans=0.5 2024-09-16 21:37:41,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=504820.6666666667, ans=0.025 2024-09-16 21:37:44,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=504820.6666666667, ans=0.09899494936611666 2024-09-16 21:37:53,021 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.146e+02 2.252e+02 2.387e+02 5.267e+02, threshold=4.505e+02, percent-clipped=1.0 2024-09-16 21:37:58,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.36 vs. 
limit=22.5 2024-09-16 21:38:11,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=504877.3333333333, ans=0.125 2024-09-16 21:38:21,303 INFO [train.py:1198] (0/2) Epoch 28, batch 5650, loss[loss=0.2027, ctc_loss=0.1338, cr_loss=0.3448, over 21059.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1521, cr_loss=0.3753, over 4106597.47 frames. ], batch size: 53, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:38:26,533 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=12.0 2024-09-16 21:38:36,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=504934.0, ans=0.125 2024-09-16 21:38:40,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=504934.0, ans=0.0 2024-09-16 21:38:50,264 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0 2024-09-16 21:38:53,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=504962.3333333333, ans=15.0 2024-09-16 21:39:19,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.32 vs. limit=22.5 2024-09-16 21:39:24,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=505019.0, ans=0.2 2024-09-16 21:39:34,638 INFO [train.py:1198] (0/2) Epoch 28, batch 5700, loss[loss=0.2524, ctc_loss=0.169, cr_loss=0.4171, over 20858.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1533, cr_loss=0.3776, over 4094729.05 frames. 
], batch size: 57, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:39:39,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=505047.3333333333, ans=0.125 2024-09-16 21:39:49,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=505075.6666666667, ans=0.0 2024-09-16 21:40:07,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=505104.0, ans=0.025 2024-09-16 21:40:20,585 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.140e+02 2.238e+02 2.342e+02 2.999e+02, threshold=4.475e+02, percent-clipped=0.0 2024-09-16 21:40:41,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=505160.6666666667, ans=0.125 2024-09-16 21:40:48,546 INFO [train.py:1198] (0/2) Epoch 28, batch 5750, loss[loss=0.1983, ctc_loss=0.1304, cr_loss=0.3395, over 20998.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1528, cr_loss=0.3771, over 4093141.75 frames. ], batch size: 51, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:41:17,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=505245.6666666667, ans=0.2 2024-09-16 21:41:17,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=505245.6666666667, ans=0.125 2024-09-16 21:41:22,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.74 vs. 
limit=15.0 2024-09-16 21:41:48,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=505302.3333333333, ans=0.125 2024-09-16 21:41:50,562 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0 2024-09-16 21:42:03,453 INFO [train.py:1198] (0/2) Epoch 28, batch 5800, loss[loss=0.2362, ctc_loss=0.1558, cr_loss=0.4021, over 20865.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1531, cr_loss=0.3777, over 4096446.40 frames. ], batch size: 65, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:42:09,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=505330.6666666667, ans=0.125 2024-09-16 21:42:15,800 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:42:30,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=505359.0, ans=0.1 2024-09-16 21:42:48,890 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=22.5 2024-09-16 21:42:51,182 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.134e+02 2.272e+02 2.544e+02 5.505e+02, threshold=4.543e+02, percent-clipped=1.0 2024-09-16 21:43:18,143 INFO [train.py:1198] (0/2) Epoch 28, batch 5850, loss[loss=0.202, ctc_loss=0.1326, cr_loss=0.3471, over 21071.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1527, cr_loss=0.377, over 4100863.93 frames. 
], batch size: 53, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:43:27,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=505472.3333333333, ans=0.2 2024-09-16 21:44:05,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=505557.3333333333, ans=0.0 2024-09-16 21:44:20,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=505585.6666666667, ans=0.2 2024-09-16 21:44:32,280 INFO [train.py:1198] (0/2) Epoch 28, batch 5900, loss[loss=0.2172, ctc_loss=0.1457, cr_loss=0.3576, over 20974.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1529, cr_loss=0.3773, over 4094797.50 frames. ], batch size: 55, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:44:45,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=505614.0, ans=0.0 2024-09-16 21:44:54,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=505642.3333333333, ans=0.0 2024-09-16 21:44:55,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.45 vs. 
limit=15.0 2024-09-16 21:45:00,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=505642.3333333333, ans=0.125 2024-09-16 21:45:20,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=505699.0, ans=0.125 2024-09-16 21:45:22,690 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.750e+02 2.117e+02 2.265e+02 2.412e+02 3.728e+02, threshold=4.529e+02, percent-clipped=0.0 2024-09-16 21:45:27,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=505699.0, ans=0.125 2024-09-16 21:45:31,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=505699.0, ans=0.125 2024-09-16 21:45:49,403 INFO [train.py:1198] (0/2) Epoch 28, batch 5950, loss[loss=0.2134, ctc_loss=0.1429, cr_loss=0.3525, over 20913.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1523, cr_loss=0.3759, over 4087846.54 frames. ], batch size: 54, lr: 2.94e-03, grad_scale: 16.0 2024-09-16 21:45:51,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=505755.6666666667, ans=0.0 2024-09-16 21:46:01,442 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-16 21:46:11,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=505784.0, ans=0.0 2024-09-16 21:46:53,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=505869.0, ans=0.0 2024-09-16 21:47:06,096 INFO [train.py:1198] (0/2) Epoch 28, batch 6000, loss[loss=0.1986, ctc_loss=0.1305, cr_loss=0.3403, over 21055.00 frames. 
], tot_loss[loss=0.2276, ctc_loss=0.1525, cr_loss=0.3757, over 4088292.17 frames. ], batch size: 53, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:47:06,097 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 21:47:18,921 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4695, 4.4478, 3.4029, 3.9927], device='cuda:0') 2024-09-16 21:47:29,408 INFO [train.py:1230] (0/2) Epoch 28, validation: loss=0.04197, ctc_loss=0.04197, cr_loss=1.216e-14, over 944034.00 frames. 2024-09-16 21:47:29,409 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 21:48:00,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=505954.0, ans=0.125 2024-09-16 21:48:17,106 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.116e+02 2.217e+02 2.360e+02 3.028e+02, threshold=4.434e+02, percent-clipped=0.0 2024-09-16 21:48:32,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=506010.6666666667, ans=0.1 2024-09-16 21:48:44,064 INFO [train.py:1198] (0/2) Epoch 28, batch 6050, loss[loss=0.2599, ctc_loss=0.1763, cr_loss=0.4183, over 20838.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1525, cr_loss=0.3757, over 4095928.57 frames. ], batch size: 65, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:49:24,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=506095.6666666667, ans=0.2 2024-09-16 21:49:46,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=506152.3333333333, ans=0.125 2024-09-16 21:49:59,972 INFO [train.py:1198] (0/2) Epoch 28, batch 6100, loss[loss=0.2297, ctc_loss=0.1552, cr_loss=0.3728, over 20286.00 frames. 
], tot_loss[loss=0.2274, ctc_loss=0.1523, cr_loss=0.3755, over 4093686.05 frames. ], batch size: 74, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:50:24,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=506209.0, ans=0.125 2024-09-16 21:50:47,881 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.156e+02 2.328e+02 2.488e+02 3.502e+02, threshold=4.657e+02, percent-clipped=0.0 2024-09-16 21:51:14,449 INFO [train.py:1198] (0/2) Epoch 28, batch 6150, loss[loss=0.2039, ctc_loss=0.1361, cr_loss=0.3387, over 20984.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1518, cr_loss=0.3747, over 4092244.60 frames. ], batch size: 52, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:52:16,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=506435.6666666667, ans=0.1 2024-09-16 21:52:29,176 INFO [train.py:1198] (0/2) Epoch 28, batch 6200, loss[loss=0.209, ctc_loss=0.1368, cr_loss=0.361, over 20950.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1511, cr_loss=0.3728, over 4080757.93 frames. 
], batch size: 50, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:52:56,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=506492.3333333333, ans=0.125 2024-09-16 21:53:15,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=506549.0, ans=0.125 2024-09-16 21:53:16,148 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.160e+02 2.342e+02 2.500e+02 5.895e+02, threshold=4.684e+02, percent-clipped=1.0 2024-09-16 21:53:19,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=506549.0, ans=0.125 2024-09-16 21:53:33,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=506577.3333333333, ans=0.1 2024-09-16 21:53:43,202 INFO [train.py:1198] (0/2) Epoch 28, batch 6250, loss[loss=0.2158, ctc_loss=0.1396, cr_loss=0.381, over 20355.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1522, cr_loss=0.3738, over 4055089.54 frames. ], batch size: 45, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:54:30,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=506690.6666666667, ans=0.125 2024-09-16 21:54:52,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=506719.0, ans=0.0 2024-09-16 21:54:57,938 INFO [train.py:1198] (0/2) Epoch 28, batch 6300, loss[loss=0.2564, ctc_loss=0.18, cr_loss=0.3819, over 14375.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1545, cr_loss=0.376, over 3999337.90 frames. 
], batch size: 150, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:55:25,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=506804.0, ans=0.025 2024-09-16 21:55:27,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=506804.0, ans=0.09899494936611666 2024-09-16 21:55:33,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=506804.0, ans=0.025 2024-09-16 21:55:44,319 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.280e+02 2.444e+02 2.673e+02 4.130e+02, threshold=4.889e+02, percent-clipped=0.0 2024-09-16 21:56:08,857 INFO [train.py:1198] (0/2) Epoch 28, batch 6350, loss[loss=0.2967, ctc_loss=0.2048, cr_loss=0.4593, over 18008.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.16, cr_loss=0.3811, over 3856789.79 frames. ], batch size: 108, lr: 2.94e-03, grad_scale: 32.0 2024-09-16 21:56:21,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=506889.0, ans=0.2 2024-09-16 21:56:49,395 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0 2024-09-16 21:56:53,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=506974.0, ans=0.0 2024-09-16 21:57:00,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.55 vs. limit=12.0 2024-09-16 21:57:06,888 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-28.pt 2024-09-16 21:57:56,070 INFO [train.py:1198] (0/2) Epoch 29, batch 0, loss[loss=0.2506, ctc_loss=0.1695, cr_loss=0.4051, over 19994.00 frames. 
], tot_loss[loss=0.2506, ctc_loss=0.1695, cr_loss=0.4051, over 19994.00 frames. ], batch size: 80, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 21:57:56,071 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 21:58:14,217 INFO [train.py:1230] (0/2) Epoch 29, validation: loss=0.04134, ctc_loss=0.04134, cr_loss=1.227e-14, over 944034.00 frames. 2024-09-16 21:58:14,218 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 21:59:18,789 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.224e+02 2.498e+02 2.716e+02 3.407e+02, threshold=4.995e+02, percent-clipped=0.0 2024-09-16 21:59:32,277 INFO [train.py:1198] (0/2) Epoch 29, batch 50, loss[loss=0.2292, ctc_loss=0.149, cr_loss=0.4009, over 20933.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1506, cr_loss=0.372, over 932994.09 frames. ], batch size: 60, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 21:59:38,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=507146.8333333333, ans=0.0 2024-09-16 21:59:58,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=507175.1666666667, ans=0.125 2024-09-16 21:59:59,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=507175.1666666667, ans=0.125 2024-09-16 22:00:11,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507203.5, ans=0.1 2024-09-16 22:00:29,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=507231.8333333333, ans=0.0 2024-09-16 22:00:47,527 INFO [train.py:1198] (0/2) Epoch 29, batch 100, loss[loss=0.2134, ctc_loss=0.1422, cr_loss=0.3563, over 20780.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1513, cr_loss=0.3754, over 1644770.54 frames. 
], batch size: 53, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:01:52,036 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.132e+02 2.252e+02 2.426e+02 2.986e+02, threshold=4.505e+02, percent-clipped=0.0 2024-09-16 22:02:02,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507401.8333333333, ans=0.1 2024-09-16 22:02:05,611 INFO [train.py:1198] (0/2) Epoch 29, batch 150, loss[loss=0.1979, ctc_loss=0.1295, cr_loss=0.3419, over 21070.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1513, cr_loss=0.3746, over 2178733.16 frames. ], batch size: 53, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:02:13,982 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=12.0 2024-09-16 22:02:53,142 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2024-09-16 22:03:02,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507515.1666666667, ans=0.1 2024-09-16 22:03:21,560 INFO [train.py:1198] (0/2) Epoch 29, batch 200, loss[loss=0.2254, ctc_loss=0.1494, cr_loss=0.3803, over 21087.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1508, cr_loss=0.3746, over 2611680.81 frames. ], batch size: 59, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:03:23,715 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. 
limit=15.0 2024-09-16 22:03:33,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=507571.8333333333, ans=0.2 2024-09-16 22:03:33,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=507571.8333333333, ans=0.125 2024-09-16 22:03:54,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=507628.5, ans=0.125 2024-09-16 22:04:16,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=507656.8333333333, ans=0.125 2024-09-16 22:04:22,790 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=15.0 2024-09-16 22:04:23,491 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.105e+02 2.221e+02 2.379e+02 4.827e+02, threshold=4.441e+02, percent-clipped=1.0 2024-09-16 22:04:38,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2024-09-16 22:04:40,409 INFO [train.py:1198] (0/2) Epoch 29, batch 250, loss[loss=0.261, ctc_loss=0.1802, cr_loss=0.4039, over 14170.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1514, cr_loss=0.3755, over 2933679.47 frames. ], batch size: 150, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:04:50,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.74 vs. 
limit=10.0 2024-09-16 22:05:06,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=507741.8333333333, ans=0.125 2024-09-16 22:05:11,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507770.1666666667, ans=0.1 2024-09-16 22:05:17,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=507770.1666666667, ans=0.125 2024-09-16 22:05:23,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=507770.1666666667, ans=0.0 2024-09-16 22:05:36,422 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.16 vs. limit=15.0 2024-09-16 22:05:37,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=507798.5, ans=0.125 2024-09-16 22:05:56,760 INFO [train.py:1198] (0/2) Epoch 29, batch 300, loss[loss=0.2232, ctc_loss=0.1491, cr_loss=0.3706, over 20975.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1507, cr_loss=0.374, over 3195502.74 frames. ], batch size: 55, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:06:01,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=507855.1666666667, ans=0.125 2024-09-16 22:06:03,661 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.80 vs. 
limit=15.0 2024-09-16 22:06:19,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=507883.5, ans=0.125 2024-09-16 22:06:20,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=507883.5, ans=0.125 2024-09-16 22:06:44,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=507940.1666666667, ans=0.125 2024-09-16 22:06:58,731 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.127e+02 2.263e+02 2.443e+02 4.033e+02, threshold=4.527e+02, percent-clipped=0.0 2024-09-16 22:07:15,521 INFO [train.py:1198] (0/2) Epoch 29, batch 350, loss[loss=0.2438, ctc_loss=0.1706, cr_loss=0.3662, over 14187.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1514, cr_loss=0.3757, over 3380682.09 frames. ], batch size: 149, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:07:31,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=508025.1666666667, ans=0.0 2024-09-16 22:07:47,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=508053.5, ans=0.2 2024-09-16 22:08:05,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=508081.8333333333, ans=0.125 2024-09-16 22:08:24,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=508110.1666666667, ans=0.0 2024-09-16 22:08:30,458 INFO [train.py:1198] (0/2) Epoch 29, batch 400, loss[loss=0.2277, ctc_loss=0.1532, cr_loss=0.3727, over 21037.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1518, cr_loss=0.3753, over 3523258.40 frames. 
], batch size: 62, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:08:33,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=508138.5, ans=0.1 2024-09-16 22:08:43,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=508138.5, ans=0.0 2024-09-16 22:08:48,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=508166.8333333333, ans=0.0 2024-09-16 22:09:31,819 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.124e+02 2.239e+02 2.409e+02 3.301e+02, threshold=4.479e+02, percent-clipped=0.0 2024-09-16 22:09:45,127 INFO [train.py:1198] (0/2) Epoch 29, batch 450, loss[loss=0.2696, ctc_loss=0.1865, cr_loss=0.4156, over 19348.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1522, cr_loss=0.3755, over 3642873.08 frames. ], batch size: 90, lr: 2.88e-03, grad_scale: 32.0 2024-09-16 22:10:18,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=508336.8333333333, ans=0.125 2024-09-16 22:10:27,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=508336.8333333333, ans=0.0 2024-09-16 22:10:29,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2024-09-16 22:10:33,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=508365.1666666667, ans=0.0 2024-09-16 22:10:43,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.52 vs. 
limit=6.0 2024-09-16 22:10:47,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=508393.5, ans=0.125 2024-09-16 22:11:04,183 INFO [train.py:1198] (0/2) Epoch 29, batch 500, loss[loss=0.1979, ctc_loss=0.131, cr_loss=0.3346, over 20352.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1505, cr_loss=0.3735, over 3758115.69 frames. ], batch size: 45, lr: 2.88e-03, grad_scale: 16.0 2024-09-16 22:11:39,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=508478.5, ans=0.025 2024-09-16 22:12:07,617 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.095e+02 2.179e+02 2.359e+02 3.125e+02, threshold=4.358e+02, percent-clipped=0.0 2024-09-16 22:12:14,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.37 vs. limit=15.0 2024-09-16 22:12:19,826 INFO [train.py:1198] (0/2) Epoch 29, batch 550, loss[loss=0.2007, ctc_loss=0.1335, cr_loss=0.3355, over 20954.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1502, cr_loss=0.3734, over 3836426.40 frames. ], batch size: 49, lr: 2.88e-03, grad_scale: 16.0 2024-09-16 22:13:14,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=508648.5, ans=15.0 2024-09-16 22:13:38,285 INFO [train.py:1198] (0/2) Epoch 29, batch 600, loss[loss=0.2553, ctc_loss=0.1739, cr_loss=0.4067, over 20091.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1517, cr_loss=0.376, over 3893358.60 frames. ], batch size: 80, lr: 2.88e-03, grad_scale: 8.0 2024-09-16 22:13:43,547 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.05 vs. 
limit=15.0 2024-09-16 22:14:11,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=508761.8333333333, ans=0.0 2024-09-16 22:14:43,770 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.148e+02 2.274e+02 2.476e+02 4.327e+02, threshold=4.547e+02, percent-clipped=0.0 2024-09-16 22:14:54,326 INFO [train.py:1198] (0/2) Epoch 29, batch 650, loss[loss=0.2242, ctc_loss=0.1513, cr_loss=0.3643, over 21054.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1516, cr_loss=0.3761, over 3935742.71 frames. ], batch size: 59, lr: 2.88e-03, grad_scale: 8.0 2024-09-16 22:15:03,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=508846.8333333333, ans=0.07 2024-09-16 22:15:19,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=508875.1666666667, ans=0.125 2024-09-16 22:16:13,543 INFO [train.py:1198] (0/2) Epoch 29, batch 700, loss[loss=0.1936, ctc_loss=0.1272, cr_loss=0.3322, over 20960.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1516, cr_loss=0.3762, over 3986468.12 frames. ], batch size: 49, lr: 2.88e-03, grad_scale: 8.0 2024-09-16 22:16:48,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=509045.1666666667, ans=0.2 2024-09-16 22:16:51,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=509045.1666666667, ans=0.2 2024-09-16 22:17:02,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.83 vs. 
limit=15.0 2024-09-16 22:17:18,176 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.151e+02 2.230e+02 2.410e+02 6.479e+02, threshold=4.459e+02, percent-clipped=1.0 2024-09-16 22:17:28,602 INFO [train.py:1198] (0/2) Epoch 29, batch 750, loss[loss=0.2224, ctc_loss=0.1486, cr_loss=0.3692, over 20885.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1524, cr_loss=0.3769, over 4006729.20 frames. ], batch size: 57, lr: 2.88e-03, grad_scale: 8.0 2024-09-16 22:17:33,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=509130.1666666667, ans=0.0 2024-09-16 22:18:03,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=509186.8333333333, ans=0.125 2024-09-16 22:18:13,082 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.70 vs. limit=10.0 2024-09-16 22:18:15,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=509215.1666666667, ans=0.0 2024-09-16 22:18:20,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=509215.1666666667, ans=0.0 2024-09-16 22:18:23,570 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.79 vs. limit=15.0 2024-09-16 22:18:41,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=509243.5, ans=0.125 2024-09-16 22:18:47,021 INFO [train.py:1198] (0/2) Epoch 29, batch 800, loss[loss=0.1695, ctc_loss=0.11, cr_loss=0.2977, over 20337.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1522, cr_loss=0.376, over 4026496.85 frames. 
], batch size: 45, lr: 2.88e-03, grad_scale: 16.0 2024-09-16 22:18:50,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=509271.8333333333, ans=0.04949747468305833 2024-09-16 22:18:51,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=509271.8333333333, ans=0.0 2024-09-16 22:18:59,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=509271.8333333333, ans=0.125 2024-09-16 22:19:08,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=509300.1666666667, ans=0.0 2024-09-16 22:19:21,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=509328.5, ans=0.0 2024-09-16 22:19:35,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=509356.8333333333, ans=0.125 2024-09-16 22:19:51,636 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.118e+02 2.231e+02 2.446e+02 6.172e+02, threshold=4.462e+02, percent-clipped=1.0 2024-09-16 22:19:55,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=509385.1666666667, ans=0.125 2024-09-16 22:20:02,340 INFO [train.py:1198] (0/2) Epoch 29, batch 850, loss[loss=0.2316, ctc_loss=0.1544, cr_loss=0.3857, over 20961.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1515, cr_loss=0.3752, over 4053751.42 frames. 
], batch size: 55, lr: 2.88e-03, grad_scale: 16.0 2024-09-16 22:20:59,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=509498.5, ans=0.125 2024-09-16 22:21:03,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=509526.8333333333, ans=0.125 2024-09-16 22:21:05,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=509526.8333333333, ans=0.0 2024-09-16 22:21:18,316 INFO [train.py:1198] (0/2) Epoch 29, batch 900, loss[loss=0.1975, ctc_loss=0.1306, cr_loss=0.3345, over 21001.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1509, cr_loss=0.3736, over 4064895.32 frames. ], batch size: 48, lr: 2.88e-03, grad_scale: 16.0 2024-09-16 22:22:04,005 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0 2024-09-16 22:22:18,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=509640.1666666667, ans=0.0 2024-09-16 22:22:25,847 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.150e+02 2.291e+02 2.446e+02 6.186e+02, threshold=4.582e+02, percent-clipped=1.0 2024-09-16 22:22:36,655 INFO [train.py:1198] (0/2) Epoch 29, batch 950, loss[loss=0.257, ctc_loss=0.1748, cr_loss=0.4111, over 20280.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1514, cr_loss=0.3748, over 4080334.00 frames. 
], batch size: 74, lr: 2.88e-03, grad_scale: 16.0 2024-09-16 22:22:44,424 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:23:02,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=509725.1666666667, ans=0.1 2024-09-16 22:23:12,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=509753.5, ans=0.125 2024-09-16 22:23:35,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=509810.1666666667, ans=0.125 2024-09-16 22:23:42,067 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2024-09-16 22:23:51,747 INFO [train.py:1198] (0/2) Epoch 29, batch 1000, loss[loss=0.2322, ctc_loss=0.1587, cr_loss=0.3675, over 20687.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1512, cr_loss=0.3747, over 4092496.63 frames. 
], batch size: 66, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:23:56,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=509838.5, ans=0.125 2024-09-16 22:24:29,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=509895.1666666667, ans=0.0 2024-09-16 22:24:35,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=509895.1666666667, ans=0.1 2024-09-16 22:24:46,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=509923.5, ans=0.025 2024-09-16 22:24:59,245 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.126e+02 2.274e+02 2.456e+02 3.874e+02, threshold=4.547e+02, percent-clipped=0.0 2024-09-16 22:25:09,908 INFO [train.py:1198] (0/2) Epoch 29, batch 1050, loss[loss=0.2447, ctc_loss=0.1646, cr_loss=0.4001, over 20664.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1513, cr_loss=0.3744, over 4085034.57 frames. 
], batch size: 71, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:25:17,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=509980.1666666667, ans=0.2 2024-09-16 22:25:19,072 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-180000.pt 2024-09-16 22:25:35,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=510008.5, ans=0.025 2024-09-16 22:25:51,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=510036.8333333333, ans=0.1 2024-09-16 22:25:57,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=510065.1666666667, ans=0.125 2024-09-16 22:26:25,406 INFO [train.py:1198] (0/2) Epoch 29, batch 1100, loss[loss=0.2574, ctc_loss=0.1755, cr_loss=0.4098, over 20737.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1509, cr_loss=0.3742, over 4087870.50 frames. 
], batch size: 71, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:26:40,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=510150.1666666667, ans=0.125 2024-09-16 22:26:45,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=510150.1666666667, ans=0.2 2024-09-16 22:26:54,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=510178.5, ans=0.1 2024-09-16 22:27:33,480 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.109e+02 2.245e+02 2.404e+02 3.046e+02, threshold=4.489e+02, percent-clipped=0.0 2024-09-16 22:27:34,233 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2024-09-16 22:27:40,191 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=12.0 2024-09-16 22:27:43,748 INFO [train.py:1198] (0/2) Epoch 29, batch 1150, loss[loss=0.2553, ctc_loss=0.17, cr_loss=0.4264, over 20627.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1514, cr_loss=0.3752, over 4103970.61 frames. ], batch size: 68, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:27:44,905 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.49 vs. 
limit=15.0 2024-09-16 22:28:23,861 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:28:43,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=510376.8333333333, ans=0.125 2024-09-16 22:28:59,928 INFO [train.py:1198] (0/2) Epoch 29, batch 1200, loss[loss=0.204, ctc_loss=0.1375, cr_loss=0.3325, over 20894.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1508, cr_loss=0.3743, over 4112793.77 frames. ], batch size: 54, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:29:06,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=510405.1666666667, ans=0.0 2024-09-16 22:29:18,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=510433.5, ans=0.1 2024-09-16 22:29:47,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=510490.1666666667, ans=0.1 2024-09-16 22:29:53,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=510490.1666666667, ans=0.2 2024-09-16 22:30:05,123 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.088e+02 2.280e+02 2.493e+02 4.353e+02, threshold=4.560e+02, percent-clipped=0.0 2024-09-16 22:30:18,912 INFO [train.py:1198] (0/2) Epoch 29, batch 1250, loss[loss=0.2189, ctc_loss=0.1455, cr_loss=0.3666, over 20987.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1513, cr_loss=0.3744, over 4109885.32 frames. 
], batch size: 51, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:30:33,002 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:31:34,811 INFO [train.py:1198] (0/2) Epoch 29, batch 1300, loss[loss=0.238, ctc_loss=0.1604, cr_loss=0.3879, over 20670.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1518, cr_loss=0.375, over 4098457.96 frames. ], batch size: 66, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:32:05,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=510745.1666666667, ans=0.125 2024-09-16 22:32:21,138 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2024-09-16 22:32:40,107 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.148e+02 2.225e+02 2.364e+02 3.598e+02, threshold=4.449e+02, percent-clipped=0.0 2024-09-16 22:32:50,745 INFO [train.py:1198] (0/2) Epoch 29, batch 1350, loss[loss=0.266, ctc_loss=0.1885, cr_loss=0.3875, over 14446.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1517, cr_loss=0.3748, over 4089795.00 frames. ], batch size: 149, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:34:05,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=510943.5, ans=0.125 2024-09-16 22:34:10,122 INFO [train.py:1198] (0/2) Epoch 29, batch 1400, loss[loss=0.2352, ctc_loss=0.1588, cr_loss=0.3819, over 21000.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1518, cr_loss=0.3749, over 4079681.21 frames. 
], batch size: 63, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:34:22,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=510971.8333333333, ans=0.025 2024-09-16 22:35:04,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=511056.8333333333, ans=0.125 2024-09-16 22:35:14,918 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.124e+02 2.250e+02 2.355e+02 2.864e+02, threshold=4.501e+02, percent-clipped=0.0 2024-09-16 22:35:25,618 INFO [train.py:1198] (0/2) Epoch 29, batch 1450, loss[loss=0.1986, ctc_loss=0.1337, cr_loss=0.325, over 20956.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1518, cr_loss=0.3748, over 4086921.45 frames. ], batch size: 49, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:35:56,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=511141.8333333333, ans=0.1 2024-09-16 22:36:01,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=511170.1666666667, ans=0.0 2024-09-16 22:36:44,972 INFO [train.py:1198] (0/2) Epoch 29, batch 1500, loss[loss=0.2585, ctc_loss=0.1732, cr_loss=0.4264, over 20976.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1516, cr_loss=0.3747, over 4088693.84 frames. 
], batch size: 58, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:37:11,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=511283.5, ans=0.07 2024-09-16 22:37:50,281 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.140e+02 2.286e+02 2.435e+02 3.535e+02, threshold=4.573e+02, percent-clipped=0.0 2024-09-16 22:37:55,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=15.0 2024-09-16 22:38:00,876 INFO [train.py:1198] (0/2) Epoch 29, batch 1550, loss[loss=0.181, ctc_loss=0.1188, cr_loss=0.3107, over 21017.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1509, cr_loss=0.3732, over 4083574.76 frames. ], batch size: 52, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:38:02,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=511396.8333333333, ans=0.125 2024-09-16 22:38:37,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=511453.5, ans=0.125 2024-09-16 22:38:48,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0 2024-09-16 22:38:48,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=22.5 2024-09-16 22:38:57,639 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.15 vs. 
limit=12.0 2024-09-16 22:38:58,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=511481.8333333333, ans=0.0 2024-09-16 22:39:07,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=511510.1666666667, ans=0.125 2024-09-16 22:39:10,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=511510.1666666667, ans=0.05 2024-09-16 22:39:19,500 INFO [train.py:1198] (0/2) Epoch 29, batch 1600, loss[loss=0.2246, ctc_loss=0.148, cr_loss=0.3827, over 21044.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1515, cr_loss=0.3739, over 4081822.47 frames. ], batch size: 62, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:40:11,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=511623.5, ans=0.125 2024-09-16 22:40:25,052 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.124e+02 2.264e+02 2.371e+02 3.682e+02, threshold=4.527e+02, percent-clipped=0.0 2024-09-16 22:40:35,605 INFO [train.py:1198] (0/2) Epoch 29, batch 1650, loss[loss=0.2362, ctc_loss=0.1589, cr_loss=0.3865, over 21025.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.152, cr_loss=0.3751, over 4081703.35 frames. ], batch size: 61, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:40:52,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=511708.5, ans=0.0 2024-09-16 22:41:53,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.17 vs. limit=8.0 2024-09-16 22:41:53,686 INFO [train.py:1198] (0/2) Epoch 29, batch 1700, loss[loss=0.2149, ctc_loss=0.1449, cr_loss=0.35, over 20792.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1514, cr_loss=0.3741, over 4088784.85 frames. 
], batch size: 53, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:42:01,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=511821.8333333333, ans=0.0 2024-09-16 22:42:41,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=511906.8333333333, ans=0.2 2024-09-16 22:42:57,141 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0 2024-09-16 22:43:00,650 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.153e+02 2.296e+02 2.463e+02 4.622e+02, threshold=4.592e+02, percent-clipped=1.0 2024-09-16 22:43:09,787 INFO [train.py:1198] (0/2) Epoch 29, batch 1750, loss[loss=0.234, ctc_loss=0.1608, cr_loss=0.3658, over 19369.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1503, cr_loss=0.3723, over 4098384.97 frames. ], batch size: 90, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:43:11,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=511963.5, ans=0.1 2024-09-16 22:43:26,961 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:43:54,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=512048.5, ans=0.1 2024-09-16 22:43:57,643 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.28 vs. 
limit=22.5 2024-09-16 22:44:03,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=512048.5, ans=0.2 2024-09-16 22:44:09,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=512076.8333333333, ans=0.0 2024-09-16 22:44:28,574 INFO [train.py:1198] (0/2) Epoch 29, batch 1800, loss[loss=0.171, ctc_loss=0.1123, cr_loss=0.2937, over 19974.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1504, cr_loss=0.3723, over 4100775.36 frames. ], batch size: 44, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:44:28,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=512105.1666666667, ans=0.035 2024-09-16 22:45:02,072 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:45:08,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=512161.8333333333, ans=0.125 2024-09-16 22:45:09,753 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=15.0 2024-09-16 22:45:26,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=512190.1666666667, ans=0.0 2024-09-16 22:45:28,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=512218.5, ans=0.125 2024-09-16 22:45:35,700 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.152e+02 2.333e+02 2.463e+02 3.061e+02, threshold=4.666e+02, percent-clipped=0.0 2024-09-16 22:45:44,761 INFO [train.py:1198] (0/2) Epoch 29, batch 1850, loss[loss=0.2645, ctc_loss=0.1808, cr_loss=0.4182, over 20656.00 frames. 
], tot_loss[loss=0.2247, ctc_loss=0.1502, cr_loss=0.3722, over 4106540.58 frames. ], batch size: 66, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:46:30,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=512331.8333333333, ans=0.125 2024-09-16 22:46:47,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=512360.1666666667, ans=0.125 2024-09-16 22:47:00,275 INFO [train.py:1198] (0/2) Epoch 29, batch 1900, loss[loss=0.2094, ctc_loss=0.1371, cr_loss=0.3611, over 20785.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1499, cr_loss=0.3719, over 4103554.73 frames. ], batch size: 53, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:47:26,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=512416.8333333333, ans=0.2 2024-09-16 22:47:32,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=512445.1666666667, ans=0.125 2024-09-16 22:47:43,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2024-09-16 22:47:44,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=512445.1666666667, ans=0.025 2024-09-16 22:48:00,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=512473.5, ans=0.1 2024-09-16 22:48:09,950 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.091e+02 2.187e+02 2.363e+02 3.962e+02, threshold=4.374e+02, percent-clipped=0.0 2024-09-16 22:48:19,383 INFO [train.py:1198] (0/2) Epoch 29, batch 1950, loss[loss=0.1813, ctc_loss=0.1184, cr_loss=0.3147, over 20959.00 frames. 
], tot_loss[loss=0.2234, ctc_loss=0.1492, cr_loss=0.3712, over 4116740.92 frames. ], batch size: 51, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:48:46,520 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2024-09-16 22:48:57,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.70 vs. limit=5.0 2024-09-16 22:49:07,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0 2024-09-16 22:49:29,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=512643.5, ans=0.125 2024-09-16 22:49:33,600 INFO [train.py:1198] (0/2) Epoch 29, batch 2000, loss[loss=0.2416, ctc_loss=0.1625, cr_loss=0.3956, over 21055.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1498, cr_loss=0.3719, over 4121431.29 frames. ], batch size: 62, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:50:01,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.73 vs. 
limit=15.0 2024-09-16 22:50:02,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=512728.5, ans=0.125 2024-09-16 22:50:13,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=512728.5, ans=0.0 2024-09-16 22:50:25,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=512756.8333333333, ans=0.0 2024-09-16 22:50:28,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=512756.8333333333, ans=0.0 2024-09-16 22:50:32,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=512756.8333333333, ans=0.1 2024-09-16 22:50:34,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=512756.8333333333, ans=0.1 2024-09-16 22:50:43,120 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.165e+02 2.291e+02 2.433e+02 4.983e+02, threshold=4.582e+02, percent-clipped=1.0 2024-09-16 22:50:52,256 INFO [train.py:1198] (0/2) Epoch 29, batch 2050, loss[loss=0.2555, ctc_loss=0.1702, cr_loss=0.4265, over 20129.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1506, cr_loss=0.3736, over 4114777.05 frames. 
], batch size: 80, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:50:57,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=512813.5, ans=0.0 2024-09-16 22:51:29,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=512870.1666666667, ans=0.0 2024-09-16 22:51:39,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=512898.5, ans=0.0 2024-09-16 22:51:42,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=512898.5, ans=0.125 2024-09-16 22:51:45,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=512898.5, ans=0.125 2024-09-16 22:51:51,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=512926.8333333333, ans=0.035 2024-09-16 22:52:05,344 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2024-09-16 22:52:07,565 INFO [train.py:1198] (0/2) Epoch 29, batch 2100, loss[loss=0.2451, ctc_loss=0.1647, cr_loss=0.4021, over 21033.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1515, cr_loss=0.3745, over 4114214.20 frames. ], batch size: 63, lr: 2.87e-03, grad_scale: 32.0 2024-09-16 22:52:08,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. 
limit=6.0 2024-09-16 22:52:13,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=512955.1666666667, ans=0.125 2024-09-16 22:52:51,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=513040.1666666667, ans=0.125 2024-09-16 22:52:54,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=513040.1666666667, ans=0.0 2024-09-16 22:53:13,147 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-09-16 22:53:18,493 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.144e+02 2.259e+02 2.385e+02 3.448e+02, threshold=4.518e+02, percent-clipped=0.0 2024-09-16 22:53:18,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=513068.5, ans=0.0 2024-09-16 22:53:21,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=513068.5, ans=0.0 2024-09-16 22:53:26,127 INFO [train.py:1198] (0/2) Epoch 29, batch 2150, loss[loss=0.202, ctc_loss=0.1352, cr_loss=0.334, over 19923.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1513, cr_loss=0.3738, over 4105545.20 frames. ], batch size: 44, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:53:52,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=513125.1666666667, ans=0.0 2024-09-16 22:54:01,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=513153.5, ans=0.125 2024-09-16 22:54:41,708 INFO [train.py:1198] (0/2) Epoch 29, batch 2200, loss[loss=0.2668, ctc_loss=0.1896, cr_loss=0.386, over 14316.00 frames. 
], tot_loss[loss=0.2268, ctc_loss=0.1518, cr_loss=0.3752, over 4097608.13 frames. ], batch size: 149, lr: 2.87e-03, grad_scale: 16.0 2024-09-16 22:55:13,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=513295.1666666667, ans=0.125 2024-09-16 22:55:17,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0 2024-09-16 22:55:49,651 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.100e+02 2.240e+02 2.383e+02 5.972e+02, threshold=4.480e+02, percent-clipped=1.0 2024-09-16 22:55:55,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513380.1666666667, ans=0.1 2024-09-16 22:55:57,136 INFO [train.py:1198] (0/2) Epoch 29, batch 2250, loss[loss=0.2305, ctc_loss=0.1535, cr_loss=0.385, over 21023.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1509, cr_loss=0.374, over 4110660.59 frames. ], batch size: 63, lr: 2.86e-03, grad_scale: 16.0 2024-09-16 22:56:11,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=513380.1666666667, ans=0.0 2024-09-16 22:56:41,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=513436.8333333333, ans=0.0 2024-09-16 22:56:44,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=513465.1666666667, ans=0.125 2024-09-16 22:57:03,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=513493.5, ans=0.125 2024-09-16 22:57:15,359 INFO [train.py:1198] (0/2) Epoch 29, batch 2300, loss[loss=0.2029, ctc_loss=0.1324, cr_loss=0.3527, over 20778.00 frames. 
], tot_loss[loss=0.2268, ctc_loss=0.1517, cr_loss=0.3755, over 4113353.49 frames. ], batch size: 56, lr: 2.86e-03, grad_scale: 16.0 2024-09-16 22:57:24,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=513521.8333333333, ans=0.2 2024-09-16 22:57:34,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=513550.1666666667, ans=0.125 2024-09-16 22:57:36,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=513550.1666666667, ans=0.0 2024-09-16 22:57:47,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=513578.5, ans=0.125 2024-09-16 22:57:53,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=513578.5, ans=0.2 2024-09-16 22:58:08,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513606.8333333333, ans=0.1 2024-09-16 22:58:23,296 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.103e+02 2.248e+02 2.400e+02 4.352e+02, threshold=4.496e+02, percent-clipped=0.0 2024-09-16 22:58:30,665 INFO [train.py:1198] (0/2) Epoch 29, batch 2350, loss[loss=0.2232, ctc_loss=0.1515, cr_loss=0.3588, over 21059.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1511, cr_loss=0.3744, over 4104375.29 frames. ], batch size: 59, lr: 2.86e-03, grad_scale: 16.0 2024-09-16 22:58:32,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=513663.5, ans=0.95 2024-09-16 22:59:00,002 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.14 vs. 
limit=15.0 2024-09-16 22:59:33,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=513776.8333333333, ans=0.025 2024-09-16 22:59:50,417 INFO [train.py:1198] (0/2) Epoch 29, batch 2400, loss[loss=0.1893, ctc_loss=0.1228, cr_loss=0.3326, over 20932.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1517, cr_loss=0.3752, over 4104700.16 frames. ], batch size: 49, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 22:59:54,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=22.5 2024-09-16 23:00:22,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=513861.8333333333, ans=0.0 2024-09-16 23:00:25,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=513861.8333333333, ans=0.125 2024-09-16 23:00:26,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=513861.8333333333, ans=0.0 2024-09-16 23:00:30,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=513861.8333333333, ans=0.0 2024-09-16 23:00:58,548 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.122e+02 2.266e+02 2.430e+02 2.916e+02, threshold=4.532e+02, percent-clipped=0.0 2024-09-16 23:01:06,240 INFO [train.py:1198] (0/2) Epoch 29, batch 2450, loss[loss=0.2027, ctc_loss=0.1365, cr_loss=0.3307, over 20806.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.152, cr_loss=0.3757, over 4104458.97 frames. 
], batch size: 53, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:01:11,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=513946.8333333333, ans=0.0 2024-09-16 23:01:40,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=514003.5, ans=0.125 2024-09-16 23:01:40,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=514003.5, ans=0.025 2024-09-16 23:02:25,163 INFO [train.py:1198] (0/2) Epoch 29, batch 2500, loss[loss=0.2202, ctc_loss=0.1449, cr_loss=0.3765, over 20868.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1514, cr_loss=0.3744, over 4107850.88 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:02:32,158 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2024-09-16 23:03:28,468 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2024-09-16 23:03:33,828 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.136e+02 2.264e+02 2.407e+02 3.369e+02, threshold=4.528e+02, percent-clipped=0.0 2024-09-16 23:03:41,234 INFO [train.py:1198] (0/2) Epoch 29, batch 2550, loss[loss=0.2452, ctc_loss=0.1617, cr_loss=0.4174, over 21008.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1506, cr_loss=0.3737, over 4116843.95 frames. 
], batch size: 63, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:04:04,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=514258.5, ans=0.1 2024-09-16 23:04:13,827 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:04:30,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=514315.1666666667, ans=0.1 2024-09-16 23:04:53,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=514343.5, ans=0.125 2024-09-16 23:05:00,621 INFO [train.py:1198] (0/2) Epoch 29, batch 2600, loss[loss=0.2016, ctc_loss=0.1331, cr_loss=0.3425, over 20968.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1505, cr_loss=0.3734, over 4120049.12 frames. ], batch size: 52, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:05:14,918 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0 2024-09-16 23:06:04,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=514485.1666666667, ans=0.125 2024-09-16 23:06:08,637 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.142e+02 2.284e+02 2.431e+02 3.588e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-16 23:06:09,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.21 vs. limit=6.0 2024-09-16 23:06:16,099 INFO [train.py:1198] (0/2) Epoch 29, batch 2650, loss[loss=0.2022, ctc_loss=0.1331, cr_loss=0.3456, over 20800.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1508, cr_loss=0.3745, over 4119923.84 frames. 
], batch size: 53, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:06:57,907 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0 2024-09-16 23:07:00,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0 2024-09-16 23:07:31,844 INFO [train.py:1198] (0/2) Epoch 29, batch 2700, loss[loss=0.2147, ctc_loss=0.1427, cr_loss=0.3599, over 21051.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1505, cr_loss=0.3743, over 4118485.13 frames. ], batch size: 56, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:07:41,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=514655.1666666667, ans=0.125 2024-09-16 23:07:44,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=514655.1666666667, ans=0.2 2024-09-16 23:08:10,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=514711.8333333333, ans=0.0 2024-09-16 23:08:43,124 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.099e+02 2.261e+02 2.472e+02 3.844e+02, threshold=4.522e+02, percent-clipped=0.0 2024-09-16 23:08:45,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=514768.5, ans=0.04949747468305833 2024-09-16 23:08:50,585 INFO [train.py:1198] (0/2) Epoch 29, batch 2750, loss[loss=0.238, ctc_loss=0.1551, cr_loss=0.4146, over 20934.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1504, cr_loss=0.3739, over 4126573.84 frames. 
], batch size: 60, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:08:52,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=514796.8333333333, ans=0.125 2024-09-16 23:09:14,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=514825.1666666667, ans=0.2 2024-09-16 23:09:42,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2024-09-16 23:10:06,417 INFO [train.py:1198] (0/2) Epoch 29, batch 2800, loss[loss=0.2293, ctc_loss=0.1525, cr_loss=0.3843, over 21070.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1508, cr_loss=0.3748, over 4126135.96 frames. ], batch size: 59, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:10:49,572 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=22.5 2024-09-16 23:11:13,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=515051.8333333333, ans=0.125 2024-09-16 23:11:17,581 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.106e+02 2.230e+02 2.380e+02 2.857e+02, threshold=4.459e+02, percent-clipped=0.0 2024-09-16 23:11:22,766 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2024-09-16 23:11:24,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0 2024-09-16 23:11:25,035 INFO [train.py:1198] (0/2) Epoch 29, batch 2850, loss[loss=0.2025, ctc_loss=0.1328, cr_loss=0.3487, over 20780.00 frames. 
], tot_loss[loss=0.2256, ctc_loss=0.1507, cr_loss=0.3744, over 4124672.48 frames. ], batch size: 56, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:11:58,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=515136.8333333333, ans=0.0 2024-09-16 23:12:26,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=515193.5, ans=0.125 2024-09-16 23:12:26,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=515193.5, ans=0.125 2024-09-16 23:12:30,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=515193.5, ans=0.0 2024-09-16 23:12:32,738 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.25 vs. limit=15.0 2024-09-16 23:12:33,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=515193.5, ans=0.125 2024-09-16 23:12:40,967 INFO [train.py:1198] (0/2) Epoch 29, batch 2900, loss[loss=0.2428, ctc_loss=0.1621, cr_loss=0.4038, over 20669.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1517, cr_loss=0.3759, over 4121233.73 frames. 
], batch size: 68, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:13:06,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=515250.1666666667, ans=0.125 2024-09-16 23:13:21,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=515278.5, ans=0.0 2024-09-16 23:13:47,726 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:13:51,775 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.125e+02 2.234e+02 2.447e+02 6.615e+02, threshold=4.467e+02, percent-clipped=1.0 2024-09-16 23:13:59,380 INFO [train.py:1198] (0/2) Epoch 29, batch 2950, loss[loss=0.2339, ctc_loss=0.156, cr_loss=0.3892, over 20986.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1522, cr_loss=0.3768, over 4116003.24 frames. ], batch size: 55, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:14:01,563 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-16 23:14:19,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=515391.8333333333, ans=0.0 2024-09-16 23:14:19,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2024-09-16 23:14:28,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=22.5 2024-09-16 23:15:15,110 INFO [train.py:1198] (0/2) Epoch 29, batch 3000, loss[loss=0.1857, ctc_loss=0.1222, cr_loss=0.3174, over 20985.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1523, cr_loss=0.3771, over 4110703.83 frames. 
], batch size: 48, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:15:15,111 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 23:15:38,420 INFO [train.py:1230] (0/2) Epoch 29, validation: loss=0.04172, ctc_loss=0.04172, cr_loss=1.228e-14, over 944034.00 frames. 2024-09-16 23:15:38,421 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-16 23:15:44,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=515505.1666666667, ans=0.125 2024-09-16 23:15:47,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=515505.1666666667, ans=0.2 2024-09-16 23:15:53,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=515533.5, ans=0.025 2024-09-16 23:16:18,482 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=22.5 2024-09-16 23:16:49,648 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.142e+02 2.276e+02 2.439e+02 4.356e+02, threshold=4.552e+02, percent-clipped=0.0 2024-09-16 23:16:57,198 INFO [train.py:1198] (0/2) Epoch 29, batch 3050, loss[loss=0.202, ctc_loss=0.1334, cr_loss=0.3431, over 20787.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1516, cr_loss=0.3755, over 4109700.99 frames. 
], batch size: 53, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:17:02,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=515646.8333333333, ans=0.1 2024-09-16 23:17:32,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=515703.5, ans=10.0 2024-09-16 23:17:37,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=515703.5, ans=0.125 2024-09-16 23:18:11,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=515788.5, ans=0.2 2024-09-16 23:18:12,672 INFO [train.py:1198] (0/2) Epoch 29, batch 3100, loss[loss=0.2736, ctc_loss=0.1861, cr_loss=0.4372, over 18278.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1517, cr_loss=0.3755, over 4105598.16 frames. ], batch size: 108, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:18:14,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=515788.5, ans=0.125 2024-09-16 23:18:16,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=515788.5, ans=0.125 2024-09-16 23:18:41,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=515845.1666666667, ans=0.0 2024-09-16 23:18:50,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515845.1666666667, ans=0.1 2024-09-16 23:18:59,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=515873.5, ans=0.035 2024-09-16 23:19:04,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=515873.5, 
ans=0.0 2024-09-16 23:19:23,842 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.105e+02 2.305e+02 2.411e+02 4.093e+02, threshold=4.611e+02, percent-clipped=0.0 2024-09-16 23:19:31,336 INFO [train.py:1198] (0/2) Epoch 29, batch 3150, loss[loss=0.2102, ctc_loss=0.1411, cr_loss=0.3456, over 20934.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1517, cr_loss=0.3754, over 4105994.53 frames. ], batch size: 60, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:19:39,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515930.1666666667, ans=0.1 2024-09-16 23:20:12,689 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:20:38,813 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2024-09-16 23:20:46,901 INFO [train.py:1198] (0/2) Epoch 29, batch 3200, loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.3667, over 21011.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1518, cr_loss=0.3754, over 4106236.61 frames. ], batch size: 61, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:20:56,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=516071.8333333333, ans=0.2 2024-09-16 23:21:09,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=516100.1666666667, ans=0.025 2024-09-16 23:21:11,865 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.63 vs. 
limit=10.0 2024-09-16 23:21:55,239 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.106e+02 2.233e+02 2.457e+02 3.336e+02, threshold=4.466e+02, percent-clipped=0.0 2024-09-16 23:21:58,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-09-16 23:22:05,454 INFO [train.py:1198] (0/2) Epoch 29, batch 3250, loss[loss=0.2208, ctc_loss=0.1481, cr_loss=0.3635, over 20993.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1526, cr_loss=0.3761, over 4091231.92 frames. ], batch size: 52, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:22:05,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=516213.5, ans=0.0 2024-09-16 23:22:13,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=516213.5, ans=0.125 2024-09-16 23:22:18,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.85 vs. limit=10.0 2024-09-16 23:22:19,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516241.8333333333, ans=0.1 2024-09-16 23:23:11,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2024-09-16 23:23:21,202 INFO [train.py:1198] (0/2) Epoch 29, batch 3300, loss[loss=0.2312, ctc_loss=0.1507, cr_loss=0.4026, over 20985.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.3762, over 4088117.91 frames. 
], batch size: 55, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:23:36,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=516383.5, ans=0.125 2024-09-16 23:23:42,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=516383.5, ans=0.125 2024-09-16 23:23:59,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=516411.8333333333, ans=0.2 2024-09-16 23:24:03,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=516411.8333333333, ans=0.125 2024-09-16 23:24:31,854 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.126e+02 2.291e+02 2.446e+02 4.467e+02, threshold=4.582e+02, percent-clipped=1.0 2024-09-16 23:24:39,313 INFO [train.py:1198] (0/2) Epoch 29, batch 3350, loss[loss=0.2299, ctc_loss=0.1549, cr_loss=0.375, over 20974.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1533, cr_loss=0.3764, over 4076219.64 frames. ], batch size: 55, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:25:06,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=516525.1666666667, ans=0.125 2024-09-16 23:25:26,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=516581.8333333333, ans=0.125 2024-09-16 23:25:33,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. 
limit=15.0 2024-09-16 23:25:42,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=516610.1666666667, ans=0.0 2024-09-16 23:25:50,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0 2024-09-16 23:25:53,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=516638.5, ans=0.0 2024-09-16 23:25:54,417 INFO [train.py:1198] (0/2) Epoch 29, batch 3400, loss[loss=0.2353, ctc_loss=0.1561, cr_loss=0.3963, over 20625.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1531, cr_loss=0.3768, over 4086618.91 frames. ], batch size: 66, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:26:13,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=516666.8333333333, ans=0.2 2024-09-16 23:26:17,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=516666.8333333333, ans=0.125 2024-09-16 23:26:35,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=516695.1666666667, ans=0.0 2024-09-16 23:26:37,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=516695.1666666667, ans=0.125 2024-09-16 23:26:46,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=516723.5, ans=0.025 2024-09-16 23:26:46,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.00 vs. 
limit=15.0 2024-09-16 23:27:02,692 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.118e+02 2.242e+02 2.428e+02 3.600e+02, threshold=4.484e+02, percent-clipped=0.0 2024-09-16 23:27:10,397 INFO [train.py:1198] (0/2) Epoch 29, batch 3450, loss[loss=0.2473, ctc_loss=0.1651, cr_loss=0.411, over 20725.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1538, cr_loss=0.3776, over 4088896.93 frames. ], batch size: 66, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:27:16,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516780.1666666667, ans=0.1 2024-09-16 23:27:36,564 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:28:13,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=516893.5, ans=0.125 2024-09-16 23:28:28,947 INFO [train.py:1198] (0/2) Epoch 29, batch 3500, loss[loss=0.2037, ctc_loss=0.1341, cr_loss=0.3479, over 21000.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1529, cr_loss=0.3765, over 4083343.21 frames. ], batch size: 52, lr: 2.86e-03, grad_scale: 32.0 2024-09-16 23:29:37,195 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.182e+02 2.322e+02 2.506e+02 4.001e+02, threshold=4.644e+02, percent-clipped=0.0 2024-09-16 23:29:44,933 INFO [train.py:1198] (0/2) Epoch 29, batch 3550, loss[loss=0.25, ctc_loss=0.1687, cr_loss=0.4063, over 20858.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1531, cr_loss=0.377, over 4076334.30 frames. ], batch size: 65, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:30:09,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.95 vs. 
limit=10.0 2024-09-16 23:30:21,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=517120.1666666667, ans=0.1 2024-09-16 23:30:27,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=517120.1666666667, ans=0.025 2024-09-16 23:30:36,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=517148.5, ans=0.2 2024-09-16 23:31:02,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=517205.1666666667, ans=0.125 2024-09-16 23:31:03,714 INFO [train.py:1198] (0/2) Epoch 29, batch 3600, loss[loss=0.1907, ctc_loss=0.1262, cr_loss=0.3226, over 21061.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1526, cr_loss=0.3762, over 4087283.41 frames. ], batch size: 53, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:31:22,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=517233.5, ans=0.0 2024-09-16 23:31:51,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=517290.1666666667, ans=0.125 2024-09-16 23:32:11,473 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.134e+02 2.235e+02 2.407e+02 2.856e+02, threshold=4.470e+02, percent-clipped=0.0 2024-09-16 23:32:19,280 INFO [train.py:1198] (0/2) Epoch 29, batch 3650, loss[loss=0.2371, ctc_loss=0.1609, cr_loss=0.3812, over 20731.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1522, cr_loss=0.3757, over 4096302.03 frames. 
], batch size: 71, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:32:19,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=517346.8333333333, ans=0.125 2024-09-16 23:32:50,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=517403.5, ans=0.125 2024-09-16 23:32:56,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=517403.5, ans=0.1 2024-09-16 23:33:02,244 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:33:05,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=517431.8333333333, ans=0.0 2024-09-16 23:33:22,185 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=15.0 2024-09-16 23:33:37,826 INFO [train.py:1198] (0/2) Epoch 29, batch 3700, loss[loss=0.2327, ctc_loss=0.1553, cr_loss=0.3871, over 20870.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1521, cr_loss=0.3765, over 4100841.00 frames. 
], batch size: 57, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:34:02,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=517516.8333333333, ans=0.125 2024-09-16 23:34:23,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=517573.5, ans=0.0 2024-09-16 23:34:30,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=517573.5, ans=0.2 2024-09-16 23:34:45,237 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.137e+02 2.276e+02 2.424e+02 2.892e+02, threshold=4.552e+02, percent-clipped=0.0 2024-09-16 23:34:53,009 INFO [train.py:1198] (0/2) Epoch 29, batch 3750, loss[loss=0.2273, ctc_loss=0.1513, cr_loss=0.3801, over 20954.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1521, cr_loss=0.3762, over 4104504.76 frames. ], batch size: 58, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:34:56,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=517630.1666666667, ans=0.5 2024-09-16 23:34:59,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=517630.1666666667, ans=0.125 2024-09-16 23:35:13,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=517658.5, ans=0.125 2024-09-16 23:35:13,670 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. 
limit=6.0 2024-09-16 23:35:57,649 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:36:05,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=517743.5, ans=0.125 2024-09-16 23:36:12,603 INFO [train.py:1198] (0/2) Epoch 29, batch 3800, loss[loss=0.2139, ctc_loss=0.1422, cr_loss=0.3587, over 21002.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1507, cr_loss=0.373, over 4108162.48 frames. ], batch size: 63, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:36:17,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=517771.8333333333, ans=0.0 2024-09-16 23:36:26,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=517800.1666666667, ans=0.0 2024-09-16 23:36:29,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=517800.1666666667, ans=0.0 2024-09-16 23:36:49,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.93 vs. limit=10.0 2024-09-16 23:37:20,676 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.153e+02 2.283e+02 2.431e+02 3.165e+02, threshold=4.566e+02, percent-clipped=0.0 2024-09-16 23:37:28,115 INFO [train.py:1198] (0/2) Epoch 29, batch 3850, loss[loss=0.2275, ctc_loss=0.1546, cr_loss=0.3647, over 21016.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1515, cr_loss=0.374, over 4103527.96 frames. 
], batch size: 61, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:37:34,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=517913.5, ans=0.025 2024-09-16 23:37:49,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=517941.8333333333, ans=0.125 2024-09-16 23:38:09,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=517970.1666666667, ans=0.04949747468305833 2024-09-16 23:38:19,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=517998.5, ans=0.0 2024-09-16 23:38:33,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=518026.8333333333, ans=0.125 2024-09-16 23:38:37,917 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:38:41,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=518026.8333333333, ans=0.0 2024-09-16 23:38:43,641 INFO [train.py:1198] (0/2) Epoch 29, batch 3900, loss[loss=0.2148, ctc_loss=0.1409, cr_loss=0.3696, over 20942.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1525, cr_loss=0.3756, over 4103008.70 frames. 
], batch size: 58, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:39:09,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=518083.5, ans=0.05 2024-09-16 23:39:09,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=518083.5, ans=0.125 2024-09-16 23:39:22,232 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.85 vs. limit=10.0 2024-09-16 23:39:45,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=518168.5, ans=0.0 2024-09-16 23:39:51,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=518168.5, ans=0.125 2024-09-16 23:39:54,596 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.123e+02 2.282e+02 2.483e+02 3.210e+02, threshold=4.565e+02, percent-clipped=0.0 2024-09-16 23:40:02,119 INFO [train.py:1198] (0/2) Epoch 29, batch 3950, loss[loss=0.2426, ctc_loss=0.165, cr_loss=0.388, over 20866.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1516, cr_loss=0.3746, over 4105603.72 frames. 
], batch size: 65, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:40:02,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=518196.8333333333, ans=0.125 2024-09-16 23:40:12,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=518196.8333333333, ans=10.0 2024-09-16 23:40:50,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=518281.8333333333, ans=0.0 2024-09-16 23:40:55,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=518281.8333333333, ans=0.95 2024-09-16 23:41:03,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=518310.1666666667, ans=0.125 2024-09-16 23:41:17,414 INFO [train.py:1198] (0/2) Epoch 29, batch 4000, loss[loss=0.2189, ctc_loss=0.1437, cr_loss=0.3757, over 21070.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1517, cr_loss=0.3752, over 4103981.26 frames. ], batch size: 56, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:41:17,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=518338.5, ans=0.2 2024-09-16 23:41:51,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=518395.1666666667, ans=0.1 2024-09-16 23:42:08,026 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. 
limit=6.0 2024-09-16 23:42:09,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=518423.5, ans=0.0 2024-09-16 23:42:26,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=518451.8333333333, ans=0.1 2024-09-16 23:42:27,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=518451.8333333333, ans=0.125 2024-09-16 23:42:28,724 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.114e+02 2.276e+02 2.427e+02 3.959e+02, threshold=4.553e+02, percent-clipped=0.0 2024-09-16 23:42:36,497 INFO [train.py:1198] (0/2) Epoch 29, batch 4050, loss[loss=0.2181, ctc_loss=0.1483, cr_loss=0.3491, over 21064.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1511, cr_loss=0.3744, over 4104148.64 frames. ], batch size: 59, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:42:52,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=518508.5, ans=0.1 2024-09-16 23:42:58,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=518508.5, ans=0.125 2024-09-16 23:43:37,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=518593.5, ans=0.025 2024-09-16 23:43:51,988 INFO [train.py:1198] (0/2) Epoch 29, batch 4100, loss[loss=0.2069, ctc_loss=0.1332, cr_loss=0.3688, over 21069.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1519, cr_loss=0.3758, over 4098028.57 frames. 
], batch size: 59, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:44:09,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=518650.1666666667, ans=0.1 2024-09-16 23:44:14,144 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=12.0 2024-09-16 23:44:29,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=518678.5, ans=0.04949747468305833 2024-09-16 23:44:33,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=518678.5, ans=0.2 2024-09-16 23:44:43,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.38 vs. limit=6.0 2024-09-16 23:45:03,422 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.153e+02 2.279e+02 2.457e+02 3.517e+02, threshold=4.558e+02, percent-clipped=0.0 2024-09-16 23:45:11,028 INFO [train.py:1198] (0/2) Epoch 29, batch 4150, loss[loss=0.2798, ctc_loss=0.199, cr_loss=0.404, over 14182.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.152, cr_loss=0.376, over 4098084.04 frames. ], batch size: 150, lr: 2.85e-03, grad_scale: 64.0 2024-09-16 23:45:48,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=518820.1666666667, ans=0.125 2024-09-16 23:46:07,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=518848.5, ans=0.0 2024-09-16 23:46:26,148 INFO [train.py:1198] (0/2) Epoch 29, batch 4200, loss[loss=0.2364, ctc_loss=0.1606, cr_loss=0.3792, over 20948.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1516, cr_loss=0.3755, over 4099706.20 frames. 
], batch size: 60, lr: 2.85e-03, grad_scale: 64.0 2024-09-16 23:46:38,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0 2024-09-16 23:46:44,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=518933.5, ans=0.125 2024-09-16 23:47:14,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2024-09-16 23:47:15,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=518990.1666666667, ans=0.1 2024-09-16 23:47:38,065 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.098e+02 2.247e+02 2.454e+02 3.348e+02, threshold=4.494e+02, percent-clipped=0.0 2024-09-16 23:47:44,081 INFO [train.py:1198] (0/2) Epoch 29, batch 4250, loss[loss=0.2247, ctc_loss=0.1465, cr_loss=0.3905, over 21053.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1512, cr_loss=0.3753, over 4107000.02 frames. ], batch size: 53, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:47:53,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=519046.8333333333, ans=0.2 2024-09-16 23:48:03,315 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. 
limit=6.0 2024-09-16 23:48:07,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=519075.1666666667, ans=0.2 2024-09-16 23:48:23,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=519103.5, ans=0.2 2024-09-16 23:48:51,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=519160.1666666667, ans=0.1 2024-09-16 23:49:00,247 INFO [train.py:1198] (0/2) Epoch 29, batch 4300, loss[loss=0.2925, ctc_loss=0.2079, cr_loss=0.423, over 14436.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1509, cr_loss=0.3751, over 4109662.00 frames. ], batch size: 149, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:49:54,192 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=22.5 2024-09-16 23:50:09,932 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.142e+02 2.329e+02 2.478e+02 4.020e+02, threshold=4.659e+02, percent-clipped=0.0 2024-09-16 23:50:16,039 INFO [train.py:1198] (0/2) Epoch 29, batch 4350, loss[loss=0.2492, ctc_loss=0.1669, cr_loss=0.4116, over 20943.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1512, cr_loss=0.375, over 4109674.97 frames. 
], batch size: 60, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:50:34,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=519358.5, ans=0.04949747468305833 2024-09-16 23:50:43,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=519358.5, ans=0.125 2024-09-16 23:50:47,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519386.8333333333, ans=0.1 2024-09-16 23:50:49,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=519386.8333333333, ans=0.125 2024-09-16 23:51:17,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.29 vs. limit=22.5 2024-09-16 23:51:34,124 INFO [train.py:1198] (0/2) Epoch 29, batch 4400, loss[loss=0.2365, ctc_loss=0.1599, cr_loss=0.3834, over 21043.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.151, cr_loss=0.3746, over 4122034.39 frames. ], batch size: 62, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:51:42,247 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.23 vs. limit=10.0 2024-09-16 23:52:04,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=519528.5, ans=0.0 2024-09-16 23:52:14,501 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.54 vs. limit=15.0 2024-09-16 23:52:16,061 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. 
limit=10.0 2024-09-16 23:52:43,678 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.151e+02 2.295e+02 2.423e+02 3.130e+02, threshold=4.591e+02, percent-clipped=0.0 2024-09-16 23:52:49,909 INFO [train.py:1198] (0/2) Epoch 29, batch 4450, loss[loss=0.1957, ctc_loss=0.1265, cr_loss=0.3459, over 19944.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.151, cr_loss=0.3743, over 4120769.56 frames. ], batch size: 44, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:52:52,173 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-09-16 23:53:25,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=22.5 2024-09-16 23:53:27,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=519670.1666666667, ans=0.125 2024-09-16 23:53:49,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=519698.5, ans=0.1 2024-09-16 23:54:07,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=519755.1666666667, ans=15.0 2024-09-16 23:54:08,502 INFO [train.py:1198] (0/2) Epoch 29, batch 4500, loss[loss=0.2238, ctc_loss=0.1514, cr_loss=0.362, over 21017.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1513, cr_loss=0.3749, over 4126084.25 frames. ], batch size: 63, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:54:16,487 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:54:27,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.50 vs. 
limit=22.5 2024-09-16 23:54:31,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=519783.5, ans=0.2 2024-09-16 23:55:17,800 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.140e+02 2.237e+02 2.446e+02 4.193e+02, threshold=4.475e+02, percent-clipped=0.0 2024-09-16 23:55:23,795 INFO [train.py:1198] (0/2) Epoch 29, batch 4550, loss[loss=0.2386, ctc_loss=0.1586, cr_loss=0.3997, over 20779.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1517, cr_loss=0.3753, over 4118876.01 frames. ], batch size: 56, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:55:37,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=519925.1666666667, ans=0.125 2024-09-16 23:55:46,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=519925.1666666667, ans=0.025 2024-09-16 23:55:49,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=519925.1666666667, ans=0.125 2024-09-16 23:56:16,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=519981.8333333333, ans=0.2 2024-09-16 23:56:20,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=519981.8333333333, ans=0.125 2024-09-16 23:56:32,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=520010.1666666667, ans=0.125 2024-09-16 23:56:39,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=15.0 2024-09-16 23:56:42,779 INFO [train.py:1198] (0/2) Epoch 29, batch 4600, loss[loss=0.2325, ctc_loss=0.1571, cr_loss=0.3766, over 20698.00 frames. 
], tot_loss[loss=0.2271, ctc_loss=0.1519, cr_loss=0.3759, over 4113026.79 frames. ], batch size: 68, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:56:52,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=520038.5, ans=0.2 2024-09-16 23:56:58,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=520066.8333333333, ans=0.125 2024-09-16 23:57:14,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=520095.1666666667, ans=0.125 2024-09-16 23:57:27,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=520123.5, ans=0.0 2024-09-16 23:57:52,114 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.114e+02 2.234e+02 2.448e+02 7.926e+02, threshold=4.469e+02, percent-clipped=3.0 2024-09-16 23:57:58,241 INFO [train.py:1198] (0/2) Epoch 29, batch 4650, loss[loss=0.1873, ctc_loss=0.1217, cr_loss=0.3277, over 20962.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1521, cr_loss=0.3756, over 4096055.65 frames. ], batch size: 49, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:58:01,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520180.1666666667, ans=0.1 2024-09-16 23:58:15,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=520208.5, ans=0.1 2024-09-16 23:59:11,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=520293.5, ans=0.125 2024-09-16 23:59:17,121 INFO [train.py:1198] (0/2) Epoch 29, batch 4700, loss[loss=0.1932, ctc_loss=0.1258, cr_loss=0.3365, over 20941.00 frames. 
], tot_loss[loss=0.2261, ctc_loss=0.1513, cr_loss=0.374, over 4098105.13 frames. ], batch size: 50, lr: 2.85e-03, grad_scale: 32.0 2024-09-16 23:59:35,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=520350.1666666667, ans=0.125 2024-09-17 00:00:08,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=520406.8333333333, ans=0.0 2024-09-17 00:00:26,130 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.182e+02 2.312e+02 2.519e+02 3.478e+02, threshold=4.623e+02, percent-clipped=0.0 2024-09-17 00:00:28,237 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:00:32,317 INFO [train.py:1198] (0/2) Epoch 29, batch 4750, loss[loss=0.2406, ctc_loss=0.1653, cr_loss=0.3763, over 20740.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1518, cr_loss=0.3749, over 4101623.62 frames. ], batch size: 71, lr: 2.85e-03, grad_scale: 32.0 2024-09-17 00:01:01,487 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.55 vs. limit=10.0 2024-09-17 00:01:19,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=520548.5, ans=0.0 2024-09-17 00:01:31,791 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2024-09-17 00:01:47,724 INFO [train.py:1198] (0/2) Epoch 29, batch 4800, loss[loss=0.228, ctc_loss=0.151, cr_loss=0.3849, over 21056.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.152, cr_loss=0.3747, over 4079735.73 frames. 
], batch size: 59, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:01:59,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=520605.1666666667, ans=0.125 2024-09-17 00:02:16,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=520633.5, ans=0.125 2024-09-17 00:03:00,846 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.174e+02 2.313e+02 2.580e+02 4.357e+02, threshold=4.627e+02, percent-clipped=0.0 2024-09-17 00:03:07,006 INFO [train.py:1198] (0/2) Epoch 29, batch 4850, loss[loss=0.2713, ctc_loss=0.1818, cr_loss=0.4473, over 20844.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1512, cr_loss=0.3738, over 4084001.44 frames. ], batch size: 65, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:03:26,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=22.5 2024-09-17 00:03:35,289 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-09-17 00:03:56,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=520831.8333333333, ans=0.0 2024-09-17 00:03:59,566 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:04:20,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=520860.1666666667, ans=0.125 2024-09-17 00:04:26,493 INFO [train.py:1198] (0/2) Epoch 29, batch 4900, loss[loss=0.2559, ctc_loss=0.1731, cr_loss=0.4136, over 20663.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1513, cr_loss=0.3739, over 4067809.77 frames. 
], batch size: 66, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:04:49,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=520916.8333333333, ans=0.125 2024-09-17 00:05:02,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=520945.1666666667, ans=0.125 2024-09-17 00:05:02,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=520945.1666666667, ans=0.07 2024-09-17 00:05:35,350 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.144e+02 2.276e+02 2.506e+02 4.473e+02, threshold=4.551e+02, percent-clipped=0.0 2024-09-17 00:05:41,340 INFO [train.py:1198] (0/2) Epoch 29, batch 4950, loss[loss=0.211, ctc_loss=0.1379, cr_loss=0.3655, over 20899.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1519, cr_loss=0.3748, over 4065833.06 frames. ], batch size: 54, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:05:43,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=521030.1666666667, ans=0.025 2024-09-17 00:06:08,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=521058.5, ans=10.0 2024-09-17 00:06:35,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=521115.1666666667, ans=0.125 2024-09-17 00:06:56,288 INFO [train.py:1198] (0/2) Epoch 29, batch 5000, loss[loss=0.2123, ctc_loss=0.1395, cr_loss=0.3642, over 21066.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.152, cr_loss=0.3751, over 4081361.66 frames. ], batch size: 59, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:07:07,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.51 vs. 
limit=22.5 2024-09-17 00:07:26,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=521228.5, ans=0.125 2024-09-17 00:07:32,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=521228.5, ans=0.0 2024-09-17 00:08:02,497 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.06 vs. limit=10.0 2024-09-17 00:08:04,573 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.874e+02 2.104e+02 2.201e+02 2.382e+02 2.853e+02, threshold=4.401e+02, percent-clipped=0.0 2024-09-17 00:08:09,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521313.5, ans=0.1 2024-09-17 00:08:10,561 INFO [train.py:1198] (0/2) Epoch 29, batch 5050, loss[loss=0.2235, ctc_loss=0.1481, cr_loss=0.3767, over 20794.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1527, cr_loss=0.3757, over 4070229.85 frames. ], batch size: 53, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:08:19,804 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-184000.pt 2024-09-17 00:09:02,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=521398.5, ans=0.2 2024-09-17 00:09:10,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=521426.8333333333, ans=0.125 2024-09-17 00:09:23,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-17 00:09:25,727 INFO [train.py:1198] (0/2) Epoch 29, batch 5100, loss[loss=0.2325, ctc_loss=0.1527, cr_loss=0.399, over 20886.00 frames. 
], tot_loss[loss=0.2273, ctc_loss=0.1523, cr_loss=0.375, over 4074049.84 frames. ], batch size: 54, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:09:30,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=521455.1666666667, ans=0.125 2024-09-17 00:09:36,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=521455.1666666667, ans=0.1 2024-09-17 00:09:43,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=521483.5, ans=0.125 2024-09-17 00:10:22,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=521540.1666666667, ans=0.015 2024-09-17 00:10:37,322 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.158e+02 2.245e+02 2.459e+02 4.128e+02, threshold=4.490e+02, percent-clipped=0.0 2024-09-17 00:10:43,253 INFO [train.py:1198] (0/2) Epoch 29, batch 5150, loss[loss=0.2313, ctc_loss=0.1546, cr_loss=0.3831, over 20671.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1516, cr_loss=0.3745, over 4084072.58 frames. 
], batch size: 68, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:10:58,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=521625.1666666667, ans=0.0 2024-09-17 00:11:14,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=521653.5, ans=0.025 2024-09-17 00:11:27,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=521681.8333333333, ans=0.2 2024-09-17 00:11:38,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=521681.8333333333, ans=0.125 2024-09-17 00:11:46,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=521710.1666666667, ans=0.025 2024-09-17 00:11:57,463 INFO [train.py:1198] (0/2) Epoch 29, batch 5200, loss[loss=0.2429, ctc_loss=0.1648, cr_loss=0.3903, over 21020.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1513, cr_loss=0.3737, over 4078245.37 frames. 
], batch size: 61, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:12:06,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=521738.5, ans=0.1 2024-09-17 00:12:09,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521738.5, ans=0.1 2024-09-17 00:12:17,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=521766.8333333333, ans=0.125 2024-09-17 00:12:17,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=521766.8333333333, ans=0.125 2024-09-17 00:12:54,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=521823.5, ans=0.1 2024-09-17 00:13:05,920 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.155e+02 2.285e+02 2.429e+02 3.077e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-17 00:13:11,964 INFO [train.py:1198] (0/2) Epoch 29, batch 5250, loss[loss=0.1833, ctc_loss=0.1224, cr_loss=0.3045, over 20958.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.152, cr_loss=0.3752, over 4064521.75 frames. ], batch size: 50, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:13:18,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=521880.1666666667, ans=0.125 2024-09-17 00:13:23,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.91 vs. 
limit=15.0 2024-09-17 00:13:33,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=521908.5, ans=0.0 2024-09-17 00:14:05,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=521965.1666666667, ans=0.1 2024-09-17 00:14:28,966 INFO [train.py:1198] (0/2) Epoch 29, batch 5300, loss[loss=0.2177, ctc_loss=0.1446, cr_loss=0.3658, over 20948.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1523, cr_loss=0.3759, over 4070810.26 frames. ], batch size: 60, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:14:34,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=22.5 2024-09-17 00:14:47,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=522050.1666666667, ans=0.125 2024-09-17 00:14:59,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2024-09-17 00:15:37,333 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.107e+02 2.241e+02 2.435e+02 4.969e+02, threshold=4.483e+02, percent-clipped=1.0 2024-09-17 00:15:37,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=522135.1666666667, ans=0.0 2024-09-17 00:15:42,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=522163.5, ans=0.125 2024-09-17 00:15:43,128 INFO [train.py:1198] (0/2) Epoch 29, batch 5350, loss[loss=0.2377, ctc_loss=0.1588, cr_loss=0.3944, over 19887.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1523, cr_loss=0.376, over 4077256.62 frames. 
], batch size: 80, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:15:49,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=522163.5, ans=0.0 2024-09-17 00:15:57,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=522191.8333333333, ans=0.2 2024-09-17 00:16:23,472 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.51 vs. limit=15.0 2024-09-17 00:16:36,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=522248.5, ans=15.0 2024-09-17 00:16:40,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=522248.5, ans=0.0 2024-09-17 00:16:58,064 INFO [train.py:1198] (0/2) Epoch 29, batch 5400, loss[loss=0.2016, ctc_loss=0.1313, cr_loss=0.3515, over 20971.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1521, cr_loss=0.3759, over 4082118.35 frames. ], batch size: 48, lr: 2.84e-03, grad_scale: 16.0 2024-09-17 00:17:05,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=522305.1666666667, ans=0.0 2024-09-17 00:17:08,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=522305.1666666667, ans=0.125 2024-09-17 00:17:22,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=522333.5, ans=0.0 2024-09-17 00:17:24,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. 
limit=15.0 2024-09-17 00:17:37,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0 2024-09-17 00:18:08,460 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.138e+02 2.259e+02 2.409e+02 3.323e+02, threshold=4.517e+02, percent-clipped=0.0 2024-09-17 00:18:13,180 INFO [train.py:1198] (0/2) Epoch 29, batch 5450, loss[loss=0.2634, ctc_loss=0.1791, cr_loss=0.4217, over 20824.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1512, cr_loss=0.3747, over 4096363.20 frames. ], batch size: 59, lr: 2.84e-03, grad_scale: 16.0 2024-09-17 00:18:13,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=522446.8333333333, ans=0.125 2024-09-17 00:18:30,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=522475.1666666667, ans=0.2 2024-09-17 00:18:40,537 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0 2024-09-17 00:18:53,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=522503.5, ans=0.1 2024-09-17 00:19:30,228 INFO [train.py:1198] (0/2) Epoch 29, batch 5500, loss[loss=0.2366, ctc_loss=0.1556, cr_loss=0.4051, over 20675.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1506, cr_loss=0.3742, over 4097954.79 frames. 
], batch size: 71, lr: 2.84e-03, grad_scale: 16.0 2024-09-17 00:19:35,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=522588.5, ans=0.025 2024-09-17 00:19:47,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=522616.8333333333, ans=15.0 2024-09-17 00:20:14,583 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.03 vs. limit=10.0 2024-09-17 00:20:40,012 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.163e+02 2.287e+02 2.452e+02 3.728e+02, threshold=4.574e+02, percent-clipped=0.0 2024-09-17 00:20:44,594 INFO [train.py:1198] (0/2) Epoch 29, batch 5550, loss[loss=0.2061, ctc_loss=0.1348, cr_loss=0.3565, over 20943.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1499, cr_loss=0.373, over 4097999.07 frames. ], batch size: 48, lr: 2.84e-03, grad_scale: 16.0 2024-09-17 00:21:35,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=522815.1666666667, ans=0.125 2024-09-17 00:21:58,886 INFO [train.py:1198] (0/2) Epoch 29, batch 5600, loss[loss=0.2031, ctc_loss=0.1325, cr_loss=0.3532, over 20925.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1499, cr_loss=0.3725, over 4088424.56 frames. ], batch size: 48, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:22:01,036 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.60 vs. 
limit=10.0 2024-09-17 00:22:05,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=522871.8333333333, ans=0.125 2024-09-17 00:22:12,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=522900.1666666667, ans=0.1 2024-09-17 00:22:24,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=522900.1666666667, ans=0.125 2024-09-17 00:22:42,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=522956.8333333333, ans=0.0 2024-09-17 00:23:11,351 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.139e+02 2.265e+02 2.452e+02 4.630e+02, threshold=4.530e+02, percent-clipped=1.0 2024-09-17 00:23:15,866 INFO [train.py:1198] (0/2) Epoch 29, batch 5650, loss[loss=0.2566, ctc_loss=0.1741, cr_loss=0.4122, over 20040.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1504, cr_loss=0.3737, over 4083549.83 frames. ], batch size: 80, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:24:22,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=523126.8333333333, ans=0.125 2024-09-17 00:24:30,966 INFO [train.py:1198] (0/2) Epoch 29, batch 5700, loss[loss=0.2613, ctc_loss=0.1813, cr_loss=0.4, over 18275.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1508, cr_loss=0.3742, over 4081085.91 frames. 
], batch size: 108, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:24:34,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=523155.1666666667, ans=0.025 2024-09-17 00:24:38,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=523155.1666666667, ans=0.0 2024-09-17 00:25:40,988 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.159e+02 2.287e+02 2.512e+02 3.342e+02, threshold=4.574e+02, percent-clipped=0.0 2024-09-17 00:25:44,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=523296.8333333333, ans=0.125 2024-09-17 00:25:45,561 INFO [train.py:1198] (0/2) Epoch 29, batch 5750, loss[loss=0.2075, ctc_loss=0.1405, cr_loss=0.3348, over 20957.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1507, cr_loss=0.3733, over 4075599.83 frames. ], batch size: 49, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:25:54,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=523296.8333333333, ans=0.0 2024-09-17 00:25:58,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=523325.1666666667, ans=0.125 2024-09-17 00:26:00,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=523325.1666666667, ans=0.1 2024-09-17 00:26:21,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=523353.5, ans=0.5 2024-09-17 00:26:27,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=523353.5, ans=0.2 2024-09-17 00:26:28,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=523381.8333333333, ans=0.0 
2024-09-17 00:26:48,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=523410.1666666667, ans=0.0 2024-09-17 00:26:48,656 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.57 vs. limit=10.0 2024-09-17 00:26:59,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.42 vs. limit=5.0 2024-09-17 00:26:59,699 INFO [train.py:1198] (0/2) Epoch 29, batch 5800, loss[loss=0.2417, ctc_loss=0.1639, cr_loss=0.3893, over 20673.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1513, cr_loss=0.3742, over 4080484.09 frames. ], batch size: 68, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:27:10,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=523438.5, ans=0.125 2024-09-17 00:27:54,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=523523.5, ans=0.0 2024-09-17 00:28:10,915 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.135e+02 2.265e+02 2.409e+02 4.759e+02, threshold=4.529e+02, percent-clipped=1.0 2024-09-17 00:28:15,338 INFO [train.py:1198] (0/2) Epoch 29, batch 5850, loss[loss=0.2113, ctc_loss=0.1416, cr_loss=0.3482, over 21061.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1506, cr_loss=0.3736, over 4091117.42 frames. 
], batch size: 62, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:28:17,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=523580.1666666667, ans=0.125 2024-09-17 00:28:27,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=523580.1666666667, ans=0.2 2024-09-17 00:28:39,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=523608.5, ans=0.5 2024-09-17 00:29:05,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=523665.1666666667, ans=0.0 2024-09-17 00:29:29,795 INFO [train.py:1198] (0/2) Epoch 29, batch 5900, loss[loss=0.2427, ctc_loss=0.1613, cr_loss=0.4071, over 20946.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1498, cr_loss=0.3725, over 4087718.85 frames. ], batch size: 60, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:29:33,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=523721.8333333333, ans=0.125 2024-09-17 00:30:05,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=523778.5, ans=0.125 2024-09-17 00:30:39,759 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.133e+02 2.255e+02 2.458e+02 4.200e+02, threshold=4.509e+02, percent-clipped=0.0 2024-09-17 00:30:44,212 INFO [train.py:1198] (0/2) Epoch 29, batch 5950, loss[loss=0.2216, ctc_loss=0.1472, cr_loss=0.3721, over 21000.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1507, cr_loss=0.3736, over 4090195.83 frames. 
], batch size: 61, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:31:00,911 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.82 vs. limit=15.0 2024-09-17 00:31:13,354 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.85 vs. limit=15.0 2024-09-17 00:32:01,151 INFO [train.py:1198] (0/2) Epoch 29, batch 6000, loss[loss=0.2159, ctc_loss=0.1428, cr_loss=0.3657, over 21068.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1513, cr_loss=0.3746, over 4086287.57 frames. ], batch size: 53, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:32:01,151 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 00:32:11,747 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.4827, 3.2421, 3.7920, 3.8201, 3.2697, 3.7022, 2.5066, 2.4825], device='cuda:0') 2024-09-17 00:32:22,677 INFO [train.py:1230] (0/2) Epoch 29, validation: loss=0.04135, ctc_loss=0.04135, cr_loss=1.22e-14, over 944034.00 frames. 2024-09-17 00:32:22,677 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 00:32:37,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=524033.5, ans=0.025 2024-09-17 00:32:51,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=524061.8333333333, ans=0.125 2024-09-17 00:33:32,545 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.225e+02 2.385e+02 2.574e+02 4.481e+02, threshold=4.771e+02, percent-clipped=0.0 2024-09-17 00:33:36,801 INFO [train.py:1198] (0/2) Epoch 29, batch 6050, loss[loss=0.2403, ctc_loss=0.1584, cr_loss=0.4096, over 20778.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1514, cr_loss=0.3741, over 4086521.37 frames. 
], batch size: 56, lr: 2.84e-03, grad_scale: 32.0 2024-09-17 00:34:18,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=524203.5, ans=0.0 2024-09-17 00:34:28,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=524231.8333333333, ans=0.125 2024-09-17 00:34:51,703 INFO [train.py:1198] (0/2) Epoch 29, batch 6100, loss[loss=0.2433, ctc_loss=0.1635, cr_loss=0.3991, over 21067.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1528, cr_loss=0.3763, over 4066123.01 frames. ], batch size: 62, lr: 2.83e-03, grad_scale: 32.0 2024-09-17 00:35:09,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=524316.8333333334, ans=0.125 2024-09-17 00:35:28,786 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-09-17 00:35:57,776 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.706e-02 2024-09-17 00:36:00,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=524401.8333333334, ans=0.0 2024-09-17 00:36:01,668 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.182e+02 2.347e+02 2.516e+02 3.483e+02, threshold=4.693e+02, percent-clipped=0.0 2024-09-17 00:36:06,266 INFO [train.py:1198] (0/2) Epoch 29, batch 6150, loss[loss=0.2015, ctc_loss=0.1346, cr_loss=0.3346, over 20923.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1534, cr_loss=0.3768, over 4056913.13 frames. 
], batch size: 49, lr: 2.83e-03, grad_scale: 32.0 2024-09-17 00:36:44,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524486.8333333334, ans=0.1 2024-09-17 00:36:58,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=524515.1666666666, ans=0.0 2024-09-17 00:37:20,230 INFO [train.py:1198] (0/2) Epoch 29, batch 6200, loss[loss=0.2412, ctc_loss=0.1621, cr_loss=0.3954, over 20967.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1535, cr_loss=0.3772, over 4057399.99 frames. ], batch size: 64, lr: 2.83e-03, grad_scale: 32.0 2024-09-17 00:37:39,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=524600.1666666666, ans=0.0 2024-09-17 00:38:13,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=524656.8333333334, ans=0.0 2024-09-17 00:38:18,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524685.1666666666, ans=0.1 2024-09-17 00:38:21,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=524685.1666666666, ans=0.0 2024-09-17 00:38:24,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524685.1666666666, ans=0.1 2024-09-17 00:38:28,619 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.176e+02 2.354e+02 2.588e+02 3.496e+02, threshold=4.707e+02, percent-clipped=0.0 2024-09-17 00:38:32,846 INFO [train.py:1198] (0/2) Epoch 29, batch 6250, loss[loss=0.2504, ctc_loss=0.1707, cr_loss=0.3989, over 20969.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1546, cr_loss=0.3774, over 3993465.99 frames. 
], batch size: 64, lr: 2.83e-03, grad_scale: 32.0 2024-09-17 00:38:47,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524741.8333333334, ans=0.1 2024-09-17 00:39:00,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=524741.8333333334, ans=0.0 2024-09-17 00:39:11,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=524770.1666666666, ans=0.2 2024-09-17 00:39:37,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=524826.8333333334, ans=0.125 2024-09-17 00:39:46,105 INFO [train.py:1198] (0/2) Epoch 29, batch 6300, loss[loss=0.1938, ctc_loss=0.1252, cr_loss=0.3434, over 20332.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1574, cr_loss=0.38, over 3911168.85 frames. ], batch size: 45, lr: 2.83e-03, grad_scale: 32.0 2024-09-17 00:39:53,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524855.1666666666, ans=0.1 2024-09-17 00:40:02,138 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.23 vs. 
limit=22.5 2024-09-17 00:40:27,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=524940.1666666666, ans=0.0 2024-09-17 00:40:41,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=524968.5, ans=0.125 2024-09-17 00:40:48,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524968.5, ans=0.1 2024-09-17 00:40:52,532 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.211e+02 2.422e+02 2.609e+02 4.840e+02, threshold=4.844e+02, percent-clipped=1.0 2024-09-17 00:40:57,102 INFO [train.py:1198] (0/2) Epoch 29, batch 6350, loss[loss=0.2642, ctc_loss=0.1852, cr_loss=0.395, over 14814.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1602, cr_loss=0.3808, over 3782460.26 frames. ], batch size: 149, lr: 2.83e-03, grad_scale: 32.0 2024-09-17 00:41:08,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=524996.8333333334, ans=0.025 2024-09-17 00:41:19,469 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.91 vs. limit=22.5 2024-09-17 00:41:55,548 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-29.pt 2024-09-17 00:42:47,309 INFO [train.py:1198] (0/2) Epoch 30, batch 0, loss[loss=0.2359, ctc_loss=0.1579, cr_loss=0.39, over 20871.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1579, cr_loss=0.39, over 20871.00 frames. 
], batch size: 57, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:42:47,310 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 00:42:56,930 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.9069, 4.9210, 4.8447, 4.4056], device='cuda:0') 2024-09-17 00:43:04,004 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3334, 4.3055, 4.2723, 3.9025], device='cuda:0') 2024-09-17 00:43:05,647 INFO [train.py:1230] (0/2) Epoch 30, validation: loss=0.04125, ctc_loss=0.04125, cr_loss=1.237e-14, over 944034.00 frames. 2024-09-17 00:43:05,648 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 00:43:58,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=525198.0, ans=0.0 2024-09-17 00:43:59,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=12.0 2024-09-17 00:44:21,508 INFO [train.py:1198] (0/2) Epoch 30, batch 50, loss[loss=0.1735, ctc_loss=0.1149, cr_loss=0.2934, over 20973.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1465, cr_loss=0.3692, over 940756.06 frames. ], batch size: 49, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:44:30,543 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.150e+02 2.396e+02 2.756e+02 3.152e+02, threshold=4.793e+02, percent-clipped=0.0 2024-09-17 00:44:37,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525283.0, ans=0.1 2024-09-17 00:44:37,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.19 vs. 
limit=15.0 2024-09-17 00:44:56,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=525311.3333333334, ans=0.07 2024-09-17 00:45:02,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=525311.3333333334, ans=0.125 2024-09-17 00:45:20,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=525368.0, ans=0.0 2024-09-17 00:45:36,156 INFO [train.py:1198] (0/2) Epoch 30, batch 100, loss[loss=0.2286, ctc_loss=0.1553, cr_loss=0.3662, over 20878.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.153, cr_loss=0.3778, over 1619808.15 frames. ], batch size: 54, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:45:45,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=525396.3333333334, ans=0.0 2024-09-17 00:45:56,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525424.6666666666, ans=0.1 2024-09-17 00:46:37,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=525481.3333333334, ans=0.2 2024-09-17 00:46:41,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=525509.6666666666, ans=0.0 2024-09-17 00:46:54,835 INFO [train.py:1198] (0/2) Epoch 30, batch 150, loss[loss=0.2148, ctc_loss=0.1446, cr_loss=0.3509, over 20983.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1534, cr_loss=0.3778, over 2171779.48 frames. 
], batch size: 52, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:47:03,657 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.145e+02 2.323e+02 2.514e+02 3.152e+02, threshold=4.646e+02, percent-clipped=0.0 2024-09-17 00:47:05,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=525538.0, ans=0.0 2024-09-17 00:47:29,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=525594.6666666666, ans=0.125 2024-09-17 00:47:49,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=525623.0, ans=0.125 2024-09-17 00:47:52,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=525623.0, ans=0.125 2024-09-17 00:47:56,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=525651.3333333334, ans=0.0 2024-09-17 00:48:12,911 INFO [train.py:1198] (0/2) Epoch 30, batch 200, loss[loss=0.2499, ctc_loss=0.1684, cr_loss=0.4079, over 20700.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1526, cr_loss=0.3773, over 2611140.71 frames. ], batch size: 68, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:48:19,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525679.6666666666, ans=0.1 2024-09-17 00:48:22,572 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.42 vs. 
limit=22.5 2024-09-17 00:49:25,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=525793.0, ans=0.07 2024-09-17 00:49:27,998 INFO [train.py:1198] (0/2) Epoch 30, batch 250, loss[loss=0.2258, ctc_loss=0.149, cr_loss=0.3838, over 20884.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1523, cr_loss=0.377, over 2942413.91 frames. ], batch size: 57, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:49:37,148 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.151e+02 2.228e+02 2.428e+02 4.713e+02, threshold=4.455e+02, percent-clipped=1.0 2024-09-17 00:49:58,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=525878.0, ans=0.0 2024-09-17 00:50:43,804 INFO [train.py:1198] (0/2) Epoch 30, batch 300, loss[loss=0.2357, ctc_loss=0.161, cr_loss=0.3735, over 20975.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.153, cr_loss=0.3785, over 3204387.20 frames. ], batch size: 64, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:51:05,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=525991.3333333334, ans=0.125 2024-09-17 00:51:13,653 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.81 vs. limit=12.0 2024-09-17 00:51:38,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=526048.0, ans=0.0 2024-09-17 00:51:46,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=526076.3333333334, ans=0.125 2024-09-17 00:51:59,795 INFO [train.py:1198] (0/2) Epoch 30, batch 350, loss[loss=0.2163, ctc_loss=0.1438, cr_loss=0.3629, over 20899.00 frames. ], tot_loss[loss=0.2289, ctc_loss=0.1531, cr_loss=0.379, over 3398442.83 frames. 
], batch size: 57, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:52:12,158 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.167e+02 2.278e+02 2.443e+02 3.101e+02, threshold=4.556e+02, percent-clipped=0.0 2024-09-17 00:52:23,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=526133.0, ans=0.125 2024-09-17 00:52:44,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=526161.3333333334, ans=0.0 2024-09-17 00:52:56,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=526189.6666666666, ans=0.125 2024-09-17 00:53:21,569 INFO [train.py:1198] (0/2) Epoch 30, batch 400, loss[loss=0.2375, ctc_loss=0.1624, cr_loss=0.3754, over 21016.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1524, cr_loss=0.3778, over 3541454.05 frames. ], batch size: 63, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:53:40,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=526274.6666666666, ans=0.125 2024-09-17 00:54:04,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=15.0 2024-09-17 00:54:18,550 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0 2024-09-17 00:54:37,518 INFO [train.py:1198] (0/2) Epoch 30, batch 450, loss[loss=0.2537, ctc_loss=0.1697, cr_loss=0.42, over 21020.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1512, cr_loss=0.3755, over 3669061.94 frames. 
], batch size: 61, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:54:46,490 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.686e+02 2.124e+02 2.232e+02 2.429e+02 3.758e+02, threshold=4.463e+02, percent-clipped=0.0 2024-09-17 00:54:49,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=526388.0, ans=0.125 2024-09-17 00:54:54,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=526416.3333333334, ans=0.1 2024-09-17 00:54:57,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=526416.3333333334, ans=0.125 2024-09-17 00:54:58,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=526416.3333333334, ans=0.0 2024-09-17 00:55:01,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=526416.3333333334, ans=0.09899494936611666 2024-09-17 00:55:13,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=526444.6666666666, ans=0.125 2024-09-17 00:55:52,554 INFO [train.py:1198] (0/2) Epoch 30, batch 500, loss[loss=0.2264, ctc_loss=0.1525, cr_loss=0.3695, over 21076.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1513, cr_loss=0.3755, over 3771625.45 frames. ], batch size: 59, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:55:53,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. 
limit=10.0 2024-09-17 00:55:57,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526529.6666666666, ans=0.1 2024-09-17 00:56:15,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=526558.0, ans=0.0 2024-09-17 00:56:26,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=526586.3333333334, ans=0.1 2024-09-17 00:56:26,698 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.76 vs. limit=10.0 2024-09-17 00:56:27,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=526586.3333333334, ans=0.125 2024-09-17 00:57:07,836 INFO [train.py:1198] (0/2) Epoch 30, batch 550, loss[loss=0.2224, ctc_loss=0.1463, cr_loss=0.3807, over 20931.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1516, cr_loss=0.3757, over 3832576.99 frames. ], batch size: 60, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:57:16,988 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.211e+02 2.385e+02 2.602e+02 5.099e+02, threshold=4.770e+02, percent-clipped=1.0 2024-09-17 00:57:24,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.36 vs. 
limit=15.0 2024-09-17 00:58:13,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=526784.6666666666, ans=0.04949747468305833 2024-09-17 00:58:18,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=526784.6666666666, ans=0.125 2024-09-17 00:58:25,980 INFO [train.py:1198] (0/2) Epoch 30, batch 600, loss[loss=0.1979, ctc_loss=0.1315, cr_loss=0.332, over 21043.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.151, cr_loss=0.3752, over 3896506.58 frames. ], batch size: 56, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:58:37,316 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-17 00:58:40,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=526841.3333333334, ans=0.0 2024-09-17 00:58:43,463 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:58:59,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=22.5 2024-09-17 00:59:04,908 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2024-09-17 00:59:16,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.18 vs. 
limit=12.0 2024-09-17 00:59:17,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=526898.0, ans=0.1 2024-09-17 00:59:45,011 INFO [train.py:1198] (0/2) Epoch 30, batch 650, loss[loss=0.2196, ctc_loss=0.1445, cr_loss=0.3754, over 21018.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1511, cr_loss=0.3755, over 3934660.82 frames. ], batch size: 63, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 00:59:53,946 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.141e+02 2.266e+02 2.459e+02 4.953e+02, threshold=4.533e+02, percent-clipped=1.0 2024-09-17 01:00:20,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=527011.3333333334, ans=0.0 2024-09-17 01:00:23,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=527011.3333333334, ans=0.0 2024-09-17 01:00:31,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527039.6666666666, ans=0.1 2024-09-17 01:01:01,154 INFO [train.py:1198] (0/2) Epoch 30, batch 700, loss[loss=0.1904, ctc_loss=0.1259, cr_loss=0.3227, over 19956.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1502, cr_loss=0.3739, over 3968724.21 frames. ], batch size: 44, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:01:06,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=527096.3333333334, ans=0.125 2024-09-17 01:01:45,996 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.58 vs. 
limit=15.0 2024-09-17 01:01:54,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=527181.3333333334, ans=0.125 2024-09-17 01:02:16,565 INFO [train.py:1198] (0/2) Epoch 30, batch 750, loss[loss=0.2111, ctc_loss=0.1396, cr_loss=0.3575, over 20981.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1497, cr_loss=0.373, over 3997441.25 frames. ], batch size: 55, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:02:24,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=527238.0, ans=0.0 2024-09-17 01:02:25,564 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.139e+02 2.297e+02 2.430e+02 2.989e+02, threshold=4.593e+02, percent-clipped=0.0 2024-09-17 01:02:28,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=527238.0, ans=0.0 2024-09-17 01:02:36,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=527266.3333333334, ans=0.2 2024-09-17 01:03:33,216 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2024-09-17 01:03:34,919 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=22.5 2024-09-17 01:03:35,339 INFO [train.py:1198] (0/2) Epoch 30, batch 800, loss[loss=0.226, ctc_loss=0.1511, cr_loss=0.3749, over 20926.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1503, cr_loss=0.374, over 4012135.18 frames. 
], batch size: 60, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:03:38,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=527379.6666666666, ans=0.0 2024-09-17 01:03:44,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=527379.6666666666, ans=0.5 2024-09-17 01:04:04,498 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:04:53,956 INFO [train.py:1198] (0/2) Epoch 30, batch 850, loss[loss=0.1966, ctc_loss=0.1299, cr_loss=0.3333, over 20986.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.151, cr_loss=0.3747, over 4037251.56 frames. ], batch size: 50, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:05:02,914 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.146e+02 2.289e+02 2.431e+02 3.556e+02, threshold=4.578e+02, percent-clipped=0.0 2024-09-17 01:05:24,930 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.20 vs. limit=22.5 2024-09-17 01:05:30,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527578.0, ans=0.1 2024-09-17 01:05:51,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527606.3333333334, ans=0.1 2024-09-17 01:06:00,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=527634.6666666666, ans=0.125 2024-09-17 01:06:08,617 INFO [train.py:1198] (0/2) Epoch 30, batch 900, loss[loss=0.2827, ctc_loss=0.1974, cr_loss=0.4261, over 14737.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.152, cr_loss=0.3763, over 4035851.41 frames. 
], batch size: 149, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:06:31,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=527691.3333333334, ans=0.0 2024-09-17 01:06:45,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=527719.6666666666, ans=0.0 2024-09-17 01:07:18,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=527776.3333333334, ans=0.0 2024-09-17 01:07:23,871 INFO [train.py:1198] (0/2) Epoch 30, batch 950, loss[loss=0.1862, ctc_loss=0.121, cr_loss=0.3262, over 19390.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1514, cr_loss=0.3756, over 4060618.29 frames. ], batch size: 43, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:07:30,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=527804.6666666666, ans=0.0 2024-09-17 01:07:32,725 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.120e+02 2.234e+02 2.375e+02 2.946e+02, threshold=4.468e+02, percent-clipped=0.0 2024-09-17 01:07:35,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=527804.6666666666, ans=0.125 2024-09-17 01:07:38,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=527833.0, ans=10.0 2024-09-17 01:07:55,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=527861.3333333334, ans=0.09899494936611666 2024-09-17 01:08:06,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527861.3333333334, ans=0.1 2024-09-17 01:08:21,325 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=527889.6666666666, ans=0.0 2024-09-17 01:08:33,693 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=22.5 2024-09-17 01:08:39,136 INFO [train.py:1198] (0/2) Epoch 30, batch 1000, loss[loss=0.2475, ctc_loss=0.1646, cr_loss=0.4142, over 21031.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1509, cr_loss=0.3752, over 4078926.61 frames. ], batch size: 63, lr: 2.78e-03, grad_scale: 64.0 2024-09-17 01:08:40,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527946.3333333334, ans=0.1 2024-09-17 01:08:46,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=527946.3333333334, ans=0.125 2024-09-17 01:08:48,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=12.0 2024-09-17 01:09:57,082 INFO [train.py:1198] (0/2) Epoch 30, batch 1050, loss[loss=0.2146, ctc_loss=0.1416, cr_loss=0.3647, over 21077.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1514, cr_loss=0.3752, over 4083524.64 frames. ], batch size: 59, lr: 2.78e-03, grad_scale: 64.0 2024-09-17 01:10:05,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.26 vs. 
limit=10.0 2024-09-17 01:10:06,223 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.120e+02 2.221e+02 2.349e+02 3.101e+02, threshold=4.442e+02, percent-clipped=0.0 2024-09-17 01:10:09,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=528088.0, ans=0.0 2024-09-17 01:10:50,217 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=22.5 2024-09-17 01:11:05,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=528201.3333333334, ans=0.0 2024-09-17 01:11:05,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=528201.3333333334, ans=0.09899494936611666 2024-09-17 01:11:07,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.19 vs. limit=15.0 2024-09-17 01:11:15,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=528229.6666666666, ans=0.0 2024-09-17 01:11:16,827 INFO [train.py:1198] (0/2) Epoch 30, batch 1100, loss[loss=0.2703, ctc_loss=0.1876, cr_loss=0.4133, over 20693.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1519, cr_loss=0.3761, over 4075711.65 frames. 
], batch size: 68, lr: 2.78e-03, grad_scale: 64.0 2024-09-17 01:11:38,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=528258.0, ans=0.0 2024-09-17 01:11:53,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=528286.3333333334, ans=0.0 2024-09-17 01:12:18,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=528343.0, ans=0.2 2024-09-17 01:12:32,392 INFO [train.py:1198] (0/2) Epoch 30, batch 1150, loss[loss=0.2255, ctc_loss=0.152, cr_loss=0.3674, over 20874.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1523, cr_loss=0.3766, over 4073465.74 frames. ], batch size: 57, lr: 2.78e-03, grad_scale: 64.0 2024-09-17 01:12:40,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=528371.3333333334, ans=0.125 2024-09-17 01:12:41,269 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.162e+02 2.278e+02 2.515e+02 3.617e+02, threshold=4.556e+02, percent-clipped=0.0 2024-09-17 01:13:12,217 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-17 01:13:27,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=528456.3333333334, ans=0.125 2024-09-17 01:13:48,034 INFO [train.py:1198] (0/2) Epoch 30, batch 1200, loss[loss=0.2377, ctc_loss=0.1601, cr_loss=0.3879, over 20710.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1514, cr_loss=0.3754, over 4087315.80 frames. ], batch size: 71, lr: 2.78e-03, grad_scale: 32.0 2024-09-17 01:14:08,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.42 vs. 
limit=22.5 2024-09-17 01:14:59,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=528626.3333333334, ans=0.125 2024-09-17 01:15:07,168 INFO [train.py:1198] (0/2) Epoch 30, batch 1250, loss[loss=0.1857, ctc_loss=0.1215, cr_loss=0.3207, over 19877.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.15, cr_loss=0.3731, over 4100889.08 frames. ], batch size: 44, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:15:17,474 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.152e+02 2.256e+02 2.360e+02 3.023e+02, threshold=4.513e+02, percent-clipped=0.0 2024-09-17 01:15:20,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=528683.0, ans=0.2 2024-09-17 01:16:03,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=528739.6666666666, ans=0.0 2024-09-17 01:16:03,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=528739.6666666666, ans=0.125 2024-09-17 01:16:25,778 INFO [train.py:1198] (0/2) Epoch 30, batch 1300, loss[loss=0.2474, ctc_loss=0.1672, cr_loss=0.401, over 21040.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1504, cr_loss=0.3738, over 4103396.82 frames. ], batch size: 62, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:16:27,972 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-17 01:16:46,452 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.88 vs. 
limit=22.5 2024-09-17 01:17:19,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=528881.3333333334, ans=0.0 2024-09-17 01:17:41,444 INFO [train.py:1198] (0/2) Epoch 30, batch 1350, loss[loss=0.2185, ctc_loss=0.1452, cr_loss=0.3665, over 21032.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.15, cr_loss=0.3727, over 4114643.83 frames. ], batch size: 62, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:17:45,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=22.5 2024-09-17 01:17:52,190 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.140e+02 2.245e+02 2.378e+02 4.804e+02, threshold=4.491e+02, percent-clipped=1.0 2024-09-17 01:18:14,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=528994.6666666666, ans=0.125 2024-09-17 01:18:25,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=529023.0, ans=0.1 2024-09-17 01:18:56,301 INFO [train.py:1198] (0/2) Epoch 30, batch 1400, loss[loss=0.2351, ctc_loss=0.1573, cr_loss=0.3889, over 20996.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1505, cr_loss=0.3738, over 4106252.60 frames. ], batch size: 63, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:19:04,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. 
limit=15.0 2024-09-17 01:19:05,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=529079.6666666666, ans=0.125 2024-09-17 01:19:10,381 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:19:46,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=529164.6666666666, ans=0.125 2024-09-17 01:19:52,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=529164.6666666666, ans=0.125 2024-09-17 01:20:11,901 INFO [train.py:1198] (0/2) Epoch 30, batch 1450, loss[loss=0.1909, ctc_loss=0.1232, cr_loss=0.3387, over 20954.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1504, cr_loss=0.3735, over 4092395.93 frames. ], batch size: 48, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:20:16,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529221.3333333334, ans=0.1 2024-09-17 01:20:22,593 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.128e+02 2.266e+02 2.464e+02 3.893e+02, threshold=4.533e+02, percent-clipped=0.0 2024-09-17 01:20:23,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.78 vs. limit=15.0 2024-09-17 01:20:25,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=529249.6666666666, ans=0.0 2024-09-17 01:21:30,933 INFO [train.py:1198] (0/2) Epoch 30, batch 1500, loss[loss=0.2313, ctc_loss=0.1585, cr_loss=0.3641, over 21058.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1502, cr_loss=0.3729, over 4105168.35 frames. 
], batch size: 62, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:21:48,372 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:21:49,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=529391.3333333334, ans=0.125 2024-09-17 01:22:34,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=529476.3333333334, ans=0.125 2024-09-17 01:22:44,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=529476.3333333334, ans=0.0 2024-09-17 01:22:50,334 INFO [train.py:1198] (0/2) Epoch 30, batch 1550, loss[loss=0.2123, ctc_loss=0.1422, cr_loss=0.3505, over 20836.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.151, cr_loss=0.3748, over 4108874.98 frames. ], batch size: 59, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:23:00,944 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.158e+02 2.296e+02 2.497e+02 3.223e+02, threshold=4.592e+02, percent-clipped=0.0 2024-09-17 01:23:03,443 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.84 vs. 
limit=15.0 2024-09-17 01:23:05,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=529533.0, ans=0.125 2024-09-17 01:23:06,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=529533.0, ans=0.125 2024-09-17 01:23:06,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=529533.0, ans=0.2 2024-09-17 01:23:43,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=529589.6666666666, ans=0.025 2024-09-17 01:23:45,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=529589.6666666666, ans=0.125 2024-09-17 01:24:06,415 INFO [train.py:1198] (0/2) Epoch 30, batch 1600, loss[loss=0.216, ctc_loss=0.1435, cr_loss=0.3622, over 20976.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1504, cr_loss=0.3733, over 4100544.55 frames. ], batch size: 55, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:24:38,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=529703.0, ans=0.125 2024-09-17 01:25:11,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=529759.6666666666, ans=0.2 2024-09-17 01:25:21,988 INFO [train.py:1198] (0/2) Epoch 30, batch 1650, loss[loss=0.2272, ctc_loss=0.152, cr_loss=0.3757, over 20877.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1498, cr_loss=0.3719, over 4105221.59 frames. 
], batch size: 57, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:25:32,627 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.129e+02 2.229e+02 2.370e+02 3.077e+02, threshold=4.458e+02, percent-clipped=0.0 2024-09-17 01:25:42,566 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=22.5 2024-09-17 01:26:00,554 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2024-09-17 01:26:04,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=529844.6666666666, ans=0.2 2024-09-17 01:26:33,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=529901.3333333334, ans=0.2 2024-09-17 01:26:40,425 INFO [train.py:1198] (0/2) Epoch 30, batch 1700, loss[loss=0.2725, ctc_loss=0.1918, cr_loss=0.4031, over 14500.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.149, cr_loss=0.3706, over 4106896.25 frames. 
], batch size: 149, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:26:48,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=529929.6666666666, ans=0.1 2024-09-17 01:27:00,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=529958.0, ans=0.0 2024-09-17 01:27:08,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529958.0, ans=0.1 2024-09-17 01:27:12,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=529986.3333333334, ans=0.125 2024-09-17 01:27:47,373 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:27:59,092 INFO [train.py:1198] (0/2) Epoch 30, batch 1750, loss[loss=0.2519, ctc_loss=0.1697, cr_loss=0.411, over 20262.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1495, cr_loss=0.3718, over 4102539.09 frames. ], batch size: 74, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:28:08,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=530071.3333333334, ans=0.125 2024-09-17 01:28:09,794 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.130e+02 2.256e+02 2.425e+02 4.114e+02, threshold=4.511e+02, percent-clipped=0.0 2024-09-17 01:28:10,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=530071.3333333334, ans=0.0 2024-09-17 01:28:56,887 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.23 vs. 
limit=15.0 2024-09-17 01:29:11,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.18 vs. limit=15.0 2024-09-17 01:29:14,773 INFO [train.py:1198] (0/2) Epoch 30, batch 1800, loss[loss=0.2273, ctc_loss=0.1516, cr_loss=0.3783, over 20825.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1488, cr_loss=0.3707, over 4116266.14 frames. ], batch size: 59, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:29:18,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=530213.0, ans=0.1 2024-09-17 01:29:35,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=530241.3333333334, ans=0.09899494936611666 2024-09-17 01:30:30,934 INFO [train.py:1198] (0/2) Epoch 30, batch 1850, loss[loss=0.1786, ctc_loss=0.1175, cr_loss=0.3057, over 19927.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1485, cr_loss=0.3701, over 4118830.75 frames. ], batch size: 44, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:30:41,498 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.656e+02 2.167e+02 2.258e+02 2.458e+02 5.473e+02, threshold=4.517e+02, percent-clipped=1.0 2024-09-17 01:30:53,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-09-17 01:30:55,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=530383.0, ans=0.125 2024-09-17 01:31:26,146 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.88 vs. limit=15.0 2024-09-17 01:31:46,358 INFO [train.py:1198] (0/2) Epoch 30, batch 1900, loss[loss=0.2564, ctc_loss=0.1712, cr_loss=0.426, over 20989.00 frames. 
], tot_loss[loss=0.2241, ctc_loss=0.1496, cr_loss=0.3724, over 4116071.19 frames. ], batch size: 64, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:31:47,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=530496.3333333334, ans=0.125 2024-09-17 01:31:48,353 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:32:23,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=530553.0, ans=0.0 2024-09-17 01:32:40,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=530581.3333333334, ans=0.2 2024-09-17 01:32:49,429 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.50 vs. limit=15.0 2024-09-17 01:33:04,675 INFO [train.py:1198] (0/2) Epoch 30, batch 1950, loss[loss=0.2215, ctc_loss=0.1503, cr_loss=0.3559, over 20593.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1491, cr_loss=0.3716, over 4114995.66 frames. ], batch size: 68, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:33:15,436 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.152e+02 2.322e+02 2.436e+02 3.414e+02, threshold=4.644e+02, percent-clipped=0.0 2024-09-17 01:33:27,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=530666.3333333334, ans=0.125 2024-09-17 01:34:20,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=530751.3333333334, ans=0.0 2024-09-17 01:34:23,734 INFO [train.py:1198] (0/2) Epoch 30, batch 2000, loss[loss=0.2509, ctc_loss=0.1708, cr_loss=0.4006, over 20864.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1493, cr_loss=0.3716, over 4103538.57 frames. 
], batch size: 65, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:34:31,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=530779.6666666666, ans=0.1 2024-09-17 01:34:39,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=530808.0, ans=0.025 2024-09-17 01:35:23,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=12.0 2024-09-17 01:35:35,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=530893.0, ans=0.125 2024-09-17 01:35:39,234 INFO [train.py:1198] (0/2) Epoch 30, batch 2050, loss[loss=0.1997, ctc_loss=0.1266, cr_loss=0.3658, over 21011.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1497, cr_loss=0.3722, over 4107695.97 frames. ], batch size: 52, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:35:49,864 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.120e+02 2.239e+02 2.413e+02 5.839e+02, threshold=4.478e+02, percent-clipped=1.0 2024-09-17 01:36:01,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=530949.6666666666, ans=0.125 2024-09-17 01:36:32,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=22.5 2024-09-17 01:36:55,845 INFO [train.py:1198] (0/2) Epoch 30, batch 2100, loss[loss=0.2082, ctc_loss=0.1379, cr_loss=0.3514, over 20955.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1486, cr_loss=0.3706, over 4115292.20 frames. 
], batch size: 51, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:37:02,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=531063.0, ans=0.125 2024-09-17 01:37:15,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=531091.3333333334, ans=0.0 2024-09-17 01:37:38,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-09-17 01:37:49,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=531148.0, ans=0.125 2024-09-17 01:38:13,913 INFO [train.py:1198] (0/2) Epoch 30, batch 2150, loss[loss=0.2524, ctc_loss=0.1784, cr_loss=0.3699, over 14597.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1498, cr_loss=0.3718, over 4100794.45 frames. ], batch size: 150, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:38:24,205 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.139e+02 2.270e+02 2.532e+02 3.032e+02, threshold=4.540e+02, percent-clipped=0.0 2024-09-17 01:38:46,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=531261.3333333334, ans=15.0 2024-09-17 01:38:57,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=531289.6666666666, ans=0.125 2024-09-17 01:39:00,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=531289.6666666666, ans=0.025 2024-09-17 01:39:32,484 INFO [train.py:1198] (0/2) Epoch 30, batch 2200, loss[loss=0.2078, ctc_loss=0.1406, cr_loss=0.3363, over 20978.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1501, cr_loss=0.3725, over 4103974.36 frames. 
], batch size: 49, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:40:42,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=531459.6666666666, ans=0.0 2024-09-17 01:40:48,037 INFO [train.py:1198] (0/2) Epoch 30, batch 2250, loss[loss=0.2102, ctc_loss=0.1391, cr_loss=0.3556, over 20771.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1506, cr_loss=0.374, over 4101483.63 frames. ], batch size: 53, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:40:58,147 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0 2024-09-17 01:40:58,882 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.156e+02 2.262e+02 2.516e+02 3.058e+02, threshold=4.525e+02, percent-clipped=0.0 2024-09-17 01:40:59,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=531488.0, ans=0.0 2024-09-17 01:41:02,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=22.5 2024-09-17 01:41:16,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2024-09-17 01:41:33,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=531573.0, ans=0.125 2024-09-17 01:42:05,086 INFO [train.py:1198] (0/2) Epoch 30, batch 2300, loss[loss=0.2343, ctc_loss=0.1565, cr_loss=0.3891, over 20685.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1503, cr_loss=0.3741, over 4102654.10 frames. 
], batch size: 71, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:42:51,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-09-17 01:43:20,474 INFO [train.py:1198] (0/2) Epoch 30, batch 2350, loss[loss=0.199, ctc_loss=0.1292, cr_loss=0.3486, over 21088.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.151, cr_loss=0.3749, over 4100483.49 frames. ], batch size: 56, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:43:23,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=531771.3333333334, ans=0.0 2024-09-17 01:43:31,072 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.138e+02 2.313e+02 2.458e+02 5.191e+02, threshold=4.627e+02, percent-clipped=2.0 2024-09-17 01:43:36,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=531799.6666666666, ans=0.125 2024-09-17 01:43:48,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=531799.6666666666, ans=0.125 2024-09-17 01:44:00,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=531828.0, ans=0.125 2024-09-17 01:44:06,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=531828.0, ans=0.125 2024-09-17 01:44:06,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=531828.0, ans=0.0 2024-09-17 01:44:13,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.55 vs. 
limit=15.0 2024-09-17 01:44:30,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-09-17 01:44:32,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=531884.6666666666, ans=0.2 2024-09-17 01:44:39,259 INFO [train.py:1198] (0/2) Epoch 30, batch 2400, loss[loss=0.2566, ctc_loss=0.1838, cr_loss=0.3638, over 14901.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1513, cr_loss=0.3757, over 4098913.13 frames. ], batch size: 149, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:44:47,365 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=22.5 2024-09-17 01:45:06,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=531941.3333333334, ans=0.125 2024-09-17 01:45:16,178 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-09-17 01:45:17,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=531969.6666666666, ans=0.125 2024-09-17 01:45:22,594 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.45 vs. limit=15.0 2024-09-17 01:45:25,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=531969.6666666666, ans=0.0 2024-09-17 01:45:33,485 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.31 vs. 
limit=15.0 2024-09-17 01:45:49,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532026.3333333334, ans=0.1 2024-09-17 01:45:57,920 INFO [train.py:1198] (0/2) Epoch 30, batch 2450, loss[loss=0.2069, ctc_loss=0.1377, cr_loss=0.346, over 21013.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1512, cr_loss=0.3759, over 4101077.81 frames. ], batch size: 52, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:46:08,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.168e+02 2.254e+02 2.444e+02 3.250e+02, threshold=4.508e+02, percent-clipped=0.0 2024-09-17 01:46:10,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=532054.6666666666, ans=0.125 2024-09-17 01:46:45,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=532139.6666666666, ans=0.2 2024-09-17 01:47:09,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=532168.0, ans=0.05 2024-09-17 01:47:11,735 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.28 vs. limit=15.0 2024-09-17 01:47:12,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=532196.3333333334, ans=0.2 2024-09-17 01:47:14,029 INFO [train.py:1198] (0/2) Epoch 30, batch 2500, loss[loss=0.2295, ctc_loss=0.1516, cr_loss=0.3899, over 21043.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1516, cr_loss=0.3768, over 4088131.58 frames. 
], batch size: 62, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:48:00,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=532281.3333333334, ans=0.1 2024-09-17 01:48:28,969 INFO [train.py:1198] (0/2) Epoch 30, batch 2550, loss[loss=0.2684, ctc_loss=0.1913, cr_loss=0.3854, over 13977.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1519, cr_loss=0.3768, over 4074733.39 frames. ], batch size: 151, lr: 2.77e-03, grad_scale: 32.0 2024-09-17 01:48:39,728 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.156e+02 2.232e+02 2.452e+02 5.570e+02, threshold=4.464e+02, percent-clipped=1.0 2024-09-17 01:48:56,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=532366.3333333334, ans=0.07 2024-09-17 01:48:59,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=532394.6666666666, ans=0.2 2024-09-17 01:49:12,512 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0 2024-09-17 01:49:47,731 INFO [train.py:1198] (0/2) Epoch 30, batch 2600, loss[loss=0.2224, ctc_loss=0.1487, cr_loss=0.3683, over 20776.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1512, cr_loss=0.3759, over 4074091.38 frames. ], batch size: 56, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 01:50:01,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=532508.0, ans=15.0 2024-09-17 01:50:03,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=532508.0, ans=0.035 2024-09-17 01:50:26,382 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.16 vs. 
limit=22.5 2024-09-17 01:51:05,901 INFO [train.py:1198] (0/2) Epoch 30, batch 2650, loss[loss=0.2267, ctc_loss=0.1516, cr_loss=0.3757, over 21024.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1506, cr_loss=0.3747, over 4090164.65 frames. ], batch size: 52, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 01:51:09,751 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0 2024-09-17 01:51:15,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=532621.3333333334, ans=0.125 2024-09-17 01:51:16,686 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.122e+02 2.251e+02 2.469e+02 4.413e+02, threshold=4.502e+02, percent-clipped=0.0 2024-09-17 01:51:24,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532649.6666666666, ans=0.1 2024-09-17 01:51:29,014 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-188000.pt 2024-09-17 01:51:37,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=532678.0, ans=0.125 2024-09-17 01:51:53,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=532706.3333333334, ans=10.0 2024-09-17 01:51:56,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=532706.3333333334, ans=0.0 2024-09-17 01:52:23,345 INFO [train.py:1198] (0/2) Epoch 30, batch 2700, loss[loss=0.2442, ctc_loss=0.162, cr_loss=0.4111, over 21037.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1508, cr_loss=0.3749, over 4096007.73 frames. 
], batch size: 62, lr: 2.76e-03, grad_scale: 32.0 2024-09-17 01:52:41,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=532791.3333333334, ans=0.025 2024-09-17 01:52:49,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=532791.3333333334, ans=10.0 2024-09-17 01:52:49,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=22.5 2024-09-17 01:53:15,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.37 vs. limit=10.0 2024-09-17 01:53:20,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=532848.0, ans=0.125 2024-09-17 01:53:21,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.51 vs. limit=22.5 2024-09-17 01:53:38,678 INFO [train.py:1198] (0/2) Epoch 30, batch 2750, loss[loss=0.2278, ctc_loss=0.1551, cr_loss=0.3635, over 20933.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1518, cr_loss=0.3765, over 4101863.61 frames. 
], batch size: 60, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 01:53:46,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=532904.6666666666, ans=0.125
2024-09-17 01:53:48,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=532904.6666666666, ans=0.125
2024-09-17 01:53:49,277 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.183e+02 2.309e+02 2.496e+02 2.932e+02, threshold=4.619e+02, percent-clipped=0.0
2024-09-17 01:54:54,469 INFO [train.py:1198] (0/2) Epoch 30, batch 2800, loss[loss=0.1819, ctc_loss=0.1211, cr_loss=0.304, over 20991.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1526, cr_loss=0.3773, over 4066391.86 frames. ], batch size: 52, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 01:55:08,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=533046.3333333334, ans=0.07
2024-09-17 01:55:28,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0
2024-09-17 01:55:34,132 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=12.0
2024-09-17 01:55:40,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=12.0
2024-09-17 01:55:47,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=533131.3333333334, ans=0.2
2024-09-17 01:55:56,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=533159.6666666666, ans=0.04949747468305833
2024-09-17 01:56:13,071 INFO [train.py:1198] (0/2) Epoch 30, batch 2850, loss[loss=0.196, ctc_loss=0.1309, cr_loss=0.3254, over 20953.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1519, cr_loss=0.376, over 4076857.14 frames. ], batch size: 50, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 01:56:20,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=533188.0, ans=0.125
2024-09-17 01:56:23,550 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.178e+02 2.325e+02 2.456e+02 4.976e+02, threshold=4.650e+02, percent-clipped=1.0
2024-09-17 01:57:02,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=533273.0, ans=0.125
2024-09-17 01:57:11,669 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=22.5
2024-09-17 01:57:31,988 INFO [train.py:1198] (0/2) Epoch 30, batch 2900, loss[loss=0.2071, ctc_loss=0.1356, cr_loss=0.3575, over 20875.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1511, cr_loss=0.3744, over 4092193.36 frames. ], batch size: 57, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 01:57:47,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=533358.0, ans=0.125
2024-09-17 01:58:08,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0
2024-09-17 01:58:47,301 INFO [train.py:1198] (0/2) Epoch 30, batch 2950, loss[loss=0.2316, ctc_loss=0.1498, cr_loss=0.4093, over 20886.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1516, cr_loss=0.3748, over 4093571.53 frames. ], batch size: 54, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 01:58:57,869 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.174e+02 2.288e+02 2.456e+02 3.838e+02, threshold=4.575e+02, percent-clipped=0.0
2024-09-17 01:59:13,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=533499.6666666666, ans=0.125
2024-09-17 01:59:16,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=533528.0, ans=0.0
2024-09-17 02:00:03,362 INFO [train.py:1198] (0/2) Epoch 30, batch 3000, loss[loss=0.186, ctc_loss=0.1215, cr_loss=0.3226, over 19874.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.151, cr_loss=0.3737, over 4091544.10 frames. ], batch size: 44, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 02:00:03,363 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 02:00:15,696 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7523, 2.7125, 3.4656, 2.5306], device='cuda:0')
2024-09-17 02:00:24,460 INFO [train.py:1230] (0/2) Epoch 30, validation: loss=0.04164, ctc_loss=0.04164, cr_loss=1.25e-14, over 944034.00 frames.
2024-09-17 02:00:24,460 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-17 02:00:35,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=533613.0, ans=0.0
2024-09-17 02:00:41,991 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.27 vs. limit=15.0
2024-09-17 02:00:46,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=533641.3333333334, ans=0.125
2024-09-17 02:01:01,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=533669.6666666666, ans=0.04949747468305833
2024-09-17 02:01:33,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=533726.3333333334, ans=0.2
2024-09-17 02:01:43,705 INFO [train.py:1198] (0/2) Epoch 30, batch 3050, loss[loss=0.203, ctc_loss=0.1371, cr_loss=0.3295, over 20782.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1511, cr_loss=0.3738, over 4086609.01 frames. ], batch size: 56, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 02:01:54,276 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.114e+02 2.234e+02 2.399e+02 3.160e+02, threshold=4.467e+02, percent-clipped=0.0
2024-09-17 02:02:08,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=533783.0, ans=0.125
2024-09-17 02:02:29,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=533811.3333333334, ans=0.0
2024-09-17 02:02:50,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533868.0, ans=0.1
2024-09-17 02:03:00,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=533896.3333333334, ans=0.07
2024-09-17 02:03:01,876 INFO [train.py:1198] (0/2) Epoch 30, batch 3100, loss[loss=0.1791, ctc_loss=0.1158, cr_loss=0.3169, over 20943.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1514, cr_loss=0.3746, over 4084513.98 frames. ], batch size: 49, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 02:03:34,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=533953.0, ans=0.125
2024-09-17 02:03:51,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=533981.3333333334, ans=0.125
2024-09-17 02:04:00,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=533981.3333333334, ans=0.0
2024-09-17 02:04:17,644 INFO [train.py:1198] (0/2) Epoch 30, batch 3150, loss[loss=0.2386, ctc_loss=0.1594, cr_loss=0.3961, over 20680.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1517, cr_loss=0.3757, over 4098865.17 frames. ], batch size: 66, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 02:04:28,290 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.145e+02 2.284e+02 2.407e+02 3.555e+02, threshold=4.568e+02, percent-clipped=0.0
2024-09-17 02:04:44,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.82 vs. limit=22.5
2024-09-17 02:05:00,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=534094.6666666666, ans=0.125
2024-09-17 02:05:08,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=534123.0, ans=0.125
2024-09-17 02:05:11,238 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:05:12,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=534123.0, ans=0.2
2024-09-17 02:05:21,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=534151.3333333334, ans=0.125
2024-09-17 02:05:33,768 INFO [train.py:1198] (0/2) Epoch 30, batch 3200, loss[loss=0.1961, ctc_loss=0.1312, cr_loss=0.3245, over 21054.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1523, cr_loss=0.3771, over 4091943.55 frames. ], batch size: 53, lr: 2.76e-03, grad_scale: 64.0
2024-09-17 02:05:46,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=534179.6666666666, ans=0.125
2024-09-17 02:05:53,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=534208.0, ans=0.05
2024-09-17 02:06:15,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=15.0
2024-09-17 02:06:21,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=534264.6666666666, ans=0.2
2024-09-17 02:06:46,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=534293.0, ans=0.0
2024-09-17 02:06:53,510 INFO [train.py:1198] (0/2) Epoch 30, batch 3250, loss[loss=0.2368, ctc_loss=0.1638, cr_loss=0.3647, over 20889.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1519, cr_loss=0.3765, over 4097899.68 frames. ], batch size: 54, lr: 2.76e-03, grad_scale: 64.0
2024-09-17 02:07:04,280 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.178e+02 2.333e+02 2.477e+02 3.944e+02, threshold=4.666e+02, percent-clipped=0.0
2024-09-17 02:07:12,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=534349.6666666666, ans=0.125
2024-09-17 02:07:15,934 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0
2024-09-17 02:07:32,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=534378.0, ans=0.125
2024-09-17 02:07:32,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=534378.0, ans=0.0
2024-09-17 02:07:53,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=534434.6666666666, ans=0.125
2024-09-17 02:08:13,307 INFO [train.py:1198] (0/2) Epoch 30, batch 3300, loss[loss=0.1926, ctc_loss=0.1255, cr_loss=0.3358, over 20962.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.152, cr_loss=0.3764, over 4092247.38 frames. ], batch size: 51, lr: 2.76e-03, grad_scale: 64.0
2024-09-17 02:09:29,689 INFO [train.py:1198] (0/2) Epoch 30, batch 3350, loss[loss=0.225, ctc_loss=0.153, cr_loss=0.36, over 20975.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1516, cr_loss=0.3762, over 4093980.98 frames. ], batch size: 58, lr: 2.76e-03, grad_scale: 64.0
2024-09-17 02:09:30,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.80 vs. limit=10.0
2024-09-17 02:09:33,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=534604.6666666666, ans=0.125
2024-09-17 02:09:40,571 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.130e+02 2.265e+02 2.418e+02 3.586e+02, threshold=4.530e+02, percent-clipped=0.0
2024-09-17 02:09:54,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=534633.0, ans=0.2
2024-09-17 02:10:13,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=534689.6666666666, ans=0.0
2024-09-17 02:10:24,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=534689.6666666666, ans=0.125
2024-09-17 02:10:24,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=534689.6666666666, ans=0.125
2024-09-17 02:10:45,073 INFO [train.py:1198] (0/2) Epoch 30, batch 3400, loss[loss=0.2502, ctc_loss=0.1697, cr_loss=0.4026, over 20356.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1511, cr_loss=0.3757, over 4099198.23 frames. ], batch size: 74, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 02:10:51,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=534746.3333333334, ans=0.125
2024-09-17 02:11:03,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=534774.6666666666, ans=0.125
2024-09-17 02:11:27,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=534803.0, ans=0.125
2024-09-17 02:11:29,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=534831.3333333334, ans=0.125
2024-09-17 02:11:36,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=534831.3333333334, ans=0.125
2024-09-17 02:11:43,725 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.79 vs. limit=5.0
2024-09-17 02:11:46,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0
2024-09-17 02:11:49,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=534859.6666666666, ans=0.125
2024-09-17 02:12:00,674 INFO [train.py:1198] (0/2) Epoch 30, batch 3450, loss[loss=0.1952, ctc_loss=0.1291, cr_loss=0.3305, over 21007.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1502, cr_loss=0.3742, over 4095921.85 frames. ], batch size: 52, lr: 2.76e-03, grad_scale: 16.0
2024-09-17 02:12:17,396 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.142e+02 2.255e+02 2.437e+02 3.136e+02, threshold=4.509e+02, percent-clipped=0.0
2024-09-17 02:12:28,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=534916.3333333334, ans=0.025
2024-09-17 02:12:33,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=534944.6666666666, ans=0.0
2024-09-17 02:12:34,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=534944.6666666666, ans=0.0
2024-09-17 02:12:39,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=534944.6666666666, ans=0.125
2024-09-17 02:13:20,538 INFO [train.py:1198] (0/2) Epoch 30, batch 3500, loss[loss=0.1934, ctc_loss=0.1254, cr_loss=0.34, over 20979.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1501, cr_loss=0.3738, over 4089745.77 frames. ], batch size: 50, lr: 2.76e-03, grad_scale: 16.0
2024-09-17 02:13:28,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535029.6666666666, ans=0.1
2024-09-17 02:13:47,243 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.37 vs. limit=15.0
2024-09-17 02:14:39,201 INFO [train.py:1198] (0/2) Epoch 30, batch 3550, loss[loss=0.2432, ctc_loss=0.1674, cr_loss=0.379, over 18617.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1504, cr_loss=0.3741, over 4084067.96 frames. ], batch size: 108, lr: 2.76e-03, grad_scale: 16.0
2024-09-17 02:14:49,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=535171.3333333334, ans=0.0
2024-09-17 02:14:52,559 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.161e+02 2.313e+02 2.481e+02 4.633e+02, threshold=4.626e+02, percent-clipped=2.0
2024-09-17 02:15:21,865 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0
2024-09-17 02:15:33,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535256.3333333334, ans=0.1
2024-09-17 02:15:54,826 INFO [train.py:1198] (0/2) Epoch 30, batch 3600, loss[loss=0.2133, ctc_loss=0.1421, cr_loss=0.3562, over 21062.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1509, cr_loss=0.3749, over 4089600.87 frames. ], batch size: 62, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 02:16:20,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535341.3333333334, ans=0.1
2024-09-17 02:16:21,124 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0
2024-09-17 02:16:40,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=535398.0, ans=0.5
2024-09-17 02:16:48,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=535398.0, ans=0.0
2024-09-17 02:17:06,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=12.0
2024-09-17 02:17:10,301 INFO [train.py:1198] (0/2) Epoch 30, batch 3650, loss[loss=0.2069, ctc_loss=0.1349, cr_loss=0.3604, over 20942.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1505, cr_loss=0.3737, over 4083936.33 frames. ], batch size: 60, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 02:17:16,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=535454.6666666666, ans=0.2
2024-09-17 02:17:22,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535454.6666666666, ans=0.1
2024-09-17 02:17:23,744 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.121e+02 2.237e+02 2.396e+02 3.207e+02, threshold=4.473e+02, percent-clipped=0.0
2024-09-17 02:17:27,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=535483.0, ans=0.0
2024-09-17 02:18:01,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=535539.6666666666, ans=0.125
2024-09-17 02:18:07,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=535539.6666666666, ans=0.025
2024-09-17 02:18:28,793 INFO [train.py:1198] (0/2) Epoch 30, batch 3700, loss[loss=0.225, ctc_loss=0.153, cr_loss=0.36, over 21024.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1515, cr_loss=0.3761, over 4088214.83 frames. ], batch size: 62, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 02:18:44,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=535624.6666666666, ans=0.2
2024-09-17 02:18:52,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=22.5
2024-09-17 02:18:55,356 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=22.5
2024-09-17 02:19:11,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=535653.0, ans=0.125
2024-09-17 02:19:47,712 INFO [train.py:1198] (0/2) Epoch 30, batch 3750, loss[loss=0.2187, ctc_loss=0.1421, cr_loss=0.3828, over 20882.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1517, cr_loss=0.3764, over 4081261.26 frames. ], batch size: 54, lr: 2.76e-03, grad_scale: 32.0
2024-09-17 02:20:01,432 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.114e+02 2.254e+02 2.437e+02 3.443e+02, threshold=4.509e+02, percent-clipped=0.0
2024-09-17 02:20:12,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=535766.3333333334, ans=0.0
2024-09-17 02:20:12,814 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0
2024-09-17 02:20:30,635 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0
2024-09-17 02:21:03,367 INFO [train.py:1198] (0/2) Epoch 30, batch 3800, loss[loss=0.1811, ctc_loss=0.1171, cr_loss=0.32, over 20954.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1512, cr_loss=0.3752, over 4089112.23 frames. ], batch size: 48, lr: 2.76e-03, grad_scale: 16.0
2024-09-17 02:21:11,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=535879.6666666666, ans=0.0
2024-09-17 02:21:18,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=535908.0, ans=10.0
2024-09-17 02:21:29,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=535908.0, ans=0.0
2024-09-17 02:22:00,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=22.5
2024-09-17 02:22:07,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=535993.0, ans=0.125
2024-09-17 02:22:09,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=535993.0, ans=0.025
2024-09-17 02:22:18,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=536021.3333333334, ans=0.125
2024-09-17 02:22:19,340 INFO [train.py:1198] (0/2) Epoch 30, batch 3850, loss[loss=0.2568, ctc_loss=0.1735, cr_loss=0.4165, over 20690.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1519, cr_loss=0.3759, over 4075743.36 frames. ], batch size: 68, lr: 2.76e-03, grad_scale: 16.0
2024-09-17 02:22:34,612 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.116e+02 2.273e+02 2.465e+02 5.111e+02, threshold=4.545e+02, percent-clipped=1.0
2024-09-17 02:22:55,568 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=15.0
2024-09-17 02:23:13,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=536106.3333333334, ans=0.2
2024-09-17 02:23:21,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536134.6666666666, ans=0.1
2024-09-17 02:23:34,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=536163.0, ans=0.125
2024-09-17 02:23:34,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=536163.0, ans=0.0
2024-09-17 02:23:36,060 INFO [train.py:1198] (0/2) Epoch 30, batch 3900, loss[loss=0.1876, ctc_loss=0.1223, cr_loss=0.3265, over 20780.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.151, cr_loss=0.3745, over 4095000.56 frames. ], batch size: 53, lr: 2.76e-03, grad_scale: 16.0
2024-09-17 02:23:36,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=536163.0, ans=0.0
2024-09-17 02:24:14,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=536219.6666666666, ans=0.125
2024-09-17 02:24:17,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=536219.6666666666, ans=0.125
2024-09-17 02:24:54,544 INFO [train.py:1198] (0/2) Epoch 30, batch 3950, loss[loss=0.2643, ctc_loss=0.1845, cr_loss=0.3988, over 14287.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1498, cr_loss=0.3723, over 4090715.22 frames. ], batch size: 151, lr: 2.76e-03, grad_scale: 16.0
2024-09-17 02:25:09,571 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.134e+02 2.247e+02 2.389e+02 4.042e+02, threshold=4.494e+02, percent-clipped=0.0
2024-09-17 02:26:13,971 INFO [train.py:1198] (0/2) Epoch 30, batch 4000, loss[loss=0.2376, ctc_loss=0.1599, cr_loss=0.3884, over 20961.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1504, cr_loss=0.373, over 4097409.57 frames. ], batch size: 58, lr: 2.75e-03, grad_scale: 32.0
2024-09-17 02:26:31,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=536474.6666666666, ans=0.0
2024-09-17 02:27:03,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=536531.3333333334, ans=0.0
2024-09-17 02:27:03,917 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.87 vs. limit=15.0
2024-09-17 02:27:09,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536531.3333333334, ans=0.1
2024-09-17 02:27:10,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=536531.3333333334, ans=0.125
2024-09-17 02:27:30,175 INFO [train.py:1198] (0/2) Epoch 30, batch 4050, loss[loss=0.2623, ctc_loss=0.1773, cr_loss=0.4248, over 18266.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1508, cr_loss=0.3734, over 4088076.77 frames. ], batch size: 108, lr: 2.75e-03, grad_scale: 32.0
2024-09-17 02:27:30,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=536588.0, ans=0.125
2024-09-17 02:27:45,276 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.180e+02 2.307e+02 2.416e+02 4.153e+02, threshold=4.613e+02, percent-clipped=0.0
2024-09-17 02:27:59,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=536644.6666666666, ans=0.125
2024-09-17 02:27:59,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=536644.6666666666, ans=0.0
2024-09-17 02:28:17,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=536673.0, ans=0.0
2024-09-17 02:28:44,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=536729.6666666666, ans=0.0
2024-09-17 02:28:45,818 INFO [train.py:1198] (0/2) Epoch 30, batch 4100, loss[loss=0.2412, ctc_loss=0.1619, cr_loss=0.3961, over 21002.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1509, cr_loss=0.374, over 4098719.31 frames. ], batch size: 63, lr: 2.75e-03, grad_scale: 32.0
2024-09-17 02:28:49,435 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0
2024-09-17 02:28:55,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=536729.6666666666, ans=0.125
2024-09-17 02:29:01,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=22.5
2024-09-17 02:29:05,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=12.0
2024-09-17 02:29:43,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=536814.6666666666, ans=0.125
2024-09-17 02:30:03,837 INFO [train.py:1198] (0/2) Epoch 30, batch 4150, loss[loss=0.2457, ctc_loss=0.1639, cr_loss=0.4089, over 20955.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1516, cr_loss=0.3752, over 4106140.84 frames. ], batch size: 58, lr: 2.75e-03, grad_scale: 32.0
2024-09-17 02:30:18,963 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.090e+02 2.210e+02 2.360e+02 3.416e+02, threshold=4.421e+02, percent-clipped=0.0
2024-09-17 02:30:20,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=536899.6666666666, ans=0.0
2024-09-17 02:30:46,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=536928.0, ans=0.09899494936611666
2024-09-17 02:31:21,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=537013.0, ans=0.025
2024-09-17 02:31:22,533 INFO [train.py:1198] (0/2) Epoch 30, batch 4200, loss[loss=0.1909, ctc_loss=0.1245, cr_loss=0.3318, over 20958.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.152, cr_loss=0.3756, over 4095613.60 frames. ], batch size: 48, lr: 2.75e-03, grad_scale: 32.0
2024-09-17 02:31:22,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=537013.0, ans=0.2
2024-09-17 02:31:29,646 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.64 vs. limit=15.0
2024-09-17 02:32:26,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=22.5
2024-09-17 02:32:38,255 INFO [train.py:1198] (0/2) Epoch 30, batch 4250, loss[loss=0.2035, ctc_loss=0.1317, cr_loss=0.3591, over 20969.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1527, cr_loss=0.3768, over 4093715.14 frames. ], batch size: 51, lr: 2.75e-03, grad_scale: 32.0
2024-09-17 02:32:53,545 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.166e+02 2.280e+02 2.499e+02 3.236e+02, threshold=4.560e+02, percent-clipped=0.0
2024-09-17 02:32:57,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=537183.0, ans=0.125
2024-09-17 02:33:03,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=537183.0, ans=0.125
2024-09-17 02:33:25,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=537239.6666666666, ans=0.95
2024-09-17 02:33:53,806 INFO [train.py:1198] (0/2) Epoch 30, batch 4300, loss[loss=0.2771, ctc_loss=0.1958, cr_loss=0.4069, over 14366.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1523, cr_loss=0.3761, over 4088656.57 frames. ], batch size: 149, lr: 2.75e-03, grad_scale: 32.0
2024-09-17 02:34:35,617 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=22.5
2024-09-17 02:34:58,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=537409.6666666666, ans=0.125
2024-09-17 02:35:13,333 INFO [train.py:1198] (0/2) Epoch 30, batch 4350, loss[loss=0.233, ctc_loss=0.1553, cr_loss=0.3887, over 20929.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1527, cr_loss=0.377, over 4083899.75 frames. ], batch size: 60, lr: 2.75e-03, grad_scale: 32.0
2024-09-17 02:35:18,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.62 vs. limit=10.0
2024-09-17 02:35:21,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=537438.0, ans=0.2
2024-09-17 02:35:22,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=537438.0, ans=0.125
2024-09-17 02:35:26,306 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.99 vs. limit=10.0
2024-09-17 02:35:28,307 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.215e+02 2.336e+02 2.568e+02 4.986e+02, threshold=4.672e+02, percent-clipped=1.0
2024-09-17 02:35:33,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=537466.3333333334, ans=0.05
2024-09-17 02:35:50,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=537494.6666666666, ans=0.125
2024-09-17 02:36:02,715 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.46 vs. limit=15.0
2024-09-17 02:36:08,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=537523.0, ans=0.125
2024-09-17 02:36:29,265 INFO [train.py:1198] (0/2) Epoch 30, batch 4400, loss[loss=0.2696, ctc_loss=0.1831, cr_loss=0.4324, over 20059.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1517, cr_loss=0.3755, over 4101526.31 frames. ], batch size: 80, lr: 2.75e-03, grad_scale: 32.0
2024-09-17 02:36:30,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=537579.6666666666, ans=15.0
2024-09-17 02:36:49,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=537608.0, ans=0.125
2024-09-17 02:36:58,592 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:37:17,367 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=15.0
2024-09-17 02:37:48,852 INFO [train.py:1198] (0/2) Epoch 30, batch 4450, loss[loss=0.1956, ctc_loss=0.1274, cr_loss=0.3411, over 20958.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1517, cr_loss=0.3757, over 4104120.23 frames. ], batch size: 49, lr: 2.75e-03, grad_scale: 32.0
2024-09-17 02:38:04,105 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.157e+02 2.310e+02 2.428e+02 3.096e+02, threshold=4.619e+02, percent-clipped=0.0
2024-09-17 02:39:05,425 INFO [train.py:1198] (0/2) Epoch 30, batch 4500, loss[loss=0.2158, ctc_loss=0.1451, cr_loss=0.3537, over 20783.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1504, cr_loss=0.3735, over 4101206.79 frames. ], batch size: 53, lr: 2.75e-03, grad_scale: 32.0
2024-09-17 02:39:25,674 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:39:52,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=537948.0, ans=0.125
2024-09-17 02:39:58,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=537948.0, ans=0.125
2024-09-17 02:40:20,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=538004.6666666666, ans=0.125
2024-09-17 02:40:21,811 INFO [train.py:1198] (0/2) Epoch 30, batch 4550, loss[loss=0.197, ctc_loss=0.1275, cr_loss=0.3477, over 21057.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1499, cr_loss=0.3732, over 4099213.60 frames. ], batch size: 53, lr: 2.75e-03, grad_scale: 16.0
2024-09-17 02:40:31,661 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.89 vs. limit=10.0
2024-09-17 02:40:32,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=538004.6666666666, ans=0.125
2024-09-17 02:40:38,257 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.115e+02 2.239e+02 2.437e+02 3.730e+02, threshold=4.479e+02, percent-clipped=0.0
2024-09-17 02:40:40,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.88 vs. 
limit=12.0 2024-09-17 02:41:26,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=538118.0, ans=0.05 2024-09-17 02:41:30,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=538118.0, ans=0.2 2024-09-17 02:41:40,238 INFO [train.py:1198] (0/2) Epoch 30, batch 4600, loss[loss=0.1906, ctc_loss=0.1264, cr_loss=0.3211, over 20977.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.151, cr_loss=0.3745, over 4090929.23 frames. ], batch size: 52, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:42:12,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=538203.0, ans=0.1 2024-09-17 02:42:27,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=538231.3333333334, ans=0.125 2024-09-17 02:42:45,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=538259.6666666666, ans=0.125 2024-09-17 02:42:53,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=538259.6666666666, ans=0.025 2024-09-17 02:42:53,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0 2024-09-17 02:42:59,178 INFO [train.py:1198] (0/2) Epoch 30, batch 4650, loss[loss=0.2244, ctc_loss=0.152, cr_loss=0.3621, over 21013.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1504, cr_loss=0.3733, over 4082307.91 frames. 
], batch size: 61, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:43:15,556 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.100e+02 2.325e+02 2.528e+02 3.398e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-17 02:43:23,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=538316.3333333334, ans=0.125 2024-09-17 02:43:36,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=538344.6666666666, ans=0.2 2024-09-17 02:43:47,956 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-09-17 02:43:56,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=538373.0, ans=0.0 2024-09-17 02:44:14,742 INFO [train.py:1198] (0/2) Epoch 30, batch 4700, loss[loss=0.2492, ctc_loss=0.176, cr_loss=0.3659, over 14638.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.15, cr_loss=0.3726, over 4086915.07 frames. 
], batch size: 149, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:44:33,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=538458.0, ans=0.125 2024-09-17 02:44:45,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=538486.3333333334, ans=0.125 2024-09-17 02:44:47,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=538486.3333333334, ans=0.04949747468305833 2024-09-17 02:45:08,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=538514.6666666666, ans=0.2 2024-09-17 02:45:17,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=538543.0, ans=0.04949747468305833 2024-09-17 02:45:30,720 INFO [train.py:1198] (0/2) Epoch 30, batch 4750, loss[loss=0.219, ctc_loss=0.145, cr_loss=0.3702, over 20790.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1496, cr_loss=0.3726, over 4097526.23 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:45:36,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=538571.3333333334, ans=0.125 2024-09-17 02:45:47,236 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.139e+02 2.224e+02 2.394e+02 3.412e+02, threshold=4.447e+02, percent-clipped=0.0 2024-09-17 02:45:48,247 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. 
limit=6.0 2024-09-17 02:45:56,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=538599.6666666666, ans=0.05 2024-09-17 02:46:00,128 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0 2024-09-17 02:46:11,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=538628.0, ans=0.125 2024-09-17 02:46:11,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=538628.0, ans=0.125 2024-09-17 02:46:33,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=538684.6666666666, ans=0.125 2024-09-17 02:46:49,147 INFO [train.py:1198] (0/2) Epoch 30, batch 4800, loss[loss=0.2255, ctc_loss=0.1503, cr_loss=0.3762, over 20938.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1498, cr_loss=0.3729, over 4086671.90 frames. ], batch size: 60, lr: 2.75e-03, grad_scale: 32.0 2024-09-17 02:46:52,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0 2024-09-17 02:46:53,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=538713.0, ans=0.035 2024-09-17 02:46:56,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=538713.0, ans=0.025 2024-09-17 02:47:13,293 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.69 vs. 
limit=5.0 2024-09-17 02:47:45,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=538798.0, ans=0.1 2024-09-17 02:48:07,433 INFO [train.py:1198] (0/2) Epoch 30, batch 4850, loss[loss=0.2459, ctc_loss=0.1646, cr_loss=0.4063, over 20963.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1501, cr_loss=0.3736, over 4090542.76 frames. ], batch size: 58, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:48:09,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=22.5 2024-09-17 02:48:10,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=538854.6666666666, ans=0.125 2024-09-17 02:48:25,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.164e+02 2.334e+02 2.583e+02 4.515e+02, threshold=4.668e+02, percent-clipped=2.0 2024-09-17 02:48:27,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=538883.0, ans=0.0 2024-09-17 02:48:48,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=538911.3333333334, ans=0.0 2024-09-17 02:49:23,182 INFO [train.py:1198] (0/2) Epoch 30, batch 4900, loss[loss=0.2408, ctc_loss=0.1609, cr_loss=0.3997, over 21056.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1501, cr_loss=0.3739, over 4089129.06 frames. 
], batch size: 56, lr: 2.75e-03, grad_scale: 8.0 2024-09-17 02:49:33,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=538996.3333333334, ans=0.04949747468305833 2024-09-17 02:50:02,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=539053.0, ans=0.1 2024-09-17 02:50:24,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=539109.6666666666, ans=0.025 2024-09-17 02:50:32,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=539109.6666666666, ans=0.125 2024-09-17 02:50:37,799 INFO [train.py:1198] (0/2) Epoch 30, batch 4950, loss[loss=0.2276, ctc_loss=0.1524, cr_loss=0.376, over 20837.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.151, cr_loss=0.3755, over 4100458.29 frames. ], batch size: 59, lr: 2.75e-03, grad_scale: 8.0 2024-09-17 02:50:47,249 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0 2024-09-17 02:50:57,061 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.145e+02 2.240e+02 2.413e+02 3.481e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-17 02:51:52,509 INFO [train.py:1198] (0/2) Epoch 30, batch 5000, loss[loss=0.2389, ctc_loss=0.162, cr_loss=0.3846, over 20940.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1504, cr_loss=0.3749, over 4114326.37 frames. 
], batch size: 60, lr: 2.75e-03, grad_scale: 8.0 2024-09-17 02:51:58,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=539279.6666666666, ans=0.0 2024-09-17 02:52:10,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=539308.0, ans=0.025 2024-09-17 02:52:18,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=539308.0, ans=0.1 2024-09-17 02:53:07,312 INFO [train.py:1198] (0/2) Epoch 30, batch 5050, loss[loss=0.1981, ctc_loss=0.1294, cr_loss=0.3432, over 20802.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1499, cr_loss=0.3739, over 4123350.99 frames. ], batch size: 53, lr: 2.75e-03, grad_scale: 8.0 2024-09-17 02:53:18,628 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2024-09-17 02:53:26,668 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.163e+02 2.269e+02 2.419e+02 4.295e+02, threshold=4.538e+02, percent-clipped=0.0 2024-09-17 02:53:41,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.33 vs. limit=15.0 2024-09-17 02:53:41,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=539478.0, ans=0.0 2024-09-17 02:53:43,742 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. 
limit=6.0 2024-09-17 02:53:49,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=539478.0, ans=15.0 2024-09-17 02:54:00,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=539506.3333333334, ans=0.0 2024-09-17 02:54:23,719 INFO [train.py:1198] (0/2) Epoch 30, batch 5100, loss[loss=0.2402, ctc_loss=0.1607, cr_loss=0.3976, over 20901.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1507, cr_loss=0.3747, over 4094481.98 frames. ], batch size: 54, lr: 2.75e-03, grad_scale: 8.0 2024-09-17 02:54:34,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=539563.0, ans=0.2 2024-09-17 02:54:34,424 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:54:49,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=539591.3333333334, ans=0.0 2024-09-17 02:55:37,749 INFO [train.py:1198] (0/2) Epoch 30, batch 5150, loss[loss=0.178, ctc_loss=0.1148, cr_loss=0.3157, over 19872.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1509, cr_loss=0.3744, over 4079644.15 frames. ], batch size: 44, lr: 2.75e-03, grad_scale: 8.0 2024-09-17 02:55:39,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=539704.6666666666, ans=0.125 2024-09-17 02:55:39,835 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-09-17 02:55:43,065 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.46 vs. 
limit=15.0 2024-09-17 02:55:57,072 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.156e+02 2.270e+02 2.462e+02 5.193e+02, threshold=4.539e+02, percent-clipped=1.0 2024-09-17 02:56:04,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=539733.0, ans=0.125 2024-09-17 02:56:10,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=539761.3333333334, ans=0.125 2024-09-17 02:56:51,829 INFO [train.py:1198] (0/2) Epoch 30, batch 5200, loss[loss=0.2428, ctc_loss=0.1598, cr_loss=0.4148, over 20829.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1516, cr_loss=0.3757, over 4081391.10 frames. ], batch size: 65, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:57:06,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=539846.3333333334, ans=0.125 2024-09-17 02:57:08,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=539874.6666666666, ans=0.125 2024-09-17 02:57:26,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=539903.0, ans=0.125 2024-09-17 02:57:39,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=539931.3333333334, ans=0.025 2024-09-17 02:58:06,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=539959.6666666666, ans=0.125 2024-09-17 02:58:09,115 INFO [train.py:1198] (0/2) Epoch 30, batch 5250, loss[loss=0.2291, ctc_loss=0.1543, cr_loss=0.3739, over 20974.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1519, cr_loss=0.3756, over 4085936.91 frames. 
], batch size: 58, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:58:28,721 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.145e+02 2.254e+02 2.424e+02 4.153e+02, threshold=4.507e+02, percent-clipped=0.0 2024-09-17 02:58:33,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=540016.3333333334, ans=0.0 2024-09-17 02:59:09,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=540101.3333333334, ans=0.125 2024-09-17 02:59:23,865 INFO [train.py:1198] (0/2) Epoch 30, batch 5300, loss[loss=0.2054, ctc_loss=0.1334, cr_loss=0.3601, over 20954.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1507, cr_loss=0.3741, over 4092315.22 frames. ], batch size: 49, lr: 2.75e-03, grad_scale: 16.0 2024-09-17 02:59:39,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=540158.0, ans=0.0 2024-09-17 02:59:58,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=540186.3333333334, ans=0.1 2024-09-17 03:00:29,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.81 vs. limit=5.0 2024-09-17 03:00:36,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=22.5 2024-09-17 03:00:38,824 INFO [train.py:1198] (0/2) Epoch 30, batch 5350, loss[loss=0.2326, ctc_loss=0.1528, cr_loss=0.399, over 21059.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1507, cr_loss=0.3737, over 4096926.46 frames. ], batch size: 56, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:00:49,822 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. 
limit=15.0 2024-09-17 03:00:58,054 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.139e+02 2.237e+02 2.366e+02 2.975e+02, threshold=4.475e+02, percent-clipped=0.0 2024-09-17 03:01:01,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=540299.6666666666, ans=0.125 2024-09-17 03:01:17,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=540328.0, ans=0.125 2024-09-17 03:01:19,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=540328.0, ans=0.125 2024-09-17 03:01:41,857 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:01:53,453 INFO [train.py:1198] (0/2) Epoch 30, batch 5400, loss[loss=0.1892, ctc_loss=0.1226, cr_loss=0.3328, over 20956.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1503, cr_loss=0.3736, over 4104754.96 frames. ], batch size: 49, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:02:14,629 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:02:26,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=540469.6666666666, ans=0.125 2024-09-17 03:02:37,405 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-09-17 03:02:42,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. 
limit=10.0 2024-09-17 03:03:04,887 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.72 vs. limit=15.0 2024-09-17 03:03:08,403 INFO [train.py:1198] (0/2) Epoch 30, batch 5450, loss[loss=0.2297, ctc_loss=0.1541, cr_loss=0.3779, over 20873.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1511, cr_loss=0.3749, over 4092416.25 frames. ], batch size: 57, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:03:10,486 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2024-09-17 03:03:27,844 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.169e+02 2.265e+02 2.380e+02 3.417e+02, threshold=4.530e+02, percent-clipped=0.0 2024-09-17 03:03:54,890 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0 2024-09-17 03:04:21,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=540668.0, ans=0.07 2024-09-17 03:04:25,189 INFO [train.py:1198] (0/2) Epoch 30, batch 5500, loss[loss=0.2438, ctc_loss=0.1624, cr_loss=0.4071, over 20775.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1509, cr_loss=0.3749, over 4100250.39 frames. ], batch size: 56, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:04:34,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=540696.3333333334, ans=0.0 2024-09-17 03:05:39,873 INFO [train.py:1198] (0/2) Epoch 30, batch 5550, loss[loss=0.2753, ctc_loss=0.1897, cr_loss=0.4278, over 18524.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1513, cr_loss=0.3753, over 4086642.05 frames. 
], batch size: 108, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:05:49,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=540838.0, ans=0.95 2024-09-17 03:05:59,440 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.136e+02 2.246e+02 2.482e+02 5.208e+02, threshold=4.491e+02, percent-clipped=1.0 2024-09-17 03:06:20,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=540894.6666666666, ans=0.1 2024-09-17 03:06:48,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=540951.3333333334, ans=0.5 2024-09-17 03:06:51,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=540951.3333333334, ans=0.0 2024-09-17 03:06:57,184 INFO [train.py:1198] (0/2) Epoch 30, batch 5600, loss[loss=0.2078, ctc_loss=0.1377, cr_loss=0.3507, over 20887.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1509, cr_loss=0.3745, over 4084727.88 frames. ], batch size: 54, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:07:08,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=540979.6666666666, ans=0.0 2024-09-17 03:07:45,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=541064.6666666666, ans=0.0 2024-09-17 03:08:11,544 INFO [train.py:1198] (0/2) Epoch 30, batch 5650, loss[loss=0.2072, ctc_loss=0.1374, cr_loss=0.3492, over 20896.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1516, cr_loss=0.375, over 4075871.03 frames. 
], batch size: 54, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:08:30,731 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.219e+02 2.344e+02 2.549e+02 4.118e+02, threshold=4.688e+02, percent-clipped=0.0 2024-09-17 03:08:54,295 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0 2024-09-17 03:09:01,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=541206.3333333334, ans=0.125 2024-09-17 03:09:02,846 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:09:10,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=541234.6666666666, ans=0.0 2024-09-17 03:09:26,223 INFO [train.py:1198] (0/2) Epoch 30, batch 5700, loss[loss=0.1792, ctc_loss=0.1157, cr_loss=0.3175, over 20993.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1509, cr_loss=0.3735, over 4089511.96 frames. ], batch size: 49, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:09:53,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=541291.3333333334, ans=0.5 2024-09-17 03:10:41,348 INFO [train.py:1198] (0/2) Epoch 30, batch 5750, loss[loss=0.1954, ctc_loss=0.1255, cr_loss=0.3499, over 20988.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1514, cr_loss=0.375, over 4095955.09 frames. 
], batch size: 52, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:10:50,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=541404.6666666666, ans=0.0 2024-09-17 03:10:55,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=541433.0, ans=0.025 2024-09-17 03:11:02,329 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.112e+02 2.229e+02 2.445e+02 4.655e+02, threshold=4.458e+02, percent-clipped=0.0 2024-09-17 03:11:04,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541433.0, ans=0.1 2024-09-17 03:11:07,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541433.0, ans=0.1 2024-09-17 03:11:22,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=541461.3333333334, ans=0.125 2024-09-17 03:11:31,566 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2024-09-17 03:11:35,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541489.6666666666, ans=0.1 2024-09-17 03:11:46,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=541518.0, ans=0.09899494936611666 2024-09-17 03:11:56,086 INFO [train.py:1198] (0/2) Epoch 30, batch 5800, loss[loss=0.234, ctc_loss=0.1567, cr_loss=0.3863, over 21001.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1514, cr_loss=0.3754, over 4100518.62 frames. 
], batch size: 63, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:12:48,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=541631.3333333334, ans=0.0 2024-09-17 03:12:58,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=541659.6666666666, ans=0.125 2024-09-17 03:13:00,796 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0 2024-09-17 03:13:06,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=541659.6666666666, ans=0.2 2024-09-17 03:13:13,492 INFO [train.py:1198] (0/2) Epoch 30, batch 5850, loss[loss=0.2222, ctc_loss=0.1479, cr_loss=0.3715, over 21070.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1511, cr_loss=0.3754, over 4098170.56 frames. ], batch size: 53, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:13:34,356 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.173e+02 2.328e+02 2.470e+02 4.926e+02, threshold=4.656e+02, percent-clipped=1.0 2024-09-17 03:13:34,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=541716.3333333334, ans=0.0 2024-09-17 03:13:40,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=541716.3333333334, ans=0.125 2024-09-17 03:13:45,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=541744.6666666666, ans=0.2 2024-09-17 03:14:14,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=541801.3333333334, ans=0.0 2024-09-17 03:14:28,777 INFO [train.py:1198] (0/2) Epoch 30, batch 5900, loss[loss=0.2237, ctc_loss=0.1499, cr_loss=0.3688, 
over 20681.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1514, cr_loss=0.3752, over 4087785.23 frames. ], batch size: 68, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:14:46,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0 2024-09-17 03:14:57,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=541886.3333333334, ans=0.2 2024-09-17 03:15:18,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541914.6666666666, ans=0.1 2024-09-17 03:15:45,427 INFO [train.py:1198] (0/2) Epoch 30, batch 5950, loss[loss=0.2085, ctc_loss=0.1386, cr_loss=0.3497, over 21069.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1508, cr_loss=0.3744, over 4091599.13 frames. ], batch size: 53, lr: 2.74e-03, grad_scale: 16.0 2024-09-17 03:15:47,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=541971.3333333334, ans=0.125 2024-09-17 03:16:06,446 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.125e+02 2.235e+02 2.360e+02 2.949e+02, threshold=4.470e+02, percent-clipped=0.0 2024-09-17 03:16:24,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=542028.0, ans=0.0 2024-09-17 03:16:29,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2024-09-17 03:16:45,989 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. 
limit=6.0 2024-09-17 03:16:59,609 INFO [train.py:1198] (0/2) Epoch 30, batch 6000, loss[loss=0.1886, ctc_loss=0.1224, cr_loss=0.3311, over 21060.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1502, cr_loss=0.3738, over 4088109.18 frames. ], batch size: 56, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:16:59,610 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 03:17:20,483 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.5134, 5.0649, 5.3259, 4.7227], device='cuda:0') 2024-09-17 03:17:22,464 INFO [train.py:1230] (0/2) Epoch 30, validation: loss=0.04091, ctc_loss=0.04091, cr_loss=1.274e-14, over 944034.00 frames. 2024-09-17 03:17:22,465 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 03:17:47,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2024-09-17 03:18:18,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2024-09-17 03:18:36,980 INFO [train.py:1198] (0/2) Epoch 30, batch 6050, loss[loss=0.209, ctc_loss=0.1374, cr_loss=0.3584, over 21048.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1499, cr_loss=0.3736, over 4094399.66 frames. ], batch size: 56, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:18:44,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=542254.6666666666, ans=0.05 2024-09-17 03:18:50,836 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. 
limit=15.0 2024-09-17 03:18:57,650 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.198e+02 2.318e+02 2.450e+02 3.539e+02, threshold=4.636e+02, percent-clipped=0.0 2024-09-17 03:19:11,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=542311.3333333334, ans=0.125 2024-09-17 03:19:52,771 INFO [train.py:1198] (0/2) Epoch 30, batch 6100, loss[loss=0.225, ctc_loss=0.1493, cr_loss=0.3783, over 20895.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1506, cr_loss=0.3744, over 4096885.13 frames. ], batch size: 54, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:20:00,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=542396.3333333334, ans=0.125 2024-09-17 03:20:11,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=542424.6666666666, ans=0.1 2024-09-17 03:20:19,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.04 vs. limit=22.5 2024-09-17 03:21:09,161 INFO [train.py:1198] (0/2) Epoch 30, batch 6150, loss[loss=0.2795, ctc_loss=0.192, cr_loss=0.4376, over 18364.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.151, cr_loss=0.3751, over 4103589.94 frames. ], batch size: 109, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:21:29,723 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.183e+02 2.320e+02 2.515e+02 4.172e+02, threshold=4.640e+02, percent-clipped=0.0 2024-09-17 03:22:18,842 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:22:22,942 INFO [train.py:1198] (0/2) Epoch 30, batch 6200, loss[loss=0.2692, ctc_loss=0.1807, cr_loss=0.4427, over 20013.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1518, cr_loss=0.3758, over 4078548.62 frames. 
], batch size: 80, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:23:37,133 INFO [train.py:1198] (0/2) Epoch 30, batch 6250, loss[loss=0.2203, ctc_loss=0.1463, cr_loss=0.3699, over 20995.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1522, cr_loss=0.3757, over 4044273.67 frames. ], batch size: 55, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:23:41,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=542821.3333333334, ans=0.125 2024-09-17 03:23:57,753 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.236e+02 2.421e+02 2.583e+02 4.278e+02, threshold=4.841e+02, percent-clipped=0.0 2024-09-17 03:24:02,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=542849.6666666666, ans=0.0 2024-09-17 03:24:02,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0 2024-09-17 03:24:11,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=542878.0, ans=0.1 2024-09-17 03:24:26,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=542906.3333333334, ans=0.05 2024-09-17 03:24:47,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=542934.6666666666, ans=0.0 2024-09-17 03:24:50,046 INFO [train.py:1198] (0/2) Epoch 30, batch 6300, loss[loss=0.2868, ctc_loss=0.2055, cr_loss=0.4064, over 14002.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1552, cr_loss=0.3798, over 4002614.13 frames. 
], batch size: 150, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:25:37,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=543048.0, ans=0.125 2024-09-17 03:25:51,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=543076.3333333334, ans=12.0 2024-09-17 03:26:00,272 INFO [train.py:1198] (0/2) Epoch 30, batch 6350, loss[loss=0.2736, ctc_loss=0.1933, cr_loss=0.4014, over 13903.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1607, cr_loss=0.3841, over 3835335.89 frames. ], batch size: 149, lr: 2.74e-03, grad_scale: 32.0 2024-09-17 03:26:20,555 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.478e+02 2.628e+02 2.838e+02 3.418e+02, threshold=5.255e+02, percent-clipped=0.0 2024-09-17 03:26:33,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=543161.3333333334, ans=0.2 2024-09-17 03:26:39,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=15.0 2024-09-17 03:26:54,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=543189.6666666666, ans=0.0 2024-09-17 03:26:58,608 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-30.pt 2024-09-17 03:27:50,477 INFO [train.py:1198] (0/2) Epoch 31, batch 0, loss[loss=0.2317, ctc_loss=0.1536, cr_loss=0.3902, over 20879.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1536, cr_loss=0.3902, over 20879.00 frames. 
], batch size: 57, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:27:50,478 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 03:28:09,051 INFO [train.py:1230] (0/2) Epoch 31, validation: loss=0.04055, ctc_loss=0.04055, cr_loss=1.258e-14, over 944034.00 frames. 2024-09-17 03:28:09,052 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 03:28:35,561 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.79 vs. limit=10.0 2024-09-17 03:28:54,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=543305.8333333334, ans=0.125 2024-09-17 03:29:00,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=543305.8333333334, ans=0.95 2024-09-17 03:29:02,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=543305.8333333334, ans=0.125 2024-09-17 03:29:07,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=543334.1666666666, ans=0.125 2024-09-17 03:29:14,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=543334.1666666666, ans=0.125 2024-09-17 03:29:24,172 INFO [train.py:1198] (0/2) Epoch 31, batch 50, loss[loss=0.2491, ctc_loss=0.1672, cr_loss=0.4096, over 20845.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1528, cr_loss=0.3763, over 913930.21 frames. 
], batch size: 65, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:30:00,774 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.128e+02 2.271e+02 2.598e+02 3.725e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-17 03:30:05,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=543419.1666666666, ans=0.0 2024-09-17 03:30:07,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=543419.1666666666, ans=0.1 2024-09-17 03:30:07,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=543419.1666666666, ans=0.125 2024-09-17 03:30:14,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=543447.5, ans=0.2 2024-09-17 03:30:41,933 INFO [train.py:1198] (0/2) Epoch 31, batch 100, loss[loss=0.2566, ctc_loss=0.1765, cr_loss=0.4007, over 14211.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1503, cr_loss=0.3729, over 1621817.16 frames. ], batch size: 149, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:31:03,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=543532.5, ans=0.125 2024-09-17 03:31:55,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=543617.5, ans=0.0 2024-09-17 03:31:59,818 INFO [train.py:1198] (0/2) Epoch 31, batch 150, loss[loss=0.2288, ctc_loss=0.1553, cr_loss=0.3677, over 20955.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1527, cr_loss=0.3772, over 2168279.98 frames. 
], batch size: 60, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:32:15,515 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:32:34,332 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.119e+02 2.302e+02 2.446e+02 8.776e+02, threshold=4.604e+02, percent-clipped=1.0 2024-09-17 03:32:51,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=543730.8333333334, ans=0.0 2024-09-17 03:33:10,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=543759.1666666666, ans=0.125 2024-09-17 03:33:15,077 INFO [train.py:1198] (0/2) Epoch 31, batch 200, loss[loss=0.2516, ctc_loss=0.171, cr_loss=0.4027, over 21009.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.153, cr_loss=0.3774, over 2591092.82 frames. ], batch size: 61, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:33:25,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=543787.5, ans=0.1 2024-09-17 03:33:37,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=543815.8333333334, ans=0.0 2024-09-17 03:34:16,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2024-09-17 03:34:30,219 INFO [train.py:1198] (0/2) Epoch 31, batch 250, loss[loss=0.2104, ctc_loss=0.1403, cr_loss=0.3504, over 20944.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1527, cr_loss=0.3777, over 2919765.26 frames. 
], batch size: 49, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:34:30,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=543929.1666666666, ans=0.125 2024-09-17 03:34:41,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=543929.1666666666, ans=0.125 2024-09-17 03:35:04,940 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.139e+02 2.233e+02 2.438e+02 5.737e+02, threshold=4.465e+02, percent-clipped=1.0 2024-09-17 03:35:06,514 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-192000.pt 2024-09-17 03:35:25,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=544014.1666666666, ans=0.1 2024-09-17 03:35:40,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=544042.5, ans=0.2 2024-09-17 03:35:49,354 INFO [train.py:1198] (0/2) Epoch 31, batch 300, loss[loss=0.2253, ctc_loss=0.1497, cr_loss=0.3779, over 20780.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1519, cr_loss=0.3769, over 3183546.19 frames. 
], batch size: 53, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:36:23,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=544127.5, ans=0.0 2024-09-17 03:36:30,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=544127.5, ans=0.0 2024-09-17 03:36:35,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=544155.8333333334, ans=0.2 2024-09-17 03:36:56,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=544184.1666666666, ans=0.125 2024-09-17 03:37:08,430 INFO [train.py:1198] (0/2) Epoch 31, batch 350, loss[loss=0.2192, ctc_loss=0.1487, cr_loss=0.353, over 19889.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1509, cr_loss=0.3761, over 3390748.19 frames. ], batch size: 44, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:37:19,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=544212.5, ans=0.2 2024-09-17 03:37:42,906 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.146e+02 2.269e+02 2.420e+02 4.863e+02, threshold=4.538e+02, percent-clipped=1.0 2024-09-17 03:37:50,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=544269.1666666666, ans=0.0 2024-09-17 03:38:22,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=22.5 2024-09-17 03:38:23,592 INFO [train.py:1198] (0/2) Epoch 31, batch 400, loss[loss=0.2444, ctc_loss=0.1651, cr_loss=0.3965, over 20866.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1512, cr_loss=0.376, over 3552125.85 frames. 
], batch size: 57, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:38:51,462 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:39:06,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=544410.8333333334, ans=0.125 2024-09-17 03:39:12,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=544439.1666666666, ans=0.125 2024-09-17 03:39:27,882 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2024-09-17 03:39:36,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=544467.5, ans=0.125 2024-09-17 03:39:39,515 INFO [train.py:1198] (0/2) Epoch 31, batch 450, loss[loss=0.187, ctc_loss=0.1234, cr_loss=0.3178, over 19433.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1507, cr_loss=0.3745, over 3671666.83 frames. 
], batch size: 43, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:39:39,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=544495.8333333334, ans=0.0 2024-09-17 03:40:15,275 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 2.124e+02 2.223e+02 2.373e+02 3.148e+02, threshold=4.446e+02, percent-clipped=0.0 2024-09-17 03:40:30,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=544580.8333333334, ans=0.0 2024-09-17 03:40:36,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=544580.8333333334, ans=0.125 2024-09-17 03:40:42,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=544609.1666666666, ans=0.125 2024-09-17 03:40:56,150 INFO [train.py:1198] (0/2) Epoch 31, batch 500, loss[loss=0.2126, ctc_loss=0.1424, cr_loss=0.351, over 21025.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1506, cr_loss=0.3749, over 3755081.66 frames. ], batch size: 63, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:41:17,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=544665.8333333334, ans=0.125 2024-09-17 03:41:34,391 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:41:46,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=544722.5, ans=0.0 2024-09-17 03:42:14,243 INFO [train.py:1198] (0/2) Epoch 31, batch 550, loss[loss=0.2074, ctc_loss=0.1367, cr_loss=0.3538, over 20786.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1501, cr_loss=0.3744, over 3826346.59 frames. 
], batch size: 53, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:42:23,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=544779.1666666666, ans=0.0 2024-09-17 03:42:40,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=544807.5, ans=0.0 2024-09-17 03:42:52,267 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.124e+02 2.240e+02 2.429e+02 3.419e+02, threshold=4.481e+02, percent-clipped=0.0 2024-09-17 03:43:01,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=544864.1666666666, ans=0.09899494936611666 2024-09-17 03:43:12,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=544864.1666666666, ans=0.125 2024-09-17 03:43:19,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=544892.5, ans=0.125 2024-09-17 03:43:21,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=544892.5, ans=0.2 2024-09-17 03:43:21,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=544892.5, ans=0.125 2024-09-17 03:43:33,216 INFO [train.py:1198] (0/2) Epoch 31, batch 600, loss[loss=0.2433, ctc_loss=0.1635, cr_loss=0.3992, over 20955.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1497, cr_loss=0.3733, over 3876424.21 frames. 
], batch size: 64, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:44:03,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=544977.5, ans=0.2 2024-09-17 03:44:06,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=544977.5, ans=0.0 2024-09-17 03:44:15,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=544977.5, ans=0.125 2024-09-17 03:44:25,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=545005.8333333334, ans=0.1 2024-09-17 03:44:48,522 INFO [train.py:1198] (0/2) Epoch 31, batch 650, loss[loss=0.2479, ctc_loss=0.1681, cr_loss=0.3988, over 20357.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1509, cr_loss=0.3754, over 3921518.81 frames. ], batch size: 74, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:45:03,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=545090.8333333334, ans=0.0 2024-09-17 03:45:23,000 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.754e+02 2.106e+02 2.256e+02 2.442e+02 2.933e+02, threshold=4.512e+02, percent-clipped=0.0 2024-09-17 03:46:03,460 INFO [train.py:1198] (0/2) Epoch 31, batch 700, loss[loss=0.2295, ctc_loss=0.1534, cr_loss=0.3801, over 21020.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1513, cr_loss=0.3754, over 3951504.03 frames. 
], batch size: 63, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:46:09,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=545204.1666666666, ans=0.125 2024-09-17 03:46:22,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=545232.5, ans=0.1 2024-09-17 03:46:45,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=545260.8333333334, ans=0.125 2024-09-17 03:46:53,944 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.86 vs. limit=22.5 2024-09-17 03:47:22,023 INFO [train.py:1198] (0/2) Epoch 31, batch 750, loss[loss=0.217, ctc_loss=0.1447, cr_loss=0.3615, over 20675.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1514, cr_loss=0.3762, over 3993316.53 frames. ], batch size: 71, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:47:25,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. 
limit=15.0 2024-09-17 03:47:26,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=545345.8333333334, ans=0.0 2024-09-17 03:47:52,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=545402.5, ans=0.125 2024-09-17 03:47:56,736 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.156e+02 2.302e+02 2.451e+02 2.970e+02, threshold=4.605e+02, percent-clipped=0.0 2024-09-17 03:48:00,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=545402.5, ans=0.0 2024-09-17 03:48:00,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=545402.5, ans=0.125 2024-09-17 03:48:04,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=545402.5, ans=0.09899494936611666 2024-09-17 03:48:08,115 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-17 03:48:39,982 INFO [train.py:1198] (0/2) Epoch 31, batch 800, loss[loss=0.2099, ctc_loss=0.1394, cr_loss=0.3522, over 20974.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1511, cr_loss=0.3755, over 4015733.92 frames. ], batch size: 55, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:48:55,896 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. 
limit=15.0 2024-09-17 03:48:58,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=545515.8333333334, ans=0.0 2024-09-17 03:49:55,529 INFO [train.py:1198] (0/2) Epoch 31, batch 850, loss[loss=0.2531, ctc_loss=0.1704, cr_loss=0.4135, over 20664.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1509, cr_loss=0.3747, over 4023691.58 frames. ], batch size: 68, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:49:55,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=545629.1666666666, ans=0.125 2024-09-17 03:50:30,058 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.121e+02 2.292e+02 2.437e+02 3.125e+02, threshold=4.583e+02, percent-clipped=0.0 2024-09-17 03:51:10,863 INFO [train.py:1198] (0/2) Epoch 31, batch 900, loss[loss=0.2098, ctc_loss=0.1418, cr_loss=0.3401, over 20783.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.151, cr_loss=0.3744, over 4047004.36 frames. ], batch size: 56, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:51:41,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=545827.5, ans=0.025 2024-09-17 03:52:14,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=545884.1666666666, ans=0.125 2024-09-17 03:52:27,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=22.5 2024-09-17 03:52:29,744 INFO [train.py:1198] (0/2) Epoch 31, batch 950, loss[loss=0.2121, ctc_loss=0.1412, cr_loss=0.3547, over 20856.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1496, cr_loss=0.3721, over 4065730.94 frames. 
], batch size: 57, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:52:33,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=545912.5, ans=0.125 2024-09-17 03:52:40,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=545912.5, ans=0.2 2024-09-17 03:53:04,063 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.183e+02 2.329e+02 2.428e+02 3.182e+02, threshold=4.657e+02, percent-clipped=0.0 2024-09-17 03:53:16,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=545997.5, ans=0.0 2024-09-17 03:53:21,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=545997.5, ans=0.015 2024-09-17 03:53:44,829 INFO [train.py:1198] (0/2) Epoch 31, batch 1000, loss[loss=0.1892, ctc_loss=0.1236, cr_loss=0.3282, over 21068.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1495, cr_loss=0.3726, over 4080262.80 frames. 
], batch size: 53, lr: 2.69e-03, grad_scale: 32.0 2024-09-17 03:53:46,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=546054.1666666666, ans=0.025 2024-09-17 03:53:48,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=546054.1666666666, ans=0.1 2024-09-17 03:53:58,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=546082.5, ans=0.125 2024-09-17 03:54:27,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=546110.8333333334, ans=0.07 2024-09-17 03:55:00,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=546167.5, ans=0.125 2024-09-17 03:55:03,070 INFO [train.py:1198] (0/2) Epoch 31, batch 1050, loss[loss=0.2138, ctc_loss=0.1416, cr_loss=0.3608, over 21041.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1495, cr_loss=0.3727, over 4086181.96 frames. 
], batch size: 56, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 03:55:03,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=546195.8333333334, ans=0.0 2024-09-17 03:55:18,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=546224.1666666666, ans=0.125 2024-09-17 03:55:30,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=546224.1666666666, ans=0.0 2024-09-17 03:55:36,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=546252.5, ans=0.125 2024-09-17 03:55:36,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=546252.5, ans=0.125 2024-09-17 03:55:38,061 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.166e+02 2.237e+02 2.389e+02 5.782e+02, threshold=4.475e+02, percent-clipped=2.0 2024-09-17 03:55:44,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=546252.5, ans=0.2 2024-09-17 03:56:17,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=546337.5, ans=0.0 2024-09-17 03:56:18,952 INFO [train.py:1198] (0/2) Epoch 31, batch 1100, loss[loss=0.1929, ctc_loss=0.129, cr_loss=0.3194, over 20778.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1486, cr_loss=0.3713, over 4095036.03 frames. 
], batch size: 53, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 03:56:32,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=546365.8333333334, ans=0.1 2024-09-17 03:57:11,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=546422.5, ans=0.0 2024-09-17 03:57:19,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=546450.8333333334, ans=0.2 2024-09-17 03:57:24,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=546450.8333333334, ans=0.0 2024-09-17 03:57:26,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=546450.8333333334, ans=10.0 2024-09-17 03:57:33,741 INFO [train.py:1198] (0/2) Epoch 31, batch 1150, loss[loss=0.2357, ctc_loss=0.1536, cr_loss=0.4108, over 20959.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1489, cr_loss=0.3719, over 4106389.58 frames. ], batch size: 55, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 03:57:43,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.45 vs. 
limit=22.5 2024-09-17 03:57:49,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=546507.5, ans=0.0 2024-09-17 03:58:11,823 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.137e+02 2.309e+02 2.471e+02 6.425e+02, threshold=4.618e+02, percent-clipped=1.0 2024-09-17 03:58:27,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=546564.1666666666, ans=0.1 2024-09-17 03:58:52,849 INFO [train.py:1198] (0/2) Epoch 31, batch 1200, loss[loss=0.2356, ctc_loss=0.1576, cr_loss=0.3901, over 21001.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.149, cr_loss=0.372, over 4109349.80 frames. ], batch size: 61, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:00:04,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=546734.1666666666, ans=0.0 2024-09-17 04:00:11,546 INFO [train.py:1198] (0/2) Epoch 31, batch 1250, loss[loss=0.2335, ctc_loss=0.1547, cr_loss=0.3941, over 21045.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1493, cr_loss=0.3727, over 4116682.98 frames. ], batch size: 63, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:00:46,527 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.151e+02 2.284e+02 2.422e+02 3.136e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-17 04:00:53,204 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=15.0 2024-09-17 04:01:12,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=546875.8333333334, ans=0.0 2024-09-17 04:01:27,009 INFO [train.py:1198] (0/2) Epoch 31, batch 1300, loss[loss=0.2004, ctc_loss=0.1312, cr_loss=0.3459, over 20987.00 frames. 
], tot_loss[loss=0.2236, ctc_loss=0.1491, cr_loss=0.3726, over 4118498.66 frames. ], batch size: 51, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:01:31,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=546904.1666666666, ans=0.125 2024-09-17 04:01:39,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=546904.1666666666, ans=0.125 2024-09-17 04:01:42,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=546932.5, ans=0.2 2024-09-17 04:02:29,218 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2024-09-17 04:02:33,580 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2024-09-17 04:02:39,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=547017.5, ans=0.125 2024-09-17 04:02:41,939 INFO [train.py:1198] (0/2) Epoch 31, batch 1350, loss[loss=0.2461, ctc_loss=0.1657, cr_loss=0.4023, over 20704.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1491, cr_loss=0.3718, over 4106411.61 frames. ], batch size: 71, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:02:45,711 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.45 vs. 
limit=15.0 2024-09-17 04:03:16,705 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.183e+02 2.335e+02 2.523e+02 3.587e+02, threshold=4.670e+02, percent-clipped=0.0 2024-09-17 04:03:17,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=547102.5, ans=0.125 2024-09-17 04:03:23,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=547102.5, ans=0.0 2024-09-17 04:03:56,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547159.1666666666, ans=0.1 2024-09-17 04:03:59,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=547187.5, ans=10.0 2024-09-17 04:04:00,905 INFO [train.py:1198] (0/2) Epoch 31, batch 1400, loss[loss=0.2104, ctc_loss=0.1389, cr_loss=0.3575, over 21068.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1498, cr_loss=0.3729, over 4095973.93 frames. ], batch size: 53, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:04:07,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=547187.5, ans=0.1 2024-09-17 04:04:28,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=547215.8333333334, ans=0.0 2024-09-17 04:05:01,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=547300.8333333334, ans=0.125 2024-09-17 04:05:15,794 INFO [train.py:1198] (0/2) Epoch 31, batch 1450, loss[loss=0.232, ctc_loss=0.1553, cr_loss=0.3837, over 21046.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1499, cr_loss=0.3725, over 4101130.65 frames. 
], batch size: 62, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:05:19,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=547329.1666666666, ans=0.2 2024-09-17 04:05:20,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=547329.1666666666, ans=0.125 2024-09-17 04:05:23,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=547329.1666666666, ans=0.125 2024-09-17 04:05:47,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=547385.8333333334, ans=0.2 2024-09-17 04:05:53,743 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.124e+02 2.269e+02 2.456e+02 3.778e+02, threshold=4.538e+02, percent-clipped=0.0 2024-09-17 04:06:34,673 INFO [train.py:1198] (0/2) Epoch 31, batch 1500, loss[loss=0.2224, ctc_loss=0.1511, cr_loss=0.3563, over 20885.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1495, cr_loss=0.3722, over 4100993.97 frames. ], batch size: 57, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:06:35,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=547470.8333333334, ans=0.125 2024-09-17 04:06:54,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=547499.1666666666, ans=0.0 2024-09-17 04:07:13,354 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0 2024-09-17 04:07:37,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.56 vs. 
limit=10.0 2024-09-17 04:07:50,381 INFO [train.py:1198] (0/2) Epoch 31, batch 1550, loss[loss=0.2229, ctc_loss=0.1467, cr_loss=0.3811, over 20725.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1495, cr_loss=0.3726, over 4103835.44 frames. ], batch size: 71, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:07:56,763 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:07:57,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.34 vs. limit=22.5 2024-09-17 04:08:10,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=547640.8333333334, ans=0.125 2024-09-17 04:08:13,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=547640.8333333334, ans=0.125 2024-09-17 04:08:24,958 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.137e+02 2.218e+02 2.342e+02 3.108e+02, threshold=4.436e+02, percent-clipped=0.0 2024-09-17 04:09:01,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=547725.8333333334, ans=0.0 2024-09-17 04:09:05,540 INFO [train.py:1198] (0/2) Epoch 31, batch 1600, loss[loss=0.2292, ctc_loss=0.1535, cr_loss=0.3785, over 20935.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1497, cr_loss=0.3719, over 4091627.18 frames. 
], batch size: 67, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:09:10,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=547754.1666666666, ans=0.07 2024-09-17 04:09:22,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=547782.5, ans=0.125 2024-09-17 04:09:53,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=547839.1666666666, ans=0.125 2024-09-17 04:10:05,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=547839.1666666666, ans=0.125 2024-09-17 04:10:18,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=547867.5, ans=22.5 2024-09-17 04:10:23,634 INFO [train.py:1198] (0/2) Epoch 31, batch 1650, loss[loss=0.1808, ctc_loss=0.1171, cr_loss=0.3184, over 20971.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1491, cr_loss=0.3717, over 4098047.18 frames. ], batch size: 49, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:10:27,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=547895.8333333334, ans=0.5 2024-09-17 04:10:58,653 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.128e+02 2.245e+02 2.351e+02 2.893e+02, threshold=4.491e+02, percent-clipped=0.0 2024-09-17 04:11:05,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2024-09-17 04:11:42,077 INFO [train.py:1198] (0/2) Epoch 31, batch 1700, loss[loss=0.248, ctc_loss=0.166, cr_loss=0.41, over 20741.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1493, cr_loss=0.3725, over 4095346.22 frames. 
], batch size: 71, lr: 2.68e-03, grad_scale: 64.0 2024-09-17 04:11:57,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=548065.8333333334, ans=0.125 2024-09-17 04:11:57,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=548065.8333333334, ans=0.0 2024-09-17 04:12:36,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=548122.5, ans=0.125 2024-09-17 04:12:37,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=548122.5, ans=0.025 2024-09-17 04:12:38,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=548122.5, ans=0.125 2024-09-17 04:12:38,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=548122.5, ans=0.0 2024-09-17 04:12:42,857 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2024-09-17 04:12:57,568 INFO [train.py:1198] (0/2) Epoch 31, batch 1750, loss[loss=0.2364, ctc_loss=0.1584, cr_loss=0.39, over 20854.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1498, cr_loss=0.3734, over 4105220.76 frames. 
], batch size: 65, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:12:59,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=548179.1666666666, ans=0.125 2024-09-17 04:13:05,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=548179.1666666666, ans=0.0 2024-09-17 04:13:08,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=548179.1666666666, ans=0.125 2024-09-17 04:13:27,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-09-17 04:13:29,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=548235.8333333334, ans=0.1 2024-09-17 04:13:33,966 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.140e+02 2.226e+02 2.346e+02 2.927e+02, threshold=4.452e+02, percent-clipped=0.0 2024-09-17 04:13:46,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=548264.1666666666, ans=0.1 2024-09-17 04:13:56,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=548292.5, ans=0.025 2024-09-17 04:14:01,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=548292.5, ans=0.125 2024-09-17 04:14:12,612 INFO [train.py:1198] (0/2) Epoch 31, batch 1800, loss[loss=0.2523, ctc_loss=0.1694, cr_loss=0.4146, over 20677.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1505, cr_loss=0.3747, over 4102595.17 frames. 
], batch size: 68, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:14:15,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=548320.8333333334, ans=0.025 2024-09-17 04:14:15,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=548320.8333333334, ans=0.125 2024-09-17 04:14:17,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=548320.8333333334, ans=0.025 2024-09-17 04:15:31,151 INFO [train.py:1198] (0/2) Epoch 31, batch 1850, loss[loss=0.2046, ctc_loss=0.1354, cr_loss=0.3463, over 20946.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1499, cr_loss=0.373, over 4103137.07 frames. ], batch size: 49, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:15:55,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=548490.8333333334, ans=0.0 2024-09-17 04:16:01,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=548519.1666666666, ans=0.125 2024-09-17 04:16:07,659 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.123e+02 2.279e+02 2.476e+02 3.054e+02, threshold=4.557e+02, percent-clipped=0.0 2024-09-17 04:16:30,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=548575.8333333334, ans=0.125 2024-09-17 04:16:47,363 INFO [train.py:1198] (0/2) Epoch 31, batch 1900, loss[loss=0.2455, ctc_loss=0.165, cr_loss=0.4026, over 20805.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1498, cr_loss=0.3729, over 4114054.74 frames. 
], batch size: 65, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:16:55,349 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:17:14,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=548632.5, ans=0.07 2024-09-17 04:17:29,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=548660.8333333334, ans=0.125 2024-09-17 04:17:31,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=548660.8333333334, ans=0.125 2024-09-17 04:17:44,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=548689.1666666666, ans=0.0 2024-09-17 04:18:05,346 INFO [train.py:1198] (0/2) Epoch 31, batch 1950, loss[loss=0.2216, ctc_loss=0.1478, cr_loss=0.369, over 21038.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1503, cr_loss=0.3741, over 4123289.36 frames. ], batch size: 63, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:18:41,775 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.132e+02 2.239e+02 2.428e+02 5.677e+02, threshold=4.478e+02, percent-clipped=1.0 2024-09-17 04:19:02,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=548830.8333333334, ans=0.125 2024-09-17 04:19:21,237 INFO [train.py:1198] (0/2) Epoch 31, batch 2000, loss[loss=0.2349, ctc_loss=0.1573, cr_loss=0.3881, over 20755.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1503, cr_loss=0.3741, over 4122499.28 frames. 
], batch size: 56, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:19:42,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=548915.8333333334, ans=0.025 2024-09-17 04:20:00,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=548944.1666666666, ans=10.0 2024-09-17 04:20:23,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=549000.8333333334, ans=0.125 2024-09-17 04:20:29,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.29 vs. limit=15.0 2024-09-17 04:20:36,389 INFO [train.py:1198] (0/2) Epoch 31, batch 2050, loss[loss=0.1822, ctc_loss=0.1195, cr_loss=0.3135, over 20972.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1499, cr_loss=0.3736, over 4115845.29 frames. ], batch size: 51, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:20:50,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.41 vs. limit=12.0 2024-09-17 04:20:57,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=549057.5, ans=0.0 2024-09-17 04:21:02,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=549057.5, ans=0.125 2024-09-17 04:21:03,003 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.50 vs. 
limit=15.0 2024-09-17 04:21:17,330 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.145e+02 2.250e+02 2.460e+02 3.586e+02, threshold=4.500e+02, percent-clipped=0.0 2024-09-17 04:21:29,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=549114.1666666666, ans=0.125 2024-09-17 04:21:52,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=549142.5, ans=0.025 2024-09-17 04:21:53,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2024-09-17 04:21:54,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=549170.8333333334, ans=0.025 2024-09-17 04:21:55,368 INFO [train.py:1198] (0/2) Epoch 31, batch 2100, loss[loss=0.1982, ctc_loss=0.1317, cr_loss=0.3328, over 20773.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1496, cr_loss=0.3736, over 4120328.54 frames. ], batch size: 56, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:21:57,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.01 vs. 
limit=15.0 2024-09-17 04:22:04,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=549170.8333333334, ans=0.025 2024-09-17 04:22:32,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=549227.5, ans=0.125 2024-09-17 04:22:45,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=549255.8333333334, ans=0.125 2024-09-17 04:22:47,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549255.8333333334, ans=0.1 2024-09-17 04:22:57,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=549284.1666666666, ans=0.125 2024-09-17 04:22:59,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=549284.1666666666, ans=0.0 2024-09-17 04:23:02,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=549284.1666666666, ans=0.125 2024-09-17 04:23:08,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=549284.1666666666, ans=0.04949747468305833 2024-09-17 04:23:13,875 INFO [train.py:1198] (0/2) Epoch 31, batch 2150, loss[loss=0.2293, ctc_loss=0.153, cr_loss=0.3817, over 20885.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1497, cr_loss=0.3736, over 4118899.81 frames. 
], batch size: 57, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:23:35,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=549340.8333333334, ans=0.125 2024-09-17 04:23:52,015 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.149e+02 2.255e+02 2.448e+02 6.767e+02, threshold=4.511e+02, percent-clipped=1.0 2024-09-17 04:24:01,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=549397.5, ans=0.125 2024-09-17 04:24:07,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=549397.5, ans=0.05 2024-09-17 04:24:09,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=549397.5, ans=0.125 2024-09-17 04:24:14,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=549425.8333333334, ans=0.0 2024-09-17 04:24:20,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=549425.8333333334, ans=0.0 2024-09-17 04:24:29,758 INFO [train.py:1198] (0/2) Epoch 31, batch 2200, loss[loss=0.2581, ctc_loss=0.1754, cr_loss=0.4133, over 20090.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1497, cr_loss=0.3738, over 4120124.91 frames. ], batch size: 80, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:24:31,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549454.1666666666, ans=0.1 2024-09-17 04:24:54,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. 
limit=15.0 2024-09-17 04:24:58,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=549510.8333333334, ans=0.2 2024-09-17 04:25:44,341 INFO [train.py:1198] (0/2) Epoch 31, batch 2250, loss[loss=0.243, ctc_loss=0.1623, cr_loss=0.4035, over 20944.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1501, cr_loss=0.3743, over 4116390.10 frames. ], batch size: 64, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:25:55,448 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-09-17 04:25:55,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=549595.8333333334, ans=15.0 2024-09-17 04:26:21,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=12.0 2024-09-17 04:26:22,274 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.195e+02 2.305e+02 2.510e+02 4.431e+02, threshold=4.611e+02, percent-clipped=0.0 2024-09-17 04:26:24,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.21 vs. limit=22.5 2024-09-17 04:26:40,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549680.8333333334, ans=0.1 2024-09-17 04:26:44,161 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.10 vs. 
limit=22.5 2024-09-17 04:26:48,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=549709.1666666666, ans=0.125 2024-09-17 04:27:02,945 INFO [train.py:1198] (0/2) Epoch 31, batch 2300, loss[loss=0.1987, ctc_loss=0.1303, cr_loss=0.3422, over 19906.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.15, cr_loss=0.3735, over 4095605.10 frames. ], batch size: 44, lr: 2.68e-03, grad_scale: 16.0 2024-09-17 04:27:03,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=549737.5, ans=0.2 2024-09-17 04:27:15,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=549737.5, ans=0.0 2024-09-17 04:27:40,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=549794.1666666666, ans=0.125 2024-09-17 04:27:57,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=549822.5, ans=0.125 2024-09-17 04:28:08,259 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:28:18,288 INFO [train.py:1198] (0/2) Epoch 31, batch 2350, loss[loss=0.2008, ctc_loss=0.1302, cr_loss=0.3532, over 20973.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1498, cr_loss=0.3732, over 4096108.08 frames. ], batch size: 51, lr: 2.68e-03, grad_scale: 16.0 2024-09-17 04:28:25,274 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.90 vs. 
limit=15.0 2024-09-17 04:28:32,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=549907.5, ans=0.125 2024-09-17 04:28:44,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=549907.5, ans=0.125 2024-09-17 04:28:53,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=549935.8333333334, ans=0.0 2024-09-17 04:29:00,655 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.124e+02 2.282e+02 2.449e+02 3.157e+02, threshold=4.565e+02, percent-clipped=0.0 2024-09-17 04:29:09,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=549964.1666666666, ans=0.025 2024-09-17 04:29:30,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=549992.5, ans=0.125 2024-09-17 04:29:36,896 INFO [train.py:1198] (0/2) Epoch 31, batch 2400, loss[loss=0.1834, ctc_loss=0.1202, cr_loss=0.3159, over 19457.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1497, cr_loss=0.3726, over 4105026.72 frames. ], batch size: 43, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:29:52,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=550049.1666666666, ans=0.125 2024-09-17 04:30:09,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=550077.5, ans=0.0 2024-09-17 04:30:11,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=550077.5, ans=0.125 2024-09-17 04:30:13,733 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.69 vs. 
limit=15.0 2024-09-17 04:30:14,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=550077.5, ans=0.025 2024-09-17 04:30:31,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=550105.8333333334, ans=0.125 2024-09-17 04:30:52,184 INFO [train.py:1198] (0/2) Epoch 31, batch 2450, loss[loss=0.2376, ctc_loss=0.1609, cr_loss=0.3832, over 21005.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1497, cr_loss=0.3725, over 4103320.87 frames. ], batch size: 63, lr: 2.68e-03, grad_scale: 32.0 2024-09-17 04:30:55,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=550162.5, ans=0.125 2024-09-17 04:30:55,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=550162.5, ans=0.0 2024-09-17 04:31:10,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=550190.8333333334, ans=0.1 2024-09-17 04:31:31,141 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.172e+02 2.281e+02 2.421e+02 5.125e+02, threshold=4.561e+02, percent-clipped=1.0 2024-09-17 04:31:44,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=550247.5, ans=0.1 2024-09-17 04:32:05,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=550304.1666666666, ans=0.1 2024-09-17 04:32:07,197 INFO [train.py:1198] (0/2) Epoch 31, batch 2500, loss[loss=0.2425, ctc_loss=0.1631, cr_loss=0.3972, over 21005.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1501, cr_loss=0.3731, over 4087436.76 frames. 
], batch size: 61, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:32:16,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=550304.1666666666, ans=0.125
2024-09-17 04:32:48,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=550360.8333333334, ans=0.0
2024-09-17 04:32:58,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=550389.1666666666, ans=0.2
2024-09-17 04:33:03,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=550389.1666666666, ans=0.0
2024-09-17 04:33:12,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=550417.5, ans=0.125
2024-09-17 04:33:24,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=550445.8333333334, ans=0.05
2024-09-17 04:33:25,560 INFO [train.py:1198] (0/2) Epoch 31, batch 2550, loss[loss=0.2296, ctc_loss=0.1539, cr_loss=0.3784, over 20270.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1502, cr_loss=0.3732, over 4087477.18 frames. ], batch size: 74, lr: 2.67e-03, grad_scale: 16.0
2024-09-17 04:33:40,140 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2024-09-17 04:33:42,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=550474.1666666666, ans=0.0
2024-09-17 04:34:06,403 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.119e+02 2.280e+02 2.422e+02 3.001e+02, threshold=4.560e+02, percent-clipped=0.0
2024-09-17 04:34:23,218 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 04:34:27,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=550559.1666666666, ans=0.125
2024-09-17 04:34:43,802 INFO [train.py:1198] (0/2) Epoch 31, batch 2600, loss[loss=0.2352, ctc_loss=0.1572, cr_loss=0.3898, over 20974.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1503, cr_loss=0.3734, over 4082849.61 frames. ], batch size: 64, lr: 2.67e-03, grad_scale: 16.0
2024-09-17 04:34:57,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=550615.8333333334, ans=0.025
2024-09-17 04:35:06,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=550615.8333333334, ans=0.125
2024-09-17 04:35:40,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=550672.5, ans=0.125
2024-09-17 04:35:53,057 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.73 vs. limit=15.0
2024-09-17 04:35:59,574 INFO [train.py:1198] (0/2) Epoch 31, batch 2650, loss[loss=0.2138, ctc_loss=0.1455, cr_loss=0.3416, over 20560.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1503, cr_loss=0.3725, over 4085087.20 frames. ], batch size: 75, lr: 2.67e-03, grad_scale: 16.0
2024-09-17 04:36:31,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=550785.8333333334, ans=0.0
2024-09-17 04:36:40,638 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.154e+02 2.329e+02 2.493e+02 4.391e+02, threshold=4.658e+02, percent-clipped=0.0
2024-09-17 04:36:51,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=550814.1666666666, ans=0.0
2024-09-17 04:36:52,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=550814.1666666666, ans=0.0
2024-09-17 04:37:15,069 INFO [train.py:1198] (0/2) Epoch 31, batch 2700, loss[loss=0.2613, ctc_loss=0.1838, cr_loss=0.3874, over 14394.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1506, cr_loss=0.3726, over 4077128.96 frames. ], batch size: 150, lr: 2.67e-03, grad_scale: 16.0
2024-09-17 04:38:12,877 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 04:38:33,623 INFO [train.py:1198] (0/2) Epoch 31, batch 2750, loss[loss=0.2561, ctc_loss=0.1699, cr_loss=0.4309, over 20939.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1506, cr_loss=0.3731, over 4075774.29 frames. ], batch size: 64, lr: 2.67e-03, grad_scale: 16.0
2024-09-17 04:38:54,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=551040.8333333334, ans=0.0
2024-09-17 04:39:13,862 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.177e+02 2.282e+02 2.437e+02 4.059e+02, threshold=4.563e+02, percent-clipped=0.0
2024-09-17 04:39:30,997 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0
2024-09-17 04:39:39,971 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 04:39:44,262 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 04:39:48,317 INFO [train.py:1198] (0/2) Epoch 31, batch 2800, loss[loss=0.2209, ctc_loss=0.1467, cr_loss=0.3709, over 20783.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.151, cr_loss=0.3738, over 4067239.31 frames. ], batch size: 56, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:41:06,999 INFO [train.py:1198] (0/2) Epoch 31, batch 2850, loss[loss=0.2011, ctc_loss=0.1325, cr_loss=0.3432, over 20869.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1515, cr_loss=0.3742, over 4052770.47 frames. ], batch size: 54, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:41:22,769 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0
2024-09-17 04:41:37,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0
2024-09-17 04:41:47,973 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.152e+02 2.291e+02 2.433e+02 3.295e+02, threshold=4.583e+02, percent-clipped=0.0
2024-09-17 04:42:00,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=551380.8333333334, ans=0.125
2024-09-17 04:42:22,929 INFO [train.py:1198] (0/2) Epoch 31, batch 2900, loss[loss=0.2203, ctc_loss=0.1468, cr_loss=0.3675, over 21055.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1506, cr_loss=0.3732, over 4071787.73 frames. ], batch size: 62, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:42:26,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=551437.5, ans=0.2
2024-09-17 04:42:55,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=551494.1666666666, ans=0.0
2024-09-17 04:43:00,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=551494.1666666666, ans=0.1
2024-09-17 04:43:10,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=551522.5, ans=0.0
2024-09-17 04:43:24,371 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0
2024-09-17 04:43:27,485 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.20 vs. limit=15.0
2024-09-17 04:43:30,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=551550.8333333334, ans=0.0
2024-09-17 04:43:34,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=551550.8333333334, ans=0.04949747468305833
2024-09-17 04:43:38,870 INFO [train.py:1198] (0/2) Epoch 31, batch 2950, loss[loss=0.1983, ctc_loss=0.13, cr_loss=0.3415, over 20997.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1508, cr_loss=0.3738, over 4077829.48 frames. ], batch size: 52, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:44:19,781 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 04:44:22,333 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.171e+02 2.322e+02 2.486e+02 3.288e+02, threshold=4.645e+02, percent-clipped=0.0
2024-09-17 04:44:29,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=551664.1666666666, ans=0.0
2024-09-17 04:44:56,183 INFO [train.py:1198] (0/2) Epoch 31, batch 3000, loss[loss=0.2049, ctc_loss=0.1364, cr_loss=0.3424, over 20002.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1509, cr_loss=0.3743, over 4089605.09 frames. ], batch size: 44, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:44:56,184 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 04:45:16,856 INFO [train.py:1230] (0/2) Epoch 31, validation: loss=0.04047, ctc_loss=0.04047, cr_loss=1.207e-14, over 944034.00 frames.
2024-09-17 04:45:16,857 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-17 04:45:47,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.15 vs. limit=10.0
2024-09-17 04:45:56,705 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.10 vs. limit=15.0
2024-09-17 04:46:22,397 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0
2024-09-17 04:46:23,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=551834.1666666666, ans=0.0
2024-09-17 04:46:35,414 INFO [train.py:1198] (0/2) Epoch 31, batch 3050, loss[loss=0.2171, ctc_loss=0.1471, cr_loss=0.3498, over 20714.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1498, cr_loss=0.373, over 4097333.63 frames. ], batch size: 71, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:47:03,528 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=12.0
2024-09-17 04:47:06,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=551919.1666666666, ans=0.025
2024-09-17 04:47:16,568 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.790e+02 2.157e+02 2.283e+02 2.453e+02 3.621e+02, threshold=4.565e+02, percent-clipped=0.0
2024-09-17 04:47:21,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=551947.5, ans=0.125
2024-09-17 04:47:28,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=551947.5, ans=0.125
2024-09-17 04:47:39,646 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 04:47:43,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=551975.8333333334, ans=0.0
2024-09-17 04:47:50,925 INFO [train.py:1198] (0/2) Epoch 31, batch 3100, loss[loss=0.2671, ctc_loss=0.1859, cr_loss=0.4061, over 18205.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1499, cr_loss=0.3723, over 4092538.33 frames. ], batch size: 108, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:48:30,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=552060.8333333334, ans=0.04949747468305833
2024-09-17 04:48:37,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=552089.1666666666, ans=0.0
2024-09-17 04:48:40,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=552089.1666666666, ans=0.125
2024-09-17 04:49:06,391 INFO [train.py:1198] (0/2) Epoch 31, batch 3150, loss[loss=0.2344, ctc_loss=0.1552, cr_loss=0.3958, over 20958.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1498, cr_loss=0.3726, over 4098307.72 frames. ], batch size: 58, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:49:06,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=552145.8333333334, ans=0.125
2024-09-17 04:49:48,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=552202.5, ans=0.2
2024-09-17 04:49:49,878 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.192e+02 2.298e+02 2.459e+02 5.368e+02, threshold=4.595e+02, percent-clipped=1.0
2024-09-17 04:50:05,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552230.8333333334, ans=0.1
2024-09-17 04:50:16,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=552259.1666666666, ans=0.0
2024-09-17 04:50:24,974 INFO [train.py:1198] (0/2) Epoch 31, batch 3200, loss[loss=0.229, ctc_loss=0.1548, cr_loss=0.3708, over 20942.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1505, cr_loss=0.3734, over 4089541.75 frames. ], batch size: 60, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:50:40,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552315.8333333334, ans=0.1
2024-09-17 04:51:12,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552372.5, ans=0.1
2024-09-17 04:51:24,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=552372.5, ans=0.0
2024-09-17 04:51:43,489 INFO [train.py:1198] (0/2) Epoch 31, batch 3250, loss[loss=0.2277, ctc_loss=0.155, cr_loss=0.3631, over 20955.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1505, cr_loss=0.3739, over 4097332.40 frames. ], batch size: 58, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:51:45,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=552429.1666666666, ans=15.0
2024-09-17 04:51:46,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=552429.1666666666, ans=0.025
2024-09-17 04:51:55,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=552429.1666666666, ans=0.125
2024-09-17 04:52:07,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=552457.5, ans=0.2
2024-09-17 04:52:09,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=552457.5, ans=0.2
2024-09-17 04:52:14,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=22.5
2024-09-17 04:52:24,169 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.773e+02 2.170e+02 2.288e+02 2.542e+02 4.991e+02, threshold=4.575e+02, percent-clipped=1.0
2024-09-17 04:52:59,075 INFO [train.py:1198] (0/2) Epoch 31, batch 3300, loss[loss=0.218, ctc_loss=0.1429, cr_loss=0.3755, over 20886.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1505, cr_loss=0.3745, over 4098322.58 frames. ], batch size: 54, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:53:05,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=552570.8333333334, ans=0.2
2024-09-17 04:53:10,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=552570.8333333334, ans=0.125
2024-09-17 04:53:28,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=552627.5, ans=0.125
2024-09-17 04:53:48,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=552655.8333333334, ans=0.0
2024-09-17 04:54:00,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=552684.1666666666, ans=0.2
2024-09-17 04:54:04,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=552684.1666666666, ans=0.125
2024-09-17 04:54:07,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=552684.1666666666, ans=0.0
2024-09-17 04:54:15,005 INFO [train.py:1198] (0/2) Epoch 31, batch 3350, loss[loss=0.2651, ctc_loss=0.1802, cr_loss=0.4244, over 20634.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1499, cr_loss=0.3731, over 4099166.30 frames. ], batch size: 68, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:54:30,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=552740.8333333334, ans=0.2
2024-09-17 04:54:43,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=552740.8333333334, ans=0.125
2024-09-17 04:54:47,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=552769.1666666666, ans=0.025
2024-09-17 04:54:55,960 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.153e+02 2.240e+02 2.345e+02 3.285e+02, threshold=4.480e+02, percent-clipped=0.0
2024-09-17 04:54:59,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=552797.5, ans=0.125
2024-09-17 04:55:28,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=552825.8333333334, ans=0.125
2024-09-17 04:55:33,754 INFO [train.py:1198] (0/2) Epoch 31, batch 3400, loss[loss=0.1972, ctc_loss=0.1288, cr_loss=0.3416, over 20946.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1505, cr_loss=0.3745, over 4103502.30 frames. ], batch size: 50, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:55:37,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=552854.1666666666, ans=0.025
2024-09-17 04:56:17,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=12.0
2024-09-17 04:56:52,194 INFO [train.py:1198] (0/2) Epoch 31, batch 3450, loss[loss=0.2655, ctc_loss=0.1868, cr_loss=0.3935, over 14326.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1514, cr_loss=0.376, over 4085755.83 frames. ], batch size: 149, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:57:08,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=553024.1666666666, ans=22.5
2024-09-17 04:57:24,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553052.5, ans=0.1
2024-09-17 04:57:33,583 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.132e+02 2.281e+02 2.481e+02 3.658e+02, threshold=4.562e+02, percent-clipped=0.0
2024-09-17 04:58:01,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0
2024-09-17 04:58:05,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=553109.1666666666, ans=0.0
2024-09-17 04:58:08,124 INFO [train.py:1198] (0/2) Epoch 31, batch 3500, loss[loss=0.1977, ctc_loss=0.1316, cr_loss=0.3305, over 20980.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1518, cr_loss=0.3756, over 4052075.32 frames. ], batch size: 55, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:58:37,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=22.5
2024-09-17 04:58:53,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=553222.5, ans=0.1
2024-09-17 04:58:59,201 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 04:59:09,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553250.8333333334, ans=0.1
2024-09-17 04:59:24,509 INFO [train.py:1198] (0/2) Epoch 31, batch 3550, loss[loss=0.2408, ctc_loss=0.1629, cr_loss=0.3897, over 20824.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1515, cr_loss=0.3758, over 4051022.73 frames. ], batch size: 65, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 04:59:43,133 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0
2024-09-17 05:00:05,224 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.151e+02 2.293e+02 2.461e+02 6.303e+02, threshold=4.586e+02, percent-clipped=1.0
2024-09-17 05:00:14,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=553364.1666666666, ans=0.1
2024-09-17 05:00:23,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=553392.5, ans=0.0
2024-09-17 05:00:25,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=553392.5, ans=0.125
2024-09-17 05:00:33,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=553392.5, ans=0.125
2024-09-17 05:00:42,616 INFO [train.py:1198] (0/2) Epoch 31, batch 3600, loss[loss=0.2364, ctc_loss=0.16, cr_loss=0.3819, over 19371.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1519, cr_loss=0.3763, over 4047704.27 frames. ], batch size: 90, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 05:00:52,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=553420.8333333334, ans=0.0
2024-09-17 05:01:14,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=553477.5, ans=0.07
2024-09-17 05:01:43,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=553534.1666666666, ans=0.125
2024-09-17 05:01:58,442 INFO [train.py:1198] (0/2) Epoch 31, batch 3650, loss[loss=0.2337, ctc_loss=0.1551, cr_loss=0.393, over 20693.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1512, cr_loss=0.3753, over 4057926.75 frames. ], batch size: 71, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 05:02:06,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=553562.5, ans=0.125
2024-09-17 05:02:14,519 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.75 vs. limit=22.5
2024-09-17 05:02:21,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=553590.8333333334, ans=0.125
2024-09-17 05:02:42,324 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.122e+02 2.251e+02 2.423e+02 3.673e+02, threshold=4.502e+02, percent-clipped=0.0
2024-09-17 05:03:02,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=553675.8333333334, ans=0.1
2024-09-17 05:03:14,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.98 vs. limit=22.5
2024-09-17 05:03:16,916 INFO [train.py:1198] (0/2) Epoch 31, batch 3700, loss[loss=0.2387, ctc_loss=0.1609, cr_loss=0.3891, over 21040.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1512, cr_loss=0.3752, over 4056464.10 frames. ], batch size: 62, lr: 2.67e-03, grad_scale: 32.0
2024-09-17 05:04:23,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=553817.5, ans=0.2
2024-09-17 05:04:24,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=553817.5, ans=0.125
2024-09-17 05:04:31,831 INFO [train.py:1198] (0/2) Epoch 31, batch 3750, loss[loss=0.2527, ctc_loss=0.1681, cr_loss=0.4228, over 18292.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1507, cr_loss=0.3743, over 4045401.68 frames. ], batch size: 108, lr: 2.67e-03, grad_scale: 16.0
2024-09-17 05:04:36,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=553845.8333333334, ans=0.125
2024-09-17 05:05:13,573 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.142e+02 2.232e+02 2.468e+02 3.336e+02, threshold=4.463e+02, percent-clipped=0.0
2024-09-17 05:05:14,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=553902.5, ans=0.0
2024-09-17 05:05:17,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=553930.8333333334, ans=0.0
2024-09-17 05:05:26,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=553930.8333333334, ans=0.04949747468305833
2024-09-17 05:05:33,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=553959.1666666666, ans=0.125
2024-09-17 05:05:46,948 INFO [train.py:1198] (0/2) Epoch 31, batch 3800, loss[loss=0.2649, ctc_loss=0.1791, cr_loss=0.429, over 20966.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.151, cr_loss=0.375, over 4045691.56 frames. ], batch size: 64, lr: 2.67e-03, grad_scale: 16.0
2024-09-17 05:05:47,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=553987.5, ans=0.025
2024-09-17 05:05:50,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=22.5
2024-09-17 05:06:03,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=554015.8333333334, ans=0.0
2024-09-17 05:06:19,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=554044.1666666666, ans=0.0
2024-09-17 05:06:23,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.34 vs. limit=8.0
2024-09-17 05:06:23,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=554044.1666666666, ans=0.5
2024-09-17 05:06:28,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=554044.1666666666, ans=0.125
2024-09-17 05:06:52,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=554100.8333333334, ans=0.1
2024-09-17 05:06:58,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=554100.8333333334, ans=0.95
2024-09-17 05:06:58,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=554100.8333333334, ans=0.125
2024-09-17 05:07:03,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=554100.8333333334, ans=0.125
2024-09-17 05:07:05,736 INFO [train.py:1198] (0/2) Epoch 31, batch 3850, loss[loss=0.2371, ctc_loss=0.1591, cr_loss=0.39, over 20665.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1513, cr_loss=0.3752, over 4045502.00 frames. ], batch size: 66, lr: 2.67e-03, grad_scale: 16.0
2024-09-17 05:07:22,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=554157.5, ans=0.2
2024-09-17 05:07:33,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=554157.5, ans=0.125
2024-09-17 05:07:36,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554185.8333333334, ans=0.1
2024-09-17 05:07:48,307 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.132e+02 2.259e+02 2.463e+02 4.213e+02, threshold=4.517e+02, percent-clipped=0.0
2024-09-17 05:07:56,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=554214.1666666666, ans=0.125
2024-09-17 05:08:24,376 INFO [train.py:1198] (0/2) Epoch 31, batch 3900, loss[loss=0.2093, ctc_loss=0.1396, cr_loss=0.3485, over 21058.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1499, cr_loss=0.3731, over 4062071.92 frames. ], batch size: 56, lr: 2.67e-03, grad_scale: 16.0
2024-09-17 05:08:29,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=554270.8333333334, ans=0.07
2024-09-17 05:08:29,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=554270.8333333334, ans=0.0
2024-09-17 05:08:36,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=554270.8333333334, ans=0.025
2024-09-17 05:09:18,928 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:09:31,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=554384.1666666666, ans=0.025
2024-09-17 05:09:40,112 INFO [train.py:1198] (0/2) Epoch 31, batch 3950, loss[loss=0.225, ctc_loss=0.148, cr_loss=0.3848, over 20882.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1497, cr_loss=0.3729, over 4076269.64 frames. ], batch size: 57, lr: 2.66e-03, grad_scale: 16.0
2024-09-17 05:10:10,771 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0
2024-09-17 05:10:16,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=554469.1666666666, ans=0.125
2024-09-17 05:10:22,003 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.129e+02 2.258e+02 2.498e+02 3.573e+02, threshold=4.516e+02, percent-clipped=0.0
2024-09-17 05:10:23,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=554497.5, ans=0.125
2024-09-17 05:10:55,415 INFO [train.py:1198] (0/2) Epoch 31, batch 4000, loss[loss=0.195, ctc_loss=0.1273, cr_loss=0.3383, over 20882.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1491, cr_loss=0.372, over 4086185.79 frames. ], batch size: 54, lr: 2.66e-03, grad_scale: 32.0
2024-09-17 05:10:56,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.41 vs. limit=15.0
2024-09-17 05:11:54,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=554667.5, ans=0.0
2024-09-17 05:12:13,620 INFO [train.py:1198] (0/2) Epoch 31, batch 4050, loss[loss=0.2248, ctc_loss=0.1506, cr_loss=0.3709, over 20821.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1505, cr_loss=0.3741, over 4074793.54 frames. ], batch size: 59, lr: 2.66e-03, grad_scale: 32.0
2024-09-17 05:12:21,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554695.8333333334, ans=0.1
2024-09-17 05:12:53,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=554752.5, ans=0.125
2024-09-17 05:12:55,756 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.197e+02 2.312e+02 2.523e+02 4.675e+02, threshold=4.623e+02, percent-clipped=1.0
2024-09-17 05:13:15,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=554809.1666666666, ans=0.125
2024-09-17 05:13:32,285 INFO [train.py:1198] (0/2) Epoch 31, batch 4100, loss[loss=0.2432, ctc_loss=0.165, cr_loss=0.391, over 21014.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1505, cr_loss=0.374, over 4084632.00 frames. ], batch size: 61, lr: 2.66e-03, grad_scale: 32.0
2024-09-17 05:13:34,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0
2024-09-17 05:13:44,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=554837.5, ans=0.2
2024-09-17 05:14:22,507 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:14:29,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=554922.5, ans=0.125
2024-09-17 05:14:36,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=15.0
2024-09-17 05:14:44,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=554950.8333333334, ans=0.0
2024-09-17 05:14:47,510 INFO [train.py:1198] (0/2) Epoch 31, batch 4150, loss[loss=0.2046, ctc_loss=0.1359, cr_loss=0.3437, over 20963.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.15, cr_loss=0.3732, over 4079295.05 frames. ], batch size: 49, lr: 2.66e-03, grad_scale: 32.0
2024-09-17 05:14:53,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=554979.1666666666, ans=0.125
2024-09-17 05:15:16,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=555035.8333333334, ans=0.0
2024-09-17 05:15:29,600 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.147e+02 2.282e+02 2.495e+02 3.283e+02, threshold=4.564e+02, percent-clipped=0.0
2024-09-17 05:15:34,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0
2024-09-17 05:15:53,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=555092.5, ans=0.0
2024-09-17 05:15:55,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=555092.5, ans=0.1
2024-09-17 05:15:55,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=555092.5, ans=0.025
2024-09-17 05:15:59,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=555092.5, ans=0.2
2024-09-17 05:16:02,543 INFO [train.py:1198] (0/2) Epoch 31, batch 4200, loss[loss=0.256, ctc_loss=0.1742, cr_loss=0.4086, over 20101.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1506, cr_loss=0.374, over 4068177.29 frames. ], batch size: 80, lr: 2.66e-03, grad_scale: 32.0
2024-09-17 05:16:42,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=555177.5, ans=0.2
2024-09-17 05:17:08,325 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:17:18,555 INFO [train.py:1198] (0/2) Epoch 31, batch 4250, loss[loss=0.1618, ctc_loss=0.1055, cr_loss=0.2818, over 20995.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1511, cr_loss=0.3751, over 4062627.88 frames. ], batch size: 48, lr: 2.66e-03, grad_scale: 32.0
2024-09-17 05:17:40,020 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:17:58,184 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-196000.pt
2024-09-17 05:18:05,336 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.132e+02 2.267e+02 2.432e+02 3.219e+02, threshold=4.535e+02, percent-clipped=0.0
2024-09-17 05:18:38,691 INFO [train.py:1198] (0/2) Epoch 31, batch 4300, loss[loss=0.2057, ctc_loss=0.1346, cr_loss=0.3553, over 20981.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1509, cr_loss=0.3748, over 4078742.35 frames. ], batch size: 55, lr: 2.66e-03, grad_scale: 32.0
2024-09-17 05:18:54,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=555432.5, ans=0.125
2024-09-17 05:19:08,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.88 vs.
limit=15.0 2024-09-17 05:19:14,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=555460.8333333334, ans=0.0 2024-09-17 05:19:24,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=555460.8333333334, ans=0.0 2024-09-17 05:19:30,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=555489.1666666666, ans=0.0 2024-09-17 05:19:57,542 INFO [train.py:1198] (0/2) Epoch 31, batch 4350, loss[loss=0.2421, ctc_loss=0.163, cr_loss=0.3955, over 19392.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.151, cr_loss=0.3757, over 4090162.95 frames. ], batch size: 90, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:20:32,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=555602.5, ans=0.125 2024-09-17 05:20:39,596 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.174e+02 2.311e+02 2.470e+02 3.545e+02, threshold=4.622e+02, percent-clipped=0.0 2024-09-17 05:20:40,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=555602.5, ans=0.2 2024-09-17 05:20:56,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=555659.1666666666, ans=0.125 2024-09-17 05:21:12,902 INFO [train.py:1198] (0/2) Epoch 31, batch 4400, loss[loss=0.2572, ctc_loss=0.1773, cr_loss=0.3997, over 18521.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1511, cr_loss=0.3756, over 4079104.51 frames. 
], batch size: 108, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:21:25,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=555687.5, ans=0.0 2024-09-17 05:21:25,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.08 vs. limit=15.0 2024-09-17 05:21:28,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=555715.8333333334, ans=0.125 2024-09-17 05:21:36,314 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0 2024-09-17 05:22:04,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=555772.5, ans=0.0 2024-09-17 05:22:09,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=555772.5, ans=0.0 2024-09-17 05:22:15,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555800.8333333334, ans=0.1 2024-09-17 05:22:28,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=12.0 2024-09-17 05:22:29,044 INFO [train.py:1198] (0/2) Epoch 31, batch 4450, loss[loss=0.2422, ctc_loss=0.1625, cr_loss=0.3988, over 20944.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1501, cr_loss=0.374, over 4091196.98 frames. ], batch size: 60, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:22:53,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.72 vs. 
limit=10.0 2024-09-17 05:23:11,889 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.145e+02 2.272e+02 2.416e+02 2.988e+02, threshold=4.544e+02, percent-clipped=0.0 2024-09-17 05:23:28,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=555914.1666666666, ans=0.0 2024-09-17 05:23:47,774 INFO [train.py:1198] (0/2) Epoch 31, batch 4500, loss[loss=0.2463, ctc_loss=0.1647, cr_loss=0.4078, over 20828.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1508, cr_loss=0.3753, over 4090290.92 frames. ], batch size: 59, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:23:51,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=555970.8333333334, ans=0.0 2024-09-17 05:23:52,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=555970.8333333334, ans=0.0 2024-09-17 05:25:03,812 INFO [train.py:1198] (0/2) Epoch 31, batch 4550, loss[loss=0.2029, ctc_loss=0.1346, cr_loss=0.3415, over 19781.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1512, cr_loss=0.3759, over 4077779.13 frames. ], batch size: 44, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:25:17,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=12.0 2024-09-17 05:25:22,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=556140.8333333334, ans=0.125 2024-09-17 05:25:36,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=556169.1666666666, ans=0.0 2024-09-17 05:25:40,683 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.57 vs. 
limit=15.0 2024-09-17 05:25:44,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=556169.1666666666, ans=0.0 2024-09-17 05:25:48,799 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.157e+02 2.260e+02 2.457e+02 3.814e+02, threshold=4.520e+02, percent-clipped=0.0 2024-09-17 05:26:02,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556197.5, ans=0.1 2024-09-17 05:26:22,093 INFO [train.py:1198] (0/2) Epoch 31, batch 4600, loss[loss=0.209, ctc_loss=0.1413, cr_loss=0.3387, over 20937.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1503, cr_loss=0.3739, over 4070977.52 frames. ], batch size: 50, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:26:23,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556254.1666666666, ans=0.1 2024-09-17 05:26:25,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=556254.1666666666, ans=0.125 2024-09-17 05:26:50,113 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:27:02,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=556310.8333333334, ans=0.125 2024-09-17 05:27:11,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=556339.1666666666, ans=0.125 2024-09-17 05:27:21,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=556367.5, ans=0.0 2024-09-17 05:27:35,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=556367.5, ans=0.025 2024-09-17 05:27:37,748 INFO [train.py:1198] (0/2) 
Epoch 31, batch 4650, loss[loss=0.2308, ctc_loss=0.1532, cr_loss=0.388, over 21073.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1501, cr_loss=0.3729, over 4070795.79 frames. ], batch size: 59, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:27:44,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556395.8333333334, ans=0.1 2024-09-17 05:27:45,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=556395.8333333334, ans=0.025 2024-09-17 05:27:47,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=556395.8333333334, ans=0.0 2024-09-17 05:27:57,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=556424.1666666666, ans=0.2 2024-09-17 05:28:20,167 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.132e+02 2.237e+02 2.361e+02 3.543e+02, threshold=4.475e+02, percent-clipped=0.0 2024-09-17 05:28:34,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=556480.8333333334, ans=0.2 2024-09-17 05:28:53,544 INFO [train.py:1198] (0/2) Epoch 31, batch 4700, loss[loss=0.2316, ctc_loss=0.1588, cr_loss=0.3642, over 20270.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1503, cr_loss=0.3733, over 4070072.74 frames. ], batch size: 74, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:29:22,701 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=12.0 2024-09-17 05:29:33,242 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.86 vs. 
limit=15.0 2024-09-17 05:29:34,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=556594.1666666666, ans=0.035 2024-09-17 05:29:40,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=556622.5, ans=0.125 2024-09-17 05:29:56,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-09-17 05:30:12,036 INFO [train.py:1198] (0/2) Epoch 31, batch 4750, loss[loss=0.1895, ctc_loss=0.1234, cr_loss=0.3304, over 20995.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1502, cr_loss=0.3733, over 4079742.37 frames. ], batch size: 52, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:30:19,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=556679.1666666666, ans=0.2 2024-09-17 05:30:38,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=556707.5, ans=0.125 2024-09-17 05:30:50,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=556735.8333333334, ans=0.2 2024-09-17 05:30:57,298 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.170e+02 2.302e+02 2.491e+02 3.115e+02, threshold=4.604e+02, percent-clipped=0.0 2024-09-17 05:31:00,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=556764.1666666666, ans=0.09899494936611666 2024-09-17 05:31:08,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=556764.1666666666, ans=0.125 2024-09-17 05:31:11,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, 
batch_count=556764.1666666666, ans=0.125 2024-09-17 05:31:30,648 INFO [train.py:1198] (0/2) Epoch 31, batch 4800, loss[loss=0.1915, ctc_loss=0.1257, cr_loss=0.329, over 20954.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1505, cr_loss=0.3739, over 4075644.51 frames. ], batch size: 49, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:32:21,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0 2024-09-17 05:32:36,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=556934.1666666666, ans=0.0 2024-09-17 05:32:46,173 INFO [train.py:1198] (0/2) Epoch 31, batch 4850, loss[loss=0.2368, ctc_loss=0.1573, cr_loss=0.3975, over 21030.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1506, cr_loss=0.3754, over 4087252.10 frames. ], batch size: 61, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:33:02,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.16 vs. limit=6.0 2024-09-17 05:33:17,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.11 vs. limit=15.0 2024-09-17 05:33:23,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=557019.1666666666, ans=0.125 2024-09-17 05:33:25,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.75 vs. 
limit=15.0 2024-09-17 05:33:26,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=557019.1666666666, ans=0.125 2024-09-17 05:33:28,875 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.148e+02 2.306e+02 2.446e+02 3.239e+02, threshold=4.612e+02, percent-clipped=0.0 2024-09-17 05:33:47,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2024-09-17 05:33:51,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=557075.8333333334, ans=0.125 2024-09-17 05:34:01,315 INFO [train.py:1198] (0/2) Epoch 31, batch 4900, loss[loss=0.2253, ctc_loss=0.1513, cr_loss=0.3699, over 20941.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1513, cr_loss=0.3769, over 4089912.80 frames. ], batch size: 64, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:34:19,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=557132.5, ans=15.0 2024-09-17 05:34:21,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=557132.5, ans=0.125 2024-09-17 05:34:42,316 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2024-09-17 05:34:59,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=557217.5, ans=0.0 2024-09-17 05:35:02,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. 
limit=6.0 2024-09-17 05:35:15,702 INFO [train.py:1198] (0/2) Epoch 31, batch 4950, loss[loss=0.2072, ctc_loss=0.1367, cr_loss=0.3522, over 20789.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1502, cr_loss=0.3742, over 4105017.55 frames. ], batch size: 53, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:35:22,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=557245.8333333334, ans=0.2 2024-09-17 05:35:47,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=557302.5, ans=0.2 2024-09-17 05:35:58,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=557302.5, ans=0.1 2024-09-17 05:36:00,795 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.181e+02 2.267e+02 2.406e+02 3.577e+02, threshold=4.533e+02, percent-clipped=0.0 2024-09-17 05:36:05,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=557330.8333333334, ans=0.125 2024-09-17 05:36:19,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=557359.1666666666, ans=0.0 2024-09-17 05:36:34,063 INFO [train.py:1198] (0/2) Epoch 31, batch 5000, loss[loss=0.203, ctc_loss=0.1355, cr_loss=0.3375, over 21066.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1492, cr_loss=0.3731, over 4111768.47 frames. ], batch size: 53, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:37:24,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=557472.5, ans=0.2 2024-09-17 05:37:48,145 INFO [train.py:1198] (0/2) Epoch 31, batch 5050, loss[loss=0.2226, ctc_loss=0.1467, cr_loss=0.3793, over 20870.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1492, cr_loss=0.3733, over 4106573.43 frames. 
], batch size: 57, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:38:00,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=557529.1666666666, ans=0.1 2024-09-17 05:38:16,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=557585.8333333334, ans=0.0 2024-09-17 05:38:29,806 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.123e+02 2.260e+02 2.429e+02 7.931e+02, threshold=4.519e+02, percent-clipped=1.0 2024-09-17 05:38:32,032 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=22.5 2024-09-17 05:39:05,141 INFO [train.py:1198] (0/2) Epoch 31, batch 5100, loss[loss=0.1989, ctc_loss=0.1312, cr_loss=0.3385, over 21059.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.15, cr_loss=0.3741, over 4099397.01 frames. ], batch size: 56, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:39:14,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=557670.8333333334, ans=0.2 2024-09-17 05:40:19,411 INFO [train.py:1198] (0/2) Epoch 31, batch 5150, loss[loss=0.2208, ctc_loss=0.1461, cr_loss=0.3734, over 21049.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1506, cr_loss=0.3754, over 4094099.67 frames. 
], batch size: 62, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:40:31,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=557812.5, ans=0.0 2024-09-17 05:40:40,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=557840.8333333334, ans=0.125 2024-09-17 05:41:01,610 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.161e+02 2.291e+02 2.422e+02 3.607e+02, threshold=4.583e+02, percent-clipped=0.0 2024-09-17 05:41:34,431 INFO [train.py:1198] (0/2) Epoch 31, batch 5200, loss[loss=0.23, ctc_loss=0.1505, cr_loss=0.3975, over 20929.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1498, cr_loss=0.3743, over 4109493.38 frames. ], batch size: 60, lr: 2.66e-03, grad_scale: 32.0 2024-09-17 05:41:43,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=557954.1666666666, ans=0.1 2024-09-17 05:41:56,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=557982.5, ans=0.125 2024-09-17 05:41:59,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=557982.5, ans=0.035 2024-09-17 05:42:04,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=558010.8333333334, ans=0.5 2024-09-17 05:42:38,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=558067.5, ans=0.015 2024-09-17 05:42:48,828 INFO [train.py:1198] (0/2) Epoch 31, batch 5250, loss[loss=0.1919, ctc_loss=0.1261, cr_loss=0.3292, over 20937.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1499, cr_loss=0.3744, over 4107589.90 frames. 
], batch size: 50, lr: 2.66e-03, grad_scale: 16.0 2024-09-17 05:43:31,581 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.151e+02 2.267e+02 2.415e+02 5.981e+02, threshold=4.533e+02, percent-clipped=0.0 2024-09-17 05:44:02,422 INFO [train.py:1198] (0/2) Epoch 31, batch 5300, loss[loss=0.2367, ctc_loss=0.1589, cr_loss=0.3888, over 20869.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1492, cr_loss=0.3731, over 4104554.45 frames. ], batch size: 57, lr: 2.66e-03, grad_scale: 16.0 2024-09-17 05:44:05,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=558237.5, ans=0.1 2024-09-17 05:44:07,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=558237.5, ans=0.0 2024-09-17 05:44:45,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=558322.5, ans=0.0 2024-09-17 05:44:53,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=558322.5, ans=0.0 2024-09-17 05:45:12,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=558350.8333333334, ans=0.0 2024-09-17 05:45:16,911 INFO [train.py:1198] (0/2) Epoch 31, batch 5350, loss[loss=0.2594, ctc_loss=0.1764, cr_loss=0.4152, over 18388.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1489, cr_loss=0.3726, over 4102270.54 frames. 
], batch size: 108, lr: 2.66e-03, grad_scale: 16.0 2024-09-17 05:45:18,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=558379.1666666666, ans=0.125 2024-09-17 05:45:59,922 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.054e-03 2024-09-17 05:46:02,503 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.188e+02 2.311e+02 2.479e+02 3.936e+02, threshold=4.621e+02, percent-clipped=1.0 2024-09-17 05:46:13,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=558464.1666666666, ans=0.2 2024-09-17 05:46:33,783 INFO [train.py:1198] (0/2) Epoch 31, batch 5400, loss[loss=0.2205, ctc_loss=0.1469, cr_loss=0.3681, over 20964.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1498, cr_loss=0.3747, over 4097781.29 frames. ], batch size: 58, lr: 2.66e-03, grad_scale: 16.0 2024-09-17 05:46:45,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=558520.8333333334, ans=0.125 2024-09-17 05:47:06,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=558577.5, ans=0.125 2024-09-17 05:47:13,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=558577.5, ans=0.125 2024-09-17 05:47:14,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=558577.5, ans=0.1 2024-09-17 05:47:36,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. 
limit=15.0 2024-09-17 05:47:37,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=558634.1666666666, ans=0.1 2024-09-17 05:47:47,311 INFO [train.py:1198] (0/2) Epoch 31, batch 5450, loss[loss=0.2109, ctc_loss=0.1425, cr_loss=0.3419, over 21042.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1493, cr_loss=0.3736, over 4098211.31 frames. ], batch size: 53, lr: 2.65e-03, grad_scale: 16.0 2024-09-17 05:48:05,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=558690.8333333334, ans=0.2 2024-09-17 05:48:09,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=558690.8333333334, ans=0.125 2024-09-17 05:48:21,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=558719.1666666666, ans=0.1 2024-09-17 05:48:32,930 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.211e+02 2.314e+02 2.502e+02 4.686e+02, threshold=4.629e+02, percent-clipped=1.0 2024-09-17 05:48:58,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=558775.8333333334, ans=0.125 2024-09-17 05:48:59,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=558775.8333333334, ans=0.07 2024-09-17 05:49:04,052 INFO [train.py:1198] (0/2) Epoch 31, batch 5500, loss[loss=0.2342, ctc_loss=0.1592, cr_loss=0.3753, over 20669.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1488, cr_loss=0.3726, over 4108755.70 frames. 
], batch size: 71, lr: 2.65e-03, grad_scale: 16.0 2024-09-17 05:49:11,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=558804.1666666666, ans=0.0 2024-09-17 05:49:13,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=22.5 2024-09-17 05:49:40,002 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:49:44,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=558860.8333333334, ans=0.125 2024-09-17 05:50:06,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=558917.5, ans=0.125 2024-09-17 05:50:18,532 INFO [train.py:1198] (0/2) Epoch 31, batch 5550, loss[loss=0.2199, ctc_loss=0.1475, cr_loss=0.3621, over 21077.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1496, cr_loss=0.3741, over 4107431.28 frames. ], batch size: 59, lr: 2.65e-03, grad_scale: 16.0 2024-09-17 05:50:53,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0 2024-09-17 05:51:01,813 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.141e+02 2.293e+02 2.462e+02 3.507e+02, threshold=4.586e+02, percent-clipped=0.0 2024-09-17 05:51:32,969 INFO [train.py:1198] (0/2) Epoch 31, batch 5600, loss[loss=0.2439, ctc_loss=0.1609, cr_loss=0.4147, over 20942.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1498, cr_loss=0.3739, over 4099116.68 frames. ], batch size: 60, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:51:39,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.19 vs. 
limit=15.0 2024-09-17 05:51:41,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559087.5, ans=0.1 2024-09-17 05:52:22,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=559172.5, ans=0.125 2024-09-17 05:52:47,400 INFO [train.py:1198] (0/2) Epoch 31, batch 5650, loss[loss=0.1672, ctc_loss=0.1084, cr_loss=0.2938, over 19898.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.15, cr_loss=0.3736, over 4098786.74 frames. ], batch size: 44, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:52:55,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=559229.1666666666, ans=0.0 2024-09-17 05:52:58,309 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0 2024-09-17 05:53:06,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=559257.5, ans=0.125 2024-09-17 05:53:17,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=559285.8333333334, ans=0.0 2024-09-17 05:53:30,805 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.165e+02 2.280e+02 2.460e+02 3.302e+02, threshold=4.561e+02, percent-clipped=0.0 2024-09-17 05:54:01,625 INFO [train.py:1198] (0/2) Epoch 31, batch 5700, loss[loss=0.2346, ctc_loss=0.1546, cr_loss=0.4, over 20355.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1503, cr_loss=0.3743, over 4092282.41 frames. 
], batch size: 74, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:55:13,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=559484.1666666666, ans=0.0 2024-09-17 05:55:19,388 INFO [train.py:1198] (0/2) Epoch 31, batch 5750, loss[loss=0.2054, ctc_loss=0.1347, cr_loss=0.3535, over 20278.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1497, cr_loss=0.3737, over 4107700.71 frames. ], batch size: 45, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:55:28,685 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0 2024-09-17 05:56:02,380 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.173e+02 2.324e+02 2.491e+02 3.999e+02, threshold=4.648e+02, percent-clipped=0.0 2024-09-17 05:56:07,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.24 vs. limit=10.0 2024-09-17 05:56:11,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=559597.5, ans=0.125 2024-09-17 05:56:28,494 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.95 vs. limit=15.0 2024-09-17 05:56:33,436 INFO [train.py:1198] (0/2) Epoch 31, batch 5800, loss[loss=0.2023, ctc_loss=0.132, cr_loss=0.3515, over 20863.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1505, cr_loss=0.375, over 4094181.32 frames. 
], batch size: 57, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:57:10,311 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:57:15,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=559710.8333333334, ans=0.0 2024-09-17 05:57:49,458 INFO [train.py:1198] (0/2) Epoch 31, batch 5850, loss[loss=0.2095, ctc_loss=0.1398, cr_loss=0.3483, over 20828.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1515, cr_loss=0.3769, over 4086415.56 frames. ], batch size: 59, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:58:18,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2024-09-17 05:58:23,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=559852.5, ans=0.2 2024-09-17 05:58:32,721 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.175e+02 2.331e+02 2.525e+02 3.423e+02, threshold=4.662e+02, percent-clipped=0.0 2024-09-17 05:59:03,938 INFO [train.py:1198] (0/2) Epoch 31, batch 5900, loss[loss=0.2564, ctc_loss=0.1768, cr_loss=0.3978, over 20679.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1519, cr_loss=0.3777, over 4091472.58 frames. 
], batch size: 68, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 05:59:13,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=559937.5, ans=0.1 2024-09-17 05:59:52,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=560022.5, ans=0.2 2024-09-17 06:00:05,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=560050.8333333334, ans=0.025 2024-09-17 06:00:18,613 INFO [train.py:1198] (0/2) Epoch 31, batch 5950, loss[loss=0.2372, ctc_loss=0.1601, cr_loss=0.3854, over 21016.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1513, cr_loss=0.3766, over 4097504.26 frames. ], batch size: 63, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:00:19,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=560079.1666666666, ans=0.125 2024-09-17 06:00:26,640 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. 
limit=15.0 2024-09-17 06:00:33,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=560107.5, ans=0.125 2024-09-17 06:00:55,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=560135.8333333334, ans=0.0 2024-09-17 06:01:01,379 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.192e+02 2.322e+02 2.526e+02 3.100e+02, threshold=4.643e+02, percent-clipped=0.0 2024-09-17 06:01:07,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=560164.1666666666, ans=0.95 2024-09-17 06:01:32,611 INFO [train.py:1198] (0/2) Epoch 31, batch 6000, loss[loss=0.1918, ctc_loss=0.1253, cr_loss=0.3323, over 21063.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.1519, cr_loss=0.3773, over 4096337.75 frames. ], batch size: 53, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:01:32,612 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 06:01:54,231 INFO [train.py:1230] (0/2) Epoch 31, validation: loss=0.04065, ctc_loss=0.04065, cr_loss=1.26e-14, over 944034.00 frames. 2024-09-17 06:01:54,233 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 06:02:10,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=560249.1666666666, ans=0.125 2024-09-17 06:02:34,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=560277.5, ans=0.0 2024-09-17 06:03:10,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2024-09-17 06:03:12,653 INFO [train.py:1198] (0/2) Epoch 31, batch 6050, loss[loss=0.1965, ctc_loss=0.1264, cr_loss=0.3507, over 20967.00 frames. 
], tot_loss[loss=0.2256, ctc_loss=0.1507, cr_loss=0.3746, over 4092047.93 frames. ], batch size: 50, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:03:52,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=560419.1666666666, ans=0.0 2024-09-17 06:03:55,294 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.155e+02 2.299e+02 2.498e+02 3.704e+02, threshold=4.598e+02, percent-clipped=0.0 2024-09-17 06:04:18,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-17 06:04:27,547 INFO [train.py:1198] (0/2) Epoch 31, batch 6100, loss[loss=0.1916, ctc_loss=0.125, cr_loss=0.3329, over 19967.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1508, cr_loss=0.3755, over 4089868.12 frames. ], batch size: 44, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:04:30,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=560504.1666666666, ans=0.04949747468305833 2024-09-17 06:04:36,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0 2024-09-17 06:04:39,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.15 vs. 
limit=15.0 2024-09-17 06:05:12,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=560589.1666666666, ans=0.025 2024-09-17 06:05:23,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=560589.1666666666, ans=0.125 2024-09-17 06:05:34,550 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=15.0 2024-09-17 06:05:37,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=560617.5, ans=15.0 2024-09-17 06:05:41,079 INFO [train.py:1198] (0/2) Epoch 31, batch 6150, loss[loss=0.2398, ctc_loss=0.1622, cr_loss=0.3881, over 20345.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1506, cr_loss=0.3748, over 4079852.87 frames. ], batch size: 74, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:06:01,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2024-09-17 06:06:23,752 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 2.140e+02 2.291e+02 2.415e+02 4.856e+02, threshold=4.582e+02, percent-clipped=1.0 2024-09-17 06:06:32,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=560730.8333333334, ans=0.0 2024-09-17 06:06:54,842 INFO [train.py:1198] (0/2) Epoch 31, batch 6200, loss[loss=0.2001, ctc_loss=0.1317, cr_loss=0.3418, over 21062.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1507, cr_loss=0.3743, over 4068661.77 frames. 
], batch size: 53, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:07:27,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=560844.1666666666, ans=0.025 2024-09-17 06:07:40,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=560872.5, ans=0.05 2024-09-17 06:07:59,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=560900.8333333334, ans=0.1 2024-09-17 06:08:07,917 INFO [train.py:1198] (0/2) Epoch 31, batch 6250, loss[loss=0.248, ctc_loss=0.1668, cr_loss=0.4064, over 20017.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1508, cr_loss=0.374, over 4046371.99 frames. ], batch size: 80, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:08:40,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=560985.8333333334, ans=0.125 2024-09-17 06:08:50,570 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.170e+02 2.302e+02 2.558e+02 7.310e+02, threshold=4.603e+02, percent-clipped=1.0 2024-09-17 06:08:58,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561014.1666666666, ans=0.1 2024-09-17 06:09:21,692 INFO [train.py:1198] (0/2) Epoch 31, batch 6300, loss[loss=0.2128, ctc_loss=0.1401, cr_loss=0.3633, over 21034.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1515, cr_loss=0.3736, over 3988867.34 frames. 
], batch size: 62, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:09:23,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=561070.8333333334, ans=0.125 2024-09-17 06:09:25,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=561070.8333333334, ans=0.0 2024-09-17 06:09:56,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=561127.5, ans=0.2 2024-09-17 06:10:22,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=561184.1666666666, ans=0.0 2024-09-17 06:10:25,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=561184.1666666666, ans=0.035 2024-09-17 06:10:29,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=561184.1666666666, ans=0.0 2024-09-17 06:10:34,970 INFO [train.py:1198] (0/2) Epoch 31, batch 6350, loss[loss=0.3032, ctc_loss=0.2134, cr_loss=0.4489, over 13995.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1546, cr_loss=0.3772, over 3905203.02 frames. 
], batch size: 150, lr: 2.65e-03, grad_scale: 32.0 2024-09-17 06:10:50,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=561240.8333333334, ans=0.2 2024-09-17 06:11:01,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=561240.8333333334, ans=0.125 2024-09-17 06:11:15,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=561269.1666666666, ans=0.125 2024-09-17 06:11:16,296 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.313e+02 2.603e+02 2.829e+02 3.684e+02, threshold=5.206e+02, percent-clipped=0.0 2024-09-17 06:11:24,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=561297.5, ans=0.125 2024-09-17 06:11:32,967 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-31.pt 2024-09-17 06:12:20,748 INFO [train.py:1198] (0/2) Epoch 32, batch 0, loss[loss=0.2855, ctc_loss=0.2007, cr_loss=0.424, over 14224.00 frames. ], tot_loss[loss=0.2855, ctc_loss=0.2007, cr_loss=0.424, over 14224.00 frames. ], batch size: 150, lr: 2.61e-03, grad_scale: 32.0 2024-09-17 06:12:20,749 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 06:12:39,137 INFO [train.py:1230] (0/2) Epoch 32, validation: loss=0.04055, ctc_loss=0.04055, cr_loss=1.282e-14, over 944034.00 frames. 2024-09-17 06:12:39,137 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 06:12:39,796 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.80 vs. 
limit=15.0 2024-09-17 06:12:48,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=561328.6666666666, ans=0.2 2024-09-17 06:12:57,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=561357.0, ans=0.07 2024-09-17 06:13:16,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=561385.3333333334, ans=0.125 2024-09-17 06:13:18,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=561385.3333333334, ans=0.0 2024-09-17 06:13:20,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561385.3333333334, ans=0.1 2024-09-17 06:13:39,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=561413.6666666666, ans=0.0 2024-09-17 06:13:49,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=561442.0, ans=0.125 2024-09-17 06:13:52,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=561442.0, ans=0.0 2024-09-17 06:13:57,078 INFO [train.py:1198] (0/2) Epoch 32, batch 50, loss[loss=0.2489, ctc_loss=0.1689, cr_loss=0.4002, over 20696.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.149, cr_loss=0.373, over 932681.49 frames. 
], batch size: 71, lr: 2.61e-03, grad_scale: 16.0 2024-09-17 06:14:33,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=561527.0, ans=0.0 2024-09-17 06:14:42,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=561555.3333333334, ans=0.125 2024-09-17 06:14:56,622 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.148e+02 2.334e+02 2.565e+02 8.400e+02, threshold=4.668e+02, percent-clipped=2.0 2024-09-17 06:15:12,804 INFO [train.py:1198] (0/2) Epoch 32, batch 100, loss[loss=0.1989, ctc_loss=0.1282, cr_loss=0.3535, over 21001.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1506, cr_loss=0.3748, over 1629341.15 frames. ], batch size: 48, lr: 2.61e-03, grad_scale: 16.0 2024-09-17 06:15:49,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=561668.6666666666, ans=0.125 2024-09-17 06:16:16,000 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.93 vs. limit=15.0 2024-09-17 06:16:23,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=22.5 2024-09-17 06:16:31,455 INFO [train.py:1198] (0/2) Epoch 32, batch 150, loss[loss=0.2484, ctc_loss=0.1671, cr_loss=0.4063, over 20685.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.15, cr_loss=0.3737, over 2187153.29 frames. 
], batch size: 68, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:17:29,834 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.106e+02 2.273e+02 2.464e+02 3.847e+02, threshold=4.547e+02, percent-clipped=0.0 2024-09-17 06:17:43,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=561867.0, ans=0.0 2024-09-17 06:17:46,427 INFO [train.py:1198] (0/2) Epoch 32, batch 200, loss[loss=0.2234, ctc_loss=0.1472, cr_loss=0.3809, over 20274.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1515, cr_loss=0.3775, over 2619291.05 frames. ], batch size: 74, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:18:15,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561952.0, ans=0.1 2024-09-17 06:18:16,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561952.0, ans=0.1 2024-09-17 06:18:48,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=562008.6666666666, ans=0.125 2024-09-17 06:18:49,107 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:18:49,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=562008.6666666666, ans=0.125 2024-09-17 06:19:02,076 INFO [train.py:1198] (0/2) Epoch 32, batch 250, loss[loss=0.2289, ctc_loss=0.1528, cr_loss=0.3805, over 21019.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1507, cr_loss=0.3758, over 2944186.83 frames. ], batch size: 63, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:19:48,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.48 vs. 
limit=15.0 2024-09-17 06:20:04,384 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.137e+02 2.243e+02 2.422e+02 3.810e+02, threshold=4.486e+02, percent-clipped=0.0 2024-09-17 06:20:16,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=562150.3333333334, ans=0.125 2024-09-17 06:20:21,169 INFO [train.py:1198] (0/2) Epoch 32, batch 300, loss[loss=0.1965, ctc_loss=0.1284, cr_loss=0.3403, over 20954.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.151, cr_loss=0.3768, over 3211150.49 frames. ], batch size: 50, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:20:23,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=562178.6666666666, ans=0.125 2024-09-17 06:20:32,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=562178.6666666666, ans=0.0 2024-09-17 06:20:41,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=562207.0, ans=0.0 2024-09-17 06:20:45,938 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:21:13,670 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=22.5 2024-09-17 06:21:35,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=562320.3333333334, ans=0.0 2024-09-17 06:21:36,873 INFO [train.py:1198] (0/2) Epoch 32, batch 350, loss[loss=0.2501, ctc_loss=0.1682, cr_loss=0.4095, over 20975.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1513, cr_loss=0.3766, over 3411313.56 frames. 
], batch size: 67, lr: 2.60e-03, grad_scale: 16.0 2024-09-17 06:21:49,777 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=22.5 2024-09-17 06:22:38,903 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.150e+02 2.271e+02 2.406e+02 3.200e+02, threshold=4.542e+02, percent-clipped=0.0 2024-09-17 06:22:55,553 INFO [train.py:1198] (0/2) Epoch 32, batch 400, loss[loss=0.1861, ctc_loss=0.1216, cr_loss=0.3224, over 21011.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1501, cr_loss=0.3742, over 3557488.30 frames. ], batch size: 52, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:23:01,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=562462.0, ans=0.1 2024-09-17 06:23:05,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=562462.0, ans=0.125 2024-09-17 06:23:50,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=562547.0, ans=0.09899494936611666 2024-09-17 06:23:53,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=562547.0, ans=0.125 2024-09-17 06:24:00,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=562575.3333333334, ans=0.05 2024-09-17 06:24:09,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=562603.6666666666, ans=0.2 2024-09-17 06:24:10,814 INFO [train.py:1198] (0/2) Epoch 32, batch 450, loss[loss=0.2123, ctc_loss=0.1413, cr_loss=0.355, over 21019.00 frames. ], tot_loss[loss=0.2273, ctc_loss=0.152, cr_loss=0.3768, over 3668973.26 frames. 
], batch size: 63, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:24:33,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=562632.0, ans=0.0 2024-09-17 06:25:12,921 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.141e+02 2.298e+02 2.521e+02 4.450e+02, threshold=4.595e+02, percent-clipped=0.0 2024-09-17 06:25:29,563 INFO [train.py:1198] (0/2) Epoch 32, batch 500, loss[loss=0.2273, ctc_loss=0.151, cr_loss=0.3812, over 20776.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1505, cr_loss=0.3743, over 3766872.62 frames. ], batch size: 56, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:25:40,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=562745.3333333334, ans=0.125 2024-09-17 06:26:41,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=562858.6666666666, ans=0.125 2024-09-17 06:26:43,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=562887.0, ans=0.0 2024-09-17 06:26:45,023 INFO [train.py:1198] (0/2) Epoch 32, batch 550, loss[loss=0.2394, ctc_loss=0.1575, cr_loss=0.4095, over 20958.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1502, cr_loss=0.3737, over 3837022.37 frames. ], batch size: 64, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:26:55,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=562887.0, ans=0.2 2024-09-17 06:27:13,098 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.76 vs. 
limit=15.0 2024-09-17 06:27:20,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=562943.6666666666, ans=0.0 2024-09-17 06:27:28,507 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.93 vs. limit=10.0 2024-09-17 06:27:29,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=562972.0, ans=0.125 2024-09-17 06:27:42,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=562972.0, ans=0.125 2024-09-17 06:27:44,112 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.128e+02 2.252e+02 2.422e+02 3.990e+02, threshold=4.505e+02, percent-clipped=0.0 2024-09-17 06:28:01,107 INFO [train.py:1198] (0/2) Epoch 32, batch 600, loss[loss=0.2419, ctc_loss=0.161, cr_loss=0.4045, over 20832.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1494, cr_loss=0.3728, over 3906323.78 frames. ], batch size: 59, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:28:12,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=563028.6666666666, ans=0.1 2024-09-17 06:28:38,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=563085.3333333334, ans=0.125 2024-09-17 06:29:04,757 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:29:08,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.93 vs. limit=15.0 2024-09-17 06:29:19,319 INFO [train.py:1198] (0/2) Epoch 32, batch 650, loss[loss=0.2139, ctc_loss=0.1422, cr_loss=0.3587, over 21089.00 frames. 
], tot_loss[loss=0.2232, ctc_loss=0.1488, cr_loss=0.3719, over 3952106.58 frames. ], batch size: 59, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:30:12,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=563255.3333333334, ans=0.0 2024-09-17 06:30:18,254 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.136e+02 2.277e+02 2.441e+02 2.853e+02, threshold=4.554e+02, percent-clipped=0.0 2024-09-17 06:30:26,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=563283.6666666666, ans=0.07 2024-09-17 06:30:33,917 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0 2024-09-17 06:30:34,640 INFO [train.py:1198] (0/2) Epoch 32, batch 700, loss[loss=0.2279, ctc_loss=0.1501, cr_loss=0.3888, over 20967.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1487, cr_loss=0.3716, over 3976050.59 frames. ], batch size: 58, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:30:36,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=563312.0, ans=0.0 2024-09-17 06:30:42,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563312.0, ans=0.1 2024-09-17 06:31:16,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563368.6666666666, ans=0.1 2024-09-17 06:31:16,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=563368.6666666666, ans=0.125 2024-09-17 06:31:19,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.06 vs. 
limit=22.5 2024-09-17 06:31:35,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=563397.0, ans=0.125 2024-09-17 06:31:48,720 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2024-09-17 06:31:54,046 INFO [train.py:1198] (0/2) Epoch 32, batch 750, loss[loss=0.2436, ctc_loss=0.1733, cr_loss=0.3515, over 13755.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1488, cr_loss=0.3716, over 4004632.35 frames. ], batch size: 149, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:32:54,699 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.150e+02 2.294e+02 2.462e+02 4.152e+02, threshold=4.588e+02, percent-clipped=0.0 2024-09-17 06:33:09,801 INFO [train.py:1198] (0/2) Epoch 32, batch 800, loss[loss=0.2135, ctc_loss=0.1424, cr_loss=0.3553, over 20683.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1493, cr_loss=0.3728, over 4018658.24 frames. ], batch size: 66, lr: 2.60e-03, grad_scale: 32.0 2024-09-17 06:33:12,513 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.03 vs. limit=22.5 2024-09-17 06:33:26,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=563623.6666666666, ans=0.1 2024-09-17 06:33:31,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=563623.6666666666, ans=0.125 2024-09-17 06:33:36,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=563623.6666666666, ans=0.2 2024-09-17 06:33:48,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. 
limit=15.0
2024-09-17 06:34:02,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=22.5
2024-09-17 06:34:21,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=563708.6666666666, ans=0.2
2024-09-17 06:34:26,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563708.6666666666, ans=0.1
2024-09-17 06:34:29,029 INFO [train.py:1198] (0/2) Epoch 32, batch 850, loss[loss=0.1973, ctc_loss=0.132, cr_loss=0.3264, over 21052.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1498, cr_loss=0.3732, over 4031573.56 frames. ], batch size: 62, lr: 2.60e-03, grad_scale: 32.0
2024-09-17 06:34:30,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=563737.0, ans=0.125
2024-09-17 06:34:32,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=563737.0, ans=0.125
2024-09-17 06:35:05,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563793.6666666666, ans=0.1
2024-09-17 06:35:23,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=563822.0, ans=0.0
2024-09-17 06:35:28,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=563850.3333333334, ans=0.125
2024-09-17 06:35:29,425 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.123e+02 2.293e+02 2.439e+02 3.333e+02, threshold=4.585e+02, percent-clipped=0.0
2024-09-17 06:35:44,275 INFO [train.py:1198] (0/2) Epoch 32, batch 900, loss[loss=0.236, ctc_loss=0.1587, cr_loss=0.3867, over 20950.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1498, cr_loss=0.3726, over 4028943.99 frames. ], batch size: 58, lr: 2.60e-03, grad_scale: 32.0
2024-09-17 06:36:18,773 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.27 vs. limit=6.0
2024-09-17 06:36:20,049 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 06:36:51,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=563992.0, ans=0.1
2024-09-17 06:36:57,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=563992.0, ans=0.0
2024-09-17 06:36:58,925 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 06:37:03,037 INFO [train.py:1198] (0/2) Epoch 32, batch 950, loss[loss=0.2033, ctc_loss=0.1334, cr_loss=0.3495, over 20899.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1501, cr_loss=0.3732, over 4032881.00 frames. ], batch size: 54, lr: 2.60e-03, grad_scale: 16.0
2024-09-17 06:37:21,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=564048.6666666666, ans=10.0
2024-09-17 06:37:58,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=564105.3333333334, ans=0.05
2024-09-17 06:38:04,528 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.115e+02 2.220e+02 2.319e+02 3.664e+02, threshold=4.441e+02, percent-clipped=0.0
2024-09-17 06:38:18,113 INFO [train.py:1198] (0/2) Epoch 32, batch 1000, loss[loss=0.1958, ctc_loss=0.1302, cr_loss=0.3283, over 21029.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1494, cr_loss=0.3727, over 4047619.24 frames. ], batch size: 52, lr: 2.60e-03, grad_scale: 16.0
2024-09-17 06:38:46,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=564218.6666666666, ans=0.125
2024-09-17 06:38:47,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=564218.6666666666, ans=0.125
2024-09-17 06:38:56,569 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0
2024-09-17 06:39:05,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=564247.0, ans=0.125
2024-09-17 06:39:06,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=564247.0, ans=0.125
2024-09-17 06:39:36,148 INFO [train.py:1198] (0/2) Epoch 32, batch 1050, loss[loss=0.1996, ctc_loss=0.1305, cr_loss=0.3453, over 19936.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1503, cr_loss=0.3737, over 4035897.25 frames. ], batch size: 44, lr: 2.60e-03, grad_scale: 16.0
2024-09-17 06:39:45,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=564303.6666666666, ans=0.125
2024-09-17 06:40:00,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=564332.0, ans=0.0
2024-09-17 06:40:02,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0
2024-09-17 06:40:07,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.17 vs. limit=12.0
2024-09-17 06:40:29,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564388.6666666666, ans=0.1
2024-09-17 06:40:37,924 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.150e+02 2.287e+02 2.402e+02 5.259e+02, threshold=4.574e+02, percent-clipped=1.0
2024-09-17 06:40:47,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=564417.0, ans=0.0
2024-09-17 06:40:51,460 INFO [train.py:1198] (0/2) Epoch 32, batch 1100, loss[loss=0.2662, ctc_loss=0.1797, cr_loss=0.4327, over 20311.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1501, cr_loss=0.3738, over 4051598.25 frames. ], batch size: 74, lr: 2.60e-03, grad_scale: 16.0
2024-09-17 06:40:56,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=564445.3333333334, ans=0.05
2024-09-17 06:41:02,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564445.3333333334, ans=0.1
2024-09-17 06:41:09,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=564473.6666666666, ans=0.2
2024-09-17 06:41:41,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.90 vs. limit=15.0
2024-09-17 06:42:00,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.68 vs. limit=15.0
2024-09-17 06:42:07,138 INFO [train.py:1198] (0/2) Epoch 32, batch 1150, loss[loss=0.2218, ctc_loss=0.1461, cr_loss=0.3782, over 21017.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1496, cr_loss=0.3729, over 4062866.66 frames. ], batch size: 63, lr: 2.60e-03, grad_scale: 16.0
2024-09-17 06:42:26,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0
2024-09-17 06:42:34,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=564615.3333333334, ans=0.125
2024-09-17 06:42:45,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.92 vs. limit=10.0
2024-09-17 06:42:51,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=564643.6666666666, ans=0.0
2024-09-17 06:42:52,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=564643.6666666666, ans=0.0
2024-09-17 06:43:11,925 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.100e+02 2.251e+02 2.429e+02 2.712e+02, threshold=4.503e+02, percent-clipped=0.0
2024-09-17 06:43:13,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=564700.3333333334, ans=0.5
2024-09-17 06:43:25,173 INFO [train.py:1198] (0/2) Epoch 32, batch 1200, loss[loss=0.2072, ctc_loss=0.1391, cr_loss=0.3408, over 20868.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1497, cr_loss=0.3726, over 4060940.08 frames. ], batch size: 57, lr: 2.60e-03, grad_scale: 32.0
2024-09-17 06:43:25,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2024-09-17 06:43:52,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=564757.0, ans=0.5
2024-09-17 06:43:52,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=564757.0, ans=0.04949747468305833
2024-09-17 06:44:40,189 INFO [train.py:1198] (0/2) Epoch 32, batch 1250, loss[loss=0.1805, ctc_loss=0.1168, cr_loss=0.3186, over 20951.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1492, cr_loss=0.3724, over 4070054.57 frames. ], batch size: 49, lr: 2.60e-03, grad_scale: 32.0
2024-09-17 06:44:40,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=564870.3333333334, ans=0.125
2024-09-17 06:45:09,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=564898.6666666666, ans=0.125
2024-09-17 06:45:09,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=564898.6666666666, ans=0.0
2024-09-17 06:45:27,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=564955.3333333334, ans=0.125
2024-09-17 06:45:45,141 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.191e+02 2.363e+02 2.536e+02 4.105e+02, threshold=4.726e+02, percent-clipped=0.0
2024-09-17 06:45:58,621 INFO [train.py:1198] (0/2) Epoch 32, batch 1300, loss[loss=0.2098, ctc_loss=0.1373, cr_loss=0.3623, over 20956.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1491, cr_loss=0.3723, over 4083161.72 frames. ], batch size: 50, lr: 2.60e-03, grad_scale: 32.0
2024-09-17 06:46:07,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=565012.0, ans=10.0
2024-09-17 06:46:27,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=565068.6666666666, ans=0.2
2024-09-17 06:46:32,562 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.15 vs. limit=10.0
2024-09-17 06:46:38,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=565068.6666666666, ans=0.125
2024-09-17 06:46:51,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=565097.0, ans=0.125
2024-09-17 06:47:03,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=565125.3333333334, ans=0.0
2024-09-17 06:47:10,185 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=22.5
2024-09-17 06:47:13,869 INFO [train.py:1198] (0/2) Epoch 32, batch 1350, loss[loss=0.2277, ctc_loss=0.1482, cr_loss=0.3973, over 20894.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1492, cr_loss=0.3728, over 4081742.87 frames. ], batch size: 57, lr: 2.60e-03, grad_scale: 32.0
2024-09-17 06:48:18,244 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.174e+02 2.279e+02 2.413e+02 3.662e+02, threshold=4.557e+02, percent-clipped=0.0
2024-09-17 06:48:26,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=565267.0, ans=0.1
2024-09-17 06:48:27,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0
2024-09-17 06:48:30,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=565295.3333333334, ans=0.0
2024-09-17 06:48:31,837 INFO [train.py:1198] (0/2) Epoch 32, batch 1400, loss[loss=0.2173, ctc_loss=0.1427, cr_loss=0.3731, over 20961.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1491, cr_loss=0.3722, over 4087623.37 frames. ], batch size: 51, lr: 2.60e-03, grad_scale: 32.0
2024-09-17 06:48:38,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=565295.3333333334, ans=0.025
2024-09-17 06:48:39,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=565295.3333333334, ans=0.0
2024-09-17 06:48:47,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=565323.6666666666, ans=0.125
2024-09-17 06:48:50,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=565323.6666666666, ans=0.125
2024-09-17 06:49:00,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=565352.0, ans=0.125
2024-09-17 06:49:05,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=565352.0, ans=0.125
2024-09-17 06:49:37,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=565408.6666666666, ans=0.125
2024-09-17 06:49:41,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=565408.6666666666, ans=0.125
2024-09-17 06:49:47,338 INFO [train.py:1198] (0/2) Epoch 32, batch 1450, loss[loss=0.2129, ctc_loss=0.1425, cr_loss=0.3521, over 20972.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1497, cr_loss=0.3721, over 4087559.12 frames. ], batch size: 58, lr: 2.60e-03, grad_scale: 32.0
2024-09-17 06:50:02,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=565465.3333333334, ans=0.0
2024-09-17 06:50:04,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=565465.3333333334, ans=0.07
2024-09-17 06:50:47,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=565522.0, ans=0.025
2024-09-17 06:50:47,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.13 vs. limit=10.0
2024-09-17 06:50:50,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=565550.3333333334, ans=0.1
2024-09-17 06:50:51,662 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.772e+02 2.127e+02 2.261e+02 2.444e+02 5.232e+02, threshold=4.522e+02, percent-clipped=1.0
2024-09-17 06:50:54,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=565550.3333333334, ans=0.125
2024-09-17 06:51:05,133 INFO [train.py:1198] (0/2) Epoch 32, batch 1500, loss[loss=0.2442, ctc_loss=0.1633, cr_loss=0.4044, over 20772.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1507, cr_loss=0.3737, over 4078353.83 frames. ], batch size: 56, lr: 2.60e-03, grad_scale: 32.0
2024-09-17 06:52:03,661 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.41 vs. limit=5.0
2024-09-17 06:52:11,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=565692.0, ans=0.1
2024-09-17 06:52:20,622 INFO [train.py:1198] (0/2) Epoch 32, batch 1550, loss[loss=0.2282, ctc_loss=0.1525, cr_loss=0.3786, over 20053.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1502, cr_loss=0.3734, over 4084536.17 frames. ], batch size: 80, lr: 2.60e-03, grad_scale: 32.0
2024-09-17 06:52:24,069 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 06:53:20,446 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.44 vs. limit=15.0
2024-09-17 06:53:22,246 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.131e+02 2.244e+02 2.401e+02 3.184e+02, threshold=4.488e+02, percent-clipped=0.0
2024-09-17 06:53:37,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=565862.0, ans=0.2
2024-09-17 06:53:38,694 INFO [train.py:1198] (0/2) Epoch 32, batch 1600, loss[loss=0.2185, ctc_loss=0.1442, cr_loss=0.3713, over 21044.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1504, cr_loss=0.374, over 4092263.63 frames. ], batch size: 62, lr: 2.60e-03, grad_scale: 32.0
2024-09-17 06:54:04,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=565890.3333333334, ans=0.2
2024-09-17 06:54:31,689 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 06:54:32,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=565947.0, ans=0.2
2024-09-17 06:54:49,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=565975.3333333334, ans=0.125
2024-09-17 06:54:54,000 INFO [train.py:1198] (0/2) Epoch 32, batch 1650, loss[loss=0.2144, ctc_loss=0.1419, cr_loss=0.3624, over 20780.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1501, cr_loss=0.3737, over 4093152.01 frames. ], batch size: 56, lr: 2.60e-03, grad_scale: 32.0
2024-09-17 06:55:02,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=566003.6666666666, ans=12.0
2024-09-17 06:55:03,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=566003.6666666666, ans=0.0
2024-09-17 06:55:03,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=566003.6666666666, ans=0.5
2024-09-17 06:55:09,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=566032.0, ans=0.0
2024-09-17 06:55:55,811 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.136e+02 2.269e+02 2.410e+02 3.416e+02, threshold=4.539e+02, percent-clipped=0.0
2024-09-17 06:56:09,232 INFO [train.py:1198] (0/2) Epoch 32, batch 1700, loss[loss=0.201, ctc_loss=0.1314, cr_loss=0.348, over 20869.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1494, cr_loss=0.3728, over 4099879.15 frames. ], batch size: 57, lr: 2.59e-03, grad_scale: 32.0
2024-09-17 06:56:48,548 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=22.5
2024-09-17 06:57:21,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=566258.6666666666, ans=12.0
2024-09-17 06:57:22,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=566258.6666666666, ans=0.0
2024-09-17 06:57:26,863 INFO [train.py:1198] (0/2) Epoch 32, batch 1750, loss[loss=0.2449, ctc_loss=0.1622, cr_loss=0.4134, over 20640.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.15, cr_loss=0.3743, over 4104389.76 frames. ], batch size: 68, lr: 2.59e-03, grad_scale: 16.0
2024-09-17 06:57:39,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.04 vs. limit=15.0
2024-09-17 06:58:00,333 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 06:58:29,764 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.163e+02 2.251e+02 2.385e+02 4.582e+02, threshold=4.503e+02, percent-clipped=1.0
2024-09-17 06:58:41,780 INFO [train.py:1198] (0/2) Epoch 32, batch 1800, loss[loss=0.291, ctc_loss=0.2002, cr_loss=0.4539, over 18348.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1492, cr_loss=0.3725, over 4108886.80 frames. ], batch size: 108, lr: 2.59e-03, grad_scale: 16.0
2024-09-17 06:58:52,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=566428.6666666666, ans=0.125
2024-09-17 06:58:54,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=566428.6666666666, ans=0.025
2024-09-17 06:58:55,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=566457.0, ans=0.125
2024-09-17 06:59:33,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=566513.6666666666, ans=0.025
2024-09-17 07:00:00,258 INFO [train.py:1198] (0/2) Epoch 32, batch 1850, loss[loss=0.1783, ctc_loss=0.1142, cr_loss=0.3202, over 21004.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1495, cr_loss=0.373, over 4108416.75 frames. ], batch size: 50, lr: 2.59e-03, grad_scale: 16.0
2024-09-17 07:00:28,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=566627.0, ans=0.2
2024-09-17 07:00:41,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=566627.0, ans=0.2
2024-09-17 07:00:49,818 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-200000.pt
2024-09-17 07:00:58,911 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0
2024-09-17 07:01:04,178 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.156e+02 2.256e+02 2.428e+02 3.231e+02, threshold=4.513e+02, percent-clipped=0.0
2024-09-17 07:01:16,296 INFO [train.py:1198] (0/2) Epoch 32, batch 1900, loss[loss=0.2462, ctc_loss=0.164, cr_loss=0.4108, over 20864.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1506, cr_loss=0.3755, over 4100264.35 frames. ], batch size: 65, lr: 2.59e-03, grad_scale: 16.0
2024-09-17 07:01:53,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=566768.6666666666, ans=0.0
2024-09-17 07:02:34,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=566853.6666666666, ans=0.125
2024-09-17 07:02:35,248 INFO [train.py:1198] (0/2) Epoch 32, batch 1950, loss[loss=0.1894, ctc_loss=0.1241, cr_loss=0.3263, over 21053.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1495, cr_loss=0.3741, over 4107663.25 frames. ], batch size: 53, lr: 2.59e-03, grad_scale: 16.0
2024-09-17 07:02:51,236 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0
2024-09-17 07:03:23,096 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0
2024-09-17 07:03:38,556 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.153e+02 2.247e+02 2.428e+02 3.636e+02, threshold=4.493e+02, percent-clipped=0.0
2024-09-17 07:03:48,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=566967.0, ans=0.0
2024-09-17 07:03:49,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=566995.3333333334, ans=0.125
2024-09-17 07:03:50,997 INFO [train.py:1198] (0/2) Epoch 32, batch 2000, loss[loss=0.1881, ctc_loss=0.1232, cr_loss=0.3248, over 21005.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1493, cr_loss=0.3735, over 4096038.25 frames. ], batch size: 51, lr: 2.59e-03, grad_scale: 32.0
2024-09-17 07:04:24,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=567052.0, ans=0.1
2024-09-17 07:04:27,680 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 07:04:53,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=567108.6666666666, ans=0.025
2024-09-17 07:04:58,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=567108.6666666666, ans=0.125
2024-09-17 07:05:09,874 INFO [train.py:1198] (0/2) Epoch 32, batch 2050, loss[loss=0.2225, ctc_loss=0.1472, cr_loss=0.3766, over 21071.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1492, cr_loss=0.3734, over 4101931.42 frames. ], batch size: 59, lr: 2.59e-03, grad_scale: 32.0
2024-09-17 07:05:35,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=567165.3333333334, ans=0.0
2024-09-17 07:06:14,382 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.181e+02 2.301e+02 2.456e+02 3.502e+02, threshold=4.602e+02, percent-clipped=0.0
2024-09-17 07:06:20,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=567250.3333333334, ans=0.125
2024-09-17 07:06:22,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567250.3333333334, ans=0.1
2024-09-17 07:06:24,792 INFO [train.py:1198] (0/2) Epoch 32, batch 2100, loss[loss=0.2278, ctc_loss=0.1493, cr_loss=0.3921, over 20653.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1494, cr_loss=0.3744, over 4114583.62 frames. ], batch size: 68, lr: 2.59e-03, grad_scale: 16.0
2024-09-17 07:06:28,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567278.6666666666, ans=0.1
2024-09-17 07:07:39,933 INFO [train.py:1198] (0/2) Epoch 32, batch 2150, loss[loss=0.2143, ctc_loss=0.142, cr_loss=0.3614, over 20891.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1487, cr_loss=0.3731, over 4115931.33 frames. ], batch size: 57, lr: 2.59e-03, grad_scale: 16.0
2024-09-17 07:07:49,881 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0
2024-09-17 07:08:14,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=567477.0, ans=0.0
2024-09-17 07:08:32,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=567505.3333333334, ans=0.025
2024-09-17 07:08:47,496 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.136e+02 2.257e+02 2.419e+02 4.178e+02, threshold=4.514e+02, percent-clipped=0.0
2024-09-17 07:08:57,945 INFO [train.py:1198] (0/2) Epoch 32, batch 2200, loss[loss=0.2164, ctc_loss=0.1421, cr_loss=0.3714, over 21012.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.148, cr_loss=0.3723, over 4120567.54 frames. ], batch size: 62, lr: 2.59e-03, grad_scale: 16.0
2024-09-17 07:09:32,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=567618.6666666666, ans=0.125
2024-09-17 07:09:33,554 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 07:09:36,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=567618.6666666666, ans=0.125
2024-09-17 07:09:39,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=567618.6666666666, ans=0.125
2024-09-17 07:10:00,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=567675.3333333334, ans=0.1
2024-09-17 07:10:01,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=567675.3333333334, ans=0.2
2024-09-17 07:10:03,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=567675.3333333334, ans=0.0
2024-09-17 07:10:13,380 INFO [train.py:1198] (0/2) Epoch 32, batch 2250, loss[loss=0.1939, ctc_loss=0.1245, cr_loss=0.3468, over 20359.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1479, cr_loss=0.3721, over 4109830.99 frames. ], batch size: 45, lr: 2.59e-03, grad_scale: 16.0
2024-09-17 07:10:24,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=567703.6666666666, ans=0.07
2024-09-17 07:10:30,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=567732.0, ans=0.125
2024-09-17 07:10:39,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=567732.0, ans=0.125
2024-09-17 07:10:40,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=567732.0, ans=0.125
2024-09-17 07:11:21,375 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.098e+02 2.214e+02 2.357e+02 4.754e+02, threshold=4.427e+02, percent-clipped=1.0
2024-09-17 07:11:31,817 INFO [train.py:1198] (0/2) Epoch 32, batch 2300, loss[loss=0.2316, ctc_loss=0.1531, cr_loss=0.3924, over 20962.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1476, cr_loss=0.3712, over 4107936.91 frames. ], batch size: 67, lr: 2.59e-03, grad_scale: 16.0
2024-09-17 07:11:32,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=567845.3333333334, ans=0.125
2024-09-17 07:11:39,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=567845.3333333334, ans=0.09899494936611666
2024-09-17 07:11:53,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.25 vs. limit=22.5
2024-09-17 07:12:04,280 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0
2024-09-17 07:12:22,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=567930.3333333334, ans=0.125
2024-09-17 07:12:40,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=567958.6666666666, ans=0.0
2024-09-17 07:12:47,377 INFO [train.py:1198] (0/2) Epoch 32, batch 2350, loss[loss=0.213, ctc_loss=0.1387, cr_loss=0.3716, over 20786.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1473, cr_loss=0.3699, over 4113278.95 frames. ], batch size: 53, lr: 2.59e-03, grad_scale: 16.0
2024-09-17 07:13:06,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=568015.3333333334, ans=0.0
2024-09-17 07:13:19,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=568043.6666666666, ans=0.0
2024-09-17 07:13:27,531 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=15.0
2024-09-17 07:13:45,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=568072.0, ans=0.125
2024-09-17 07:13:51,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=568100.3333333334, ans=0.125
2024-09-17 07:13:55,162 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.169e+02 2.283e+02 2.413e+02 5.050e+02, threshold=4.567e+02, percent-clipped=1.0
2024-09-17 07:13:58,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=568100.3333333334, ans=0.0
2024-09-17 07:14:05,652 INFO [train.py:1198] (0/2) Epoch 32, batch 2400, loss[loss=0.1971, ctc_loss=0.1267, cr_loss=0.3519, over 19017.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1479, cr_loss=0.3705, over 4099269.35 frames. ], batch size: 42, lr: 2.59e-03, grad_scale: 32.0
2024-09-17 07:14:16,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=568128.6666666666, ans=0.125
2024-09-17 07:14:28,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=568157.0, ans=0.0
2024-09-17 07:14:34,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=568185.3333333334, ans=0.125
2024-09-17 07:15:09,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=568242.0, ans=0.2
2024-09-17 07:15:18,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=568242.0, ans=0.04949747468305833
2024-09-17 07:15:20,813 INFO [train.py:1198] (0/2) Epoch 32, batch 2450, loss[loss=0.1879, ctc_loss=0.1224, cr_loss=0.3278, over 21063.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.148, cr_loss=0.3702, over 4093754.25 frames. ], batch size: 53, lr: 2.59e-03, grad_scale: 32.0
2024-09-17 07:15:43,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=568298.6666666666, ans=0.125
2024-09-17 07:15:43,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=568298.6666666666, ans=0.125
2024-09-17 07:16:21,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568383.6666666666, ans=0.1
2024-09-17 07:16:25,933 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.138e+02 2.262e+02 2.448e+02 4.421e+02, threshold=4.525e+02, percent-clipped=0.0
2024-09-17 07:16:32,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=568383.6666666666, ans=0.1
2024-09-17 07:16:39,427 INFO [train.py:1198] (0/2) Epoch 32, batch 2500, loss[loss=0.1937, ctc_loss=0.1244, cr_loss=0.3465, over 19978.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1486, cr_loss=0.3703, over 4088190.94 frames. ], batch size: 44, lr: 2.59e-03, grad_scale: 32.0
2024-09-17 07:16:47,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=568412.0, ans=0.0
2024-09-17 07:17:49,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=568525.3333333334, ans=0.0
2024-09-17 07:17:55,322 INFO [train.py:1198] (0/2) Epoch 32, batch 2550, loss[loss=0.2019, ctc_loss=0.133, cr_loss=0.3443, over 21060.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1478, cr_loss=0.3695, over 4097766.34 frames. ], batch size: 53, lr: 2.59e-03, grad_scale: 32.0
2024-09-17 07:18:04,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568553.6666666666, ans=0.1
2024-09-17 07:18:46,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=568638.6666666666, ans=0.0
2024-09-17 07:18:59,935 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.179e+02 2.332e+02 2.505e+02 3.120e+02, threshold=4.664e+02, percent-clipped=0.0
2024-09-17 07:19:01,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=568667.0, ans=0.125
2024-09-17 07:19:10,430 INFO [train.py:1198] (0/2) Epoch 32, batch 2600, loss[loss=0.1964, ctc_loss=0.1295, cr_loss=0.3343, over 20877.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1483, cr_loss=0.3705, over 4085841.93 frames. ], batch size: 57, lr: 2.59e-03, grad_scale: 32.0
2024-09-17 07:20:10,318 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.01 vs. limit=15.0
2024-09-17 07:20:12,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568808.6666666666, ans=0.1
2024-09-17 07:20:29,025 INFO [train.py:1198] (0/2) Epoch 32, batch 2650, loss[loss=0.2536, ctc_loss=0.1718, cr_loss=0.4089, over 20691.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1487, cr_loss=0.3719, over 4091286.37 frames. ], batch size: 68, lr: 2.59e-03, grad_scale: 32.0
2024-09-17 07:20:38,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.70 vs.
limit=22.5 2024-09-17 07:20:56,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=568865.3333333334, ans=0.125 2024-09-17 07:21:10,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=568893.6666666666, ans=0.125 2024-09-17 07:21:34,593 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.134e+02 2.251e+02 2.417e+02 3.469e+02, threshold=4.502e+02, percent-clipped=0.0 2024-09-17 07:21:45,418 INFO [train.py:1198] (0/2) Epoch 32, batch 2700, loss[loss=0.2368, ctc_loss=0.1543, cr_loss=0.4126, over 20963.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1483, cr_loss=0.3714, over 4096122.84 frames. ], batch size: 67, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:21:48,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=568978.6666666666, ans=0.1 2024-09-17 07:21:54,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=568978.6666666666, ans=0.125 2024-09-17 07:22:01,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. 
limit=22.5 2024-09-17 07:22:32,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=569063.6666666666, ans=0.1 2024-09-17 07:22:42,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=569063.6666666666, ans=0.125 2024-09-17 07:23:01,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=569092.0, ans=0.125 2024-09-17 07:23:04,589 INFO [train.py:1198] (0/2) Epoch 32, batch 2750, loss[loss=0.1972, ctc_loss=0.1304, cr_loss=0.3342, over 20985.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1487, cr_loss=0.3723, over 4101609.19 frames. ], batch size: 52, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:23:06,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=569120.3333333334, ans=0.125 2024-09-17 07:24:02,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=569205.3333333334, ans=0.2 2024-09-17 07:24:03,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=569233.6666666666, ans=0.0 2024-09-17 07:24:05,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=569233.6666666666, ans=0.035 2024-09-17 07:24:08,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=569233.6666666666, ans=0.125 2024-09-17 07:24:09,266 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.139e+02 2.268e+02 2.431e+02 3.732e+02, threshold=4.536e+02, percent-clipped=0.0 2024-09-17 07:24:19,077 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, 
num_channels=192, metric=3.85 vs. limit=15.0 2024-09-17 07:24:19,582 INFO [train.py:1198] (0/2) Epoch 32, batch 2800, loss[loss=0.196, ctc_loss=0.1285, cr_loss=0.3371, over 20990.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1485, cr_loss=0.3721, over 4106300.61 frames. ], batch size: 52, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:24:24,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=569262.0, ans=0.015 2024-09-17 07:24:27,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=569262.0, ans=0.125 2024-09-17 07:25:33,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=569375.3333333334, ans=0.125 2024-09-17 07:25:37,637 INFO [train.py:1198] (0/2) Epoch 32, batch 2850, loss[loss=0.1791, ctc_loss=0.1155, cr_loss=0.3178, over 20947.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1491, cr_loss=0.3732, over 4102568.13 frames. ], batch size: 49, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:25:54,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=569432.0, ans=0.2 2024-09-17 07:25:57,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=569432.0, ans=10.0 2024-09-17 07:26:22,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=569488.6666666666, ans=0.125 2024-09-17 07:26:23,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.78 vs. 
limit=15.0 2024-09-17 07:26:42,602 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.164e+02 2.318e+02 2.473e+02 4.370e+02, threshold=4.636e+02, percent-clipped=0.0 2024-09-17 07:26:49,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=569517.0, ans=0.125 2024-09-17 07:26:49,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=569517.0, ans=0.125 2024-09-17 07:26:53,108 INFO [train.py:1198] (0/2) Epoch 32, batch 2900, loss[loss=0.1904, ctc_loss=0.1245, cr_loss=0.3294, over 20983.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1489, cr_loss=0.3734, over 4101169.61 frames. ], batch size: 52, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:26:57,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=569545.3333333334, ans=0.125 2024-09-17 07:27:15,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=569573.6666666666, ans=0.125 2024-09-17 07:27:35,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=569602.0, ans=0.0 2024-09-17 07:27:51,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=12.0 2024-09-17 07:28:11,546 INFO [train.py:1198] (0/2) Epoch 32, batch 2950, loss[loss=0.1751, ctc_loss=0.1112, cr_loss=0.3197, over 20987.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1486, cr_loss=0.3732, over 4100667.61 frames. 
], batch size: 49, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:28:14,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=569687.0, ans=0.015 2024-09-17 07:28:24,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=569687.0, ans=0.0 2024-09-17 07:28:43,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=569743.6666666666, ans=0.125 2024-09-17 07:28:45,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=569743.6666666666, ans=0.125 2024-09-17 07:29:16,327 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.172e+02 2.360e+02 2.593e+02 4.618e+02, threshold=4.720e+02, percent-clipped=0.0 2024-09-17 07:29:22,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=569800.3333333334, ans=0.125 2024-09-17 07:29:26,797 INFO [train.py:1198] (0/2) Epoch 32, batch 3000, loss[loss=0.2337, ctc_loss=0.155, cr_loss=0.3935, over 20732.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.149, cr_loss=0.3735, over 4092606.12 frames. ], batch size: 71, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:29:26,798 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 07:29:34,286 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.3318, 6.0117, 5.7908, 5.4284], device='cuda:0') 2024-09-17 07:29:45,599 INFO [train.py:1230] (0/2) Epoch 32, validation: loss=0.04051, ctc_loss=0.04051, cr_loss=1.305e-14, over 944034.00 frames. 
2024-09-17 07:29:45,600 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 07:29:47,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-17 07:29:52,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=569828.6666666666, ans=0.0 2024-09-17 07:29:56,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=569828.6666666666, ans=0.125 2024-09-17 07:30:12,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.97 vs. limit=10.0 2024-09-17 07:30:19,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=569885.3333333334, ans=0.125 2024-09-17 07:31:03,805 INFO [train.py:1198] (0/2) Epoch 32, batch 3050, loss[loss=0.2423, ctc_loss=0.163, cr_loss=0.3961, over 21043.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.149, cr_loss=0.3732, over 4095208.72 frames. 
], batch size: 62, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:31:08,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=569970.3333333334, ans=0.0 2024-09-17 07:31:08,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=569970.3333333334, ans=0.125 2024-09-17 07:31:34,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=570027.0, ans=0.0 2024-09-17 07:31:37,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=570027.0, ans=0.125 2024-09-17 07:31:41,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=570027.0, ans=0.125 2024-09-17 07:31:55,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=570055.3333333334, ans=0.2 2024-09-17 07:31:57,384 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.62 vs. limit=22.5 2024-09-17 07:32:08,519 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.213e+02 2.329e+02 2.544e+02 4.650e+02, threshold=4.659e+02, percent-clipped=0.0 2024-09-17 07:32:18,999 INFO [train.py:1198] (0/2) Epoch 32, batch 3100, loss[loss=0.2397, ctc_loss=0.1604, cr_loss=0.3965, over 20820.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1486, cr_loss=0.3726, over 4101010.11 frames. 
], batch size: 59, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:32:31,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=570112.0, ans=0.125 2024-09-17 07:33:31,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=570225.3333333334, ans=15.0 2024-09-17 07:33:33,904 INFO [train.py:1198] (0/2) Epoch 32, batch 3150, loss[loss=0.2344, ctc_loss=0.156, cr_loss=0.3919, over 20888.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1479, cr_loss=0.3714, over 4105566.19 frames. ], batch size: 54, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:33:43,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=570253.6666666666, ans=0.0 2024-09-17 07:34:26,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=570338.6666666666, ans=0.2 2024-09-17 07:34:38,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=570367.0, ans=0.0 2024-09-17 07:34:41,187 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.137e+02 2.259e+02 2.395e+02 4.184e+02, threshold=4.518e+02, percent-clipped=0.0 2024-09-17 07:34:41,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=570367.0, ans=0.0 2024-09-17 07:34:51,748 INFO [train.py:1198] (0/2) Epoch 32, batch 3200, loss[loss=0.2712, ctc_loss=0.1847, cr_loss=0.4326, over 18167.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1496, cr_loss=0.374, over 4092347.59 frames. 
], batch size: 108, lr: 2.59e-03, grad_scale: 32.0 2024-09-17 07:35:28,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=570452.0, ans=0.125 2024-09-17 07:36:00,455 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.58 vs. limit=15.0 2024-09-17 07:36:07,346 INFO [train.py:1198] (0/2) Epoch 32, batch 3250, loss[loss=0.2016, ctc_loss=0.1344, cr_loss=0.3363, over 21018.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.149, cr_loss=0.3726, over 4087645.29 frames. ], batch size: 61, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:36:49,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2024-09-17 07:37:15,557 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.125e+02 2.256e+02 2.459e+02 6.121e+02, threshold=4.513e+02, percent-clipped=2.0 2024-09-17 07:37:20,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=570650.3333333334, ans=0.125 2024-09-17 07:37:26,315 INFO [train.py:1198] (0/2) Epoch 32, batch 3300, loss[loss=0.2228, ctc_loss=0.1509, cr_loss=0.3598, over 20989.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1484, cr_loss=0.3715, over 4099651.87 frames. 
], batch size: 55, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:37:37,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=570678.6666666666, ans=0.035 2024-09-17 07:37:37,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=570678.6666666666, ans=0.0 2024-09-17 07:37:46,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=570707.0, ans=0.025 2024-09-17 07:38:01,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=570735.3333333334, ans=0.125 2024-09-17 07:38:10,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=570763.6666666666, ans=0.0 2024-09-17 07:38:36,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=570792.0, ans=0.125 2024-09-17 07:38:41,896 INFO [train.py:1198] (0/2) Epoch 32, batch 3350, loss[loss=0.231, ctc_loss=0.1509, cr_loss=0.4006, over 20331.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1481, cr_loss=0.3705, over 4090439.50 frames. 
], batch size: 74, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:38:52,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=570820.3333333334, ans=0.0 2024-09-17 07:39:16,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=570877.0, ans=0.125 2024-09-17 07:39:27,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=570877.0, ans=0.125 2024-09-17 07:39:49,778 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.203e+02 2.306e+02 2.500e+02 3.116e+02, threshold=4.612e+02, percent-clipped=0.0 2024-09-17 07:40:00,145 INFO [train.py:1198] (0/2) Epoch 32, batch 3400, loss[loss=0.2129, ctc_loss=0.1395, cr_loss=0.3671, over 21066.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1481, cr_loss=0.3702, over 4092852.88 frames. ], batch size: 56, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:40:06,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=570962.0, ans=0.125 2024-09-17 07:40:45,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.38 vs. limit=15.0 2024-09-17 07:40:47,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=571047.0, ans=0.0 2024-09-17 07:41:14,899 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=15.0 2024-09-17 07:41:15,676 INFO [train.py:1198] (0/2) Epoch 32, batch 3450, loss[loss=0.2062, ctc_loss=0.1356, cr_loss=0.3529, over 21059.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1495, cr_loss=0.3729, over 4102205.73 frames. 
], batch size: 56, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:41:34,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=571132.0, ans=0.2 2024-09-17 07:41:35,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=571132.0, ans=0.125 2024-09-17 07:41:53,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=571160.3333333334, ans=0.125 2024-09-17 07:42:20,265 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=15.0 2024-09-17 07:42:23,667 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.201e+02 2.305e+02 2.525e+02 3.316e+02, threshold=4.610e+02, percent-clipped=0.0 2024-09-17 07:42:25,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=571217.0, ans=0.025 2024-09-17 07:42:34,280 INFO [train.py:1198] (0/2) Epoch 32, batch 3500, loss[loss=0.2391, ctc_loss=0.1579, cr_loss=0.4063, over 20957.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.149, cr_loss=0.3721, over 4107614.42 frames. ], batch size: 64, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:43:00,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=571273.6666666666, ans=0.95 2024-09-17 07:43:17,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=571302.0, ans=0.125 2024-09-17 07:43:25,243 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. 
limit=15.0 2024-09-17 07:43:35,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=571358.6666666666, ans=0.2 2024-09-17 07:43:49,883 INFO [train.py:1198] (0/2) Epoch 32, batch 3550, loss[loss=0.2659, ctc_loss=0.1847, cr_loss=0.4057, over 20098.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.15, cr_loss=0.3741, over 4105759.37 frames. ], batch size: 80, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:43:51,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=571387.0, ans=0.025 2024-09-17 07:44:14,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=571415.3333333334, ans=0.0 2024-09-17 07:44:46,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=571472.0, ans=0.0 2024-09-17 07:44:52,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=571500.3333333334, ans=0.1 2024-09-17 07:44:55,074 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.179e+02 2.302e+02 2.453e+02 3.233e+02, threshold=4.604e+02, percent-clipped=0.0 2024-09-17 07:44:58,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=571500.3333333334, ans=0.125 2024-09-17 07:44:58,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2024-09-17 07:45:05,616 INFO [train.py:1198] (0/2) Epoch 32, batch 3600, loss[loss=0.2187, ctc_loss=0.1455, cr_loss=0.366, over 20334.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1502, cr_loss=0.3744, over 4109911.70 frames. 
], batch size: 74, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:45:15,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=571528.6666666666, ans=0.0 2024-09-17 07:45:18,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.85 vs. limit=12.0 2024-09-17 07:45:38,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=571585.3333333334, ans=0.0 2024-09-17 07:45:48,352 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=15.0 2024-09-17 07:46:20,635 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.72 vs. limit=15.0 2024-09-17 07:46:23,983 INFO [train.py:1198] (0/2) Epoch 32, batch 3650, loss[loss=0.2515, ctc_loss=0.1703, cr_loss=0.4059, over 20069.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1501, cr_loss=0.3744, over 4111413.29 frames. 
], batch size: 80, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:46:25,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=571670.3333333334, ans=0.125 2024-09-17 07:47:09,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=571755.3333333334, ans=0.125 2024-09-17 07:47:21,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=571755.3333333334, ans=0.025 2024-09-17 07:47:28,832 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.151e+02 2.272e+02 2.492e+02 3.935e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-17 07:47:41,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=571812.0, ans=0.1 2024-09-17 07:47:42,211 INFO [train.py:1198] (0/2) Epoch 32, batch 3700, loss[loss=0.2452, ctc_loss=0.1659, cr_loss=0.3964, over 20866.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1504, cr_loss=0.375, over 4097676.01 frames. ], batch size: 65, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:47:45,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2024-09-17 07:48:17,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=571868.6666666666, ans=0.125 2024-09-17 07:48:55,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=571925.3333333334, ans=0.125 2024-09-17 07:48:58,066 INFO [train.py:1198] (0/2) Epoch 32, batch 3750, loss[loss=0.2622, ctc_loss=0.1777, cr_loss=0.4221, over 20848.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1499, cr_loss=0.3743, over 4106868.84 frames. 
], batch size: 65, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:49:16,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=571982.0, ans=0.2 2024-09-17 07:49:57,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=572067.0, ans=0.125 2024-09-17 07:50:02,662 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.170e+02 2.309e+02 2.463e+02 4.810e+02, threshold=4.618e+02, percent-clipped=1.0 2024-09-17 07:50:13,290 INFO [train.py:1198] (0/2) Epoch 32, batch 3800, loss[loss=0.1931, ctc_loss=0.1246, cr_loss=0.3425, over 20981.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1504, cr_loss=0.3744, over 4096143.98 frames. ], batch size: 48, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:50:47,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=572152.0, ans=0.125 2024-09-17 07:51:32,478 INFO [train.py:1198] (0/2) Epoch 32, batch 3850, loss[loss=0.2276, ctc_loss=0.1521, cr_loss=0.3775, over 20350.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1496, cr_loss=0.3729, over 4085246.44 frames. ], batch size: 74, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:52:03,078 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:52:05,082 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.25 vs. limit=15.0 2024-09-17 07:52:15,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=572293.6666666666, ans=0.0 2024-09-17 07:52:24,076 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.69 vs. 
limit=15.0 2024-09-17 07:52:24,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=572322.0, ans=0.04949747468305833 2024-09-17 07:52:27,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=572322.0, ans=0.125 2024-09-17 07:52:38,228 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.115e+02 2.269e+02 2.413e+02 5.300e+02, threshold=4.537e+02, percent-clipped=0.0 2024-09-17 07:52:48,890 INFO [train.py:1198] (0/2) Epoch 32, batch 3900, loss[loss=0.2459, ctc_loss=0.1639, cr_loss=0.4096, over 20981.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1498, cr_loss=0.3738, over 4094011.91 frames. ], batch size: 58, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:52:53,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=572378.6666666666, ans=0.125 2024-09-17 07:52:55,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=572378.6666666666, ans=0.125 2024-09-17 07:53:26,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=572435.3333333334, ans=0.125 2024-09-17 07:53:28,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=572435.3333333334, ans=0.125 2024-09-17 07:53:59,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=572492.0, ans=0.125 2024-09-17 07:54:08,738 INFO [train.py:1198] (0/2) Epoch 32, batch 3950, loss[loss=0.2294, ctc_loss=0.1506, cr_loss=0.3943, over 20973.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1493, cr_loss=0.3724, over 4092354.78 frames. 
], batch size: 58, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:54:09,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=572520.3333333334, ans=0.0 2024-09-17 07:54:16,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=572520.3333333334, ans=0.125 2024-09-17 07:54:21,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=572520.3333333334, ans=0.125 2024-09-17 07:54:58,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=572605.3333333334, ans=0.125 2024-09-17 07:55:13,403 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.147e+02 2.300e+02 2.436e+02 4.605e+02, threshold=4.601e+02, percent-clipped=2.0 2024-09-17 07:55:23,946 INFO [train.py:1198] (0/2) Epoch 32, batch 4000, loss[loss=0.2743, ctc_loss=0.1914, cr_loss=0.4142, over 14050.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1496, cr_loss=0.3733, over 4085053.00 frames. ], batch size: 150, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:55:38,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2024-09-17 07:55:46,051 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2024-09-17 07:55:53,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=572718.6666666666, ans=0.0 2024-09-17 07:56:27,070 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.14 vs. 
limit=12.0 2024-09-17 07:56:43,122 INFO [train.py:1198] (0/2) Epoch 32, batch 4050, loss[loss=0.2067, ctc_loss=0.1361, cr_loss=0.3528, over 20987.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1505, cr_loss=0.3748, over 4078121.52 frames. ], batch size: 52, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:57:07,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=572832.0, ans=0.125 2024-09-17 07:57:48,315 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.919e+02 2.157e+02 2.277e+02 2.428e+02 2.938e+02, threshold=4.555e+02, percent-clipped=0.0 2024-09-17 07:57:55,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.21 vs. limit=12.0 2024-09-17 07:57:59,010 INFO [train.py:1198] (0/2) Epoch 32, batch 4100, loss[loss=0.2203, ctc_loss=0.1447, cr_loss=0.378, over 20793.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1499, cr_loss=0.3742, over 4086606.36 frames. ], batch size: 53, lr: 2.58e-03, grad_scale: 64.0 2024-09-17 07:59:10,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=573058.6666666666, ans=0.0 2024-09-17 07:59:15,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=573058.6666666666, ans=0.0 2024-09-17 07:59:17,731 INFO [train.py:1198] (0/2) Epoch 32, batch 4150, loss[loss=0.2021, ctc_loss=0.1328, cr_loss=0.3462, over 20959.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1504, cr_loss=0.3744, over 4086859.63 frames. 
], batch size: 49, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 07:59:40,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=573115.3333333334, ans=0.125 2024-09-17 08:00:20,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=573200.3333333334, ans=0.0 2024-09-17 08:00:24,717 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.156e+02 2.322e+02 2.502e+02 5.137e+02, threshold=4.644e+02, percent-clipped=1.0 2024-09-17 08:00:25,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=573200.3333333334, ans=0.04949747468305833 2024-09-17 08:00:31,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.46 vs. limit=15.0 2024-09-17 08:00:33,718 INFO [train.py:1198] (0/2) Epoch 32, batch 4200, loss[loss=0.2304, ctc_loss=0.1525, cr_loss=0.3897, over 21010.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1498, cr_loss=0.3735, over 4096068.39 frames. 
], batch size: 63, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 08:00:38,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=573228.6666666666, ans=0.2 2024-09-17 08:00:49,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=573257.0, ans=0.1 2024-09-17 08:01:07,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=573285.3333333334, ans=0.0 2024-09-17 08:01:21,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=573313.6666666666, ans=0.125 2024-09-17 08:01:28,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=573313.6666666666, ans=0.0 2024-09-17 08:01:49,582 INFO [train.py:1198] (0/2) Epoch 32, batch 4250, loss[loss=0.2375, ctc_loss=0.1595, cr_loss=0.39, over 20937.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1506, cr_loss=0.3751, over 4087321.57 frames. 
], batch size: 60, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 08:01:56,155 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:02:17,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=573398.6666666666, ans=0.1 2024-09-17 08:02:52,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=573483.6666666666, ans=0.125 2024-09-17 08:02:52,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=573483.6666666666, ans=0.1 2024-09-17 08:03:00,062 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.170e+02 2.279e+02 2.432e+02 3.476e+02, threshold=4.558e+02, percent-clipped=0.0 2024-09-17 08:03:06,741 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:03:09,331 INFO [train.py:1198] (0/2) Epoch 32, batch 4300, loss[loss=0.1955, ctc_loss=0.1271, cr_loss=0.3421, over 20986.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1503, cr_loss=0.3749, over 4077646.51 frames. ], batch size: 52, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 08:03:26,538 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:03:36,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.81 vs. limit=12.0 2024-09-17 08:04:13,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2024-09-17 08:04:24,876 INFO [train.py:1198] (0/2) Epoch 32, batch 4350, loss[loss=0.2416, ctc_loss=0.1609, cr_loss=0.4036, over 21047.00 frames. 
], tot_loss[loss=0.225, ctc_loss=0.15, cr_loss=0.375, over 4081066.18 frames. ], batch size: 62, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 08:05:00,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=573710.3333333334, ans=0.125 2024-09-17 08:05:29,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=573767.0, ans=0.0 2024-09-17 08:05:35,007 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.141e+02 2.271e+02 2.417e+02 5.155e+02, threshold=4.542e+02, percent-clipped=1.0 2024-09-17 08:05:43,993 INFO [train.py:1198] (0/2) Epoch 32, batch 4400, loss[loss=0.2341, ctc_loss=0.1632, cr_loss=0.3544, over 14211.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1497, cr_loss=0.3746, over 4077437.49 frames. ], batch size: 149, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 08:05:55,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=573795.3333333334, ans=0.1 2024-09-17 08:05:56,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=573795.3333333334, ans=0.2 2024-09-17 08:05:59,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=573823.6666666666, ans=0.125 2024-09-17 08:06:49,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=573908.6666666666, ans=0.2 2024-09-17 08:06:59,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.12 vs. limit=15.0 2024-09-17 08:07:00,078 INFO [train.py:1198] (0/2) Epoch 32, batch 4450, loss[loss=0.2454, ctc_loss=0.1617, cr_loss=0.4186, over 19995.00 frames. 
], tot_loss[loss=0.2246, ctc_loss=0.1497, cr_loss=0.3748, over 4075047.39 frames. ], batch size: 80, lr: 2.58e-03, grad_scale: 32.0 2024-09-17 08:07:34,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=573993.6666666666, ans=0.1 2024-09-17 08:07:58,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=574022.0, ans=0.025 2024-09-17 08:08:04,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=574050.3333333334, ans=0.0 2024-09-17 08:08:10,533 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.115e+02 2.257e+02 2.394e+02 3.432e+02, threshold=4.514e+02, percent-clipped=0.0 2024-09-17 08:08:15,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=574050.3333333334, ans=0.05 2024-09-17 08:08:18,079 INFO [train.py:1198] (0/2) Epoch 32, batch 4500, loss[loss=0.2678, ctc_loss=0.1792, cr_loss=0.4427, over 19960.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.15, cr_loss=0.3749, over 4059550.65 frames. ], batch size: 80, lr: 2.58e-03, grad_scale: 16.0 2024-09-17 08:08:25,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=574078.6666666666, ans=0.1 2024-09-17 08:08:28,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=574078.6666666666, ans=0.125 2024-09-17 08:08:28,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=574078.6666666666, ans=0.0 2024-09-17 08:08:34,163 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.39 vs. 
limit=15.0 2024-09-17 08:08:38,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=574107.0, ans=0.125 2024-09-17 08:08:50,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=574135.3333333334, ans=0.1 2024-09-17 08:08:57,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=574135.3333333334, ans=0.125 2024-09-17 08:09:05,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=574163.6666666666, ans=0.1 2024-09-17 08:09:33,772 INFO [train.py:1198] (0/2) Epoch 32, batch 4550, loss[loss=0.1908, ctc_loss=0.1222, cr_loss=0.3428, over 19863.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1499, cr_loss=0.3749, over 4077533.85 frames. ], batch size: 44, lr: 2.58e-03, grad_scale: 16.0 2024-09-17 08:09:40,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=574220.3333333334, ans=0.125 2024-09-17 08:09:49,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=574248.6666666666, ans=0.125 2024-09-17 08:10:18,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=574305.3333333334, ans=0.2 2024-09-17 08:10:40,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.94 vs. 
limit=15.0 2024-09-17 08:10:45,570 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.219e+02 2.336e+02 2.528e+02 3.714e+02, threshold=4.672e+02, percent-clipped=0.0 2024-09-17 08:10:52,989 INFO [train.py:1198] (0/2) Epoch 32, batch 4600, loss[loss=0.2342, ctc_loss=0.157, cr_loss=0.3858, over 21017.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1489, cr_loss=0.3734, over 4088400.90 frames. ], batch size: 61, lr: 2.58e-03, grad_scale: 16.0 2024-09-17 08:11:05,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=574362.0, ans=0.125 2024-09-17 08:11:50,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=574447.0, ans=0.025 2024-09-17 08:12:09,644 INFO [train.py:1198] (0/2) Epoch 32, batch 4650, loss[loss=0.2072, ctc_loss=0.1369, cr_loss=0.3517, over 20883.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1498, cr_loss=0.3746, over 4077179.73 frames. ], batch size: 54, lr: 2.58e-03, grad_scale: 16.0 2024-09-17 08:12:54,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=574588.6666666666, ans=0.0 2024-09-17 08:12:55,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=574588.6666666666, ans=0.2 2024-09-17 08:12:57,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=574588.6666666666, ans=0.2 2024-09-17 08:13:17,910 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.145e+02 2.298e+02 2.514e+02 5.438e+02, threshold=4.595e+02, percent-clipped=1.0 2024-09-17 08:13:25,593 INFO [train.py:1198] (0/2) Epoch 32, batch 4700, loss[loss=0.2101, ctc_loss=0.1395, cr_loss=0.3533, over 20929.00 frames. 
], tot_loss[loss=0.2245, ctc_loss=0.1496, cr_loss=0.3745, over 4075593.22 frames. ], batch size: 49, lr: 2.58e-03, grad_scale: 16.0 2024-09-17 08:13:51,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=574673.6666666666, ans=0.2 2024-09-17 08:13:53,418 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:14:38,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=574758.6666666666, ans=0.125 2024-09-17 08:14:43,937 INFO [train.py:1198] (0/2) Epoch 32, batch 4750, loss[loss=0.2291, ctc_loss=0.151, cr_loss=0.3903, over 20826.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1505, cr_loss=0.3754, over 4064319.62 frames. ], batch size: 59, lr: 2.58e-03, grad_scale: 16.0 2024-09-17 08:15:08,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=574815.3333333334, ans=0.125 2024-09-17 08:15:52,400 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.165e+02 2.283e+02 2.412e+02 3.982e+02, threshold=4.565e+02, percent-clipped=0.0 2024-09-17 08:15:59,761 INFO [train.py:1198] (0/2) Epoch 32, batch 4800, loss[loss=0.2208, ctc_loss=0.1467, cr_loss=0.3707, over 20872.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1496, cr_loss=0.3742, over 4078050.04 frames. 
], batch size: 57, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:16:38,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=574985.3333333334, ans=0.0 2024-09-17 08:16:57,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=575013.6666666666, ans=0.125 2024-09-17 08:17:08,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=575042.0, ans=0.125 2024-09-17 08:17:18,623 INFO [train.py:1198] (0/2) Epoch 32, batch 4850, loss[loss=0.2206, ctc_loss=0.1451, cr_loss=0.3776, over 21069.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1491, cr_loss=0.3734, over 4085215.80 frames. ], batch size: 53, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:17:27,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=575070.3333333334, ans=0.0 2024-09-17 08:17:44,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=575098.6666666666, ans=0.1 2024-09-17 08:17:56,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=575127.0, ans=0.07 2024-09-17 08:18:08,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=575155.3333333334, ans=0.125 2024-09-17 08:18:23,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=575183.6666666666, ans=0.2 2024-09-17 08:18:26,620 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.874e+02 2.177e+02 2.291e+02 2.500e+02 8.857e+02, threshold=4.582e+02, percent-clipped=2.0 2024-09-17 08:18:34,131 INFO [train.py:1198] (0/2) Epoch 32, batch 4900, loss[loss=0.1815, ctc_loss=0.1188, 
cr_loss=0.3134, over 19460.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1496, cr_loss=0.3737, over 4076607.12 frames. ], batch size: 43, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:18:44,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=575212.0, ans=0.125 2024-09-17 08:19:29,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=575297.0, ans=0.125 2024-09-17 08:19:48,819 INFO [train.py:1198] (0/2) Epoch 32, batch 4950, loss[loss=0.2183, ctc_loss=0.1451, cr_loss=0.3661, over 20929.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1497, cr_loss=0.3747, over 4091385.53 frames. ], batch size: 60, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:20:29,894 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:20:59,569 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.159e+02 2.267e+02 2.468e+02 4.210e+02, threshold=4.533e+02, percent-clipped=0.0 2024-09-17 08:21:06,951 INFO [train.py:1198] (0/2) Epoch 32, batch 5000, loss[loss=0.2562, ctc_loss=0.1723, cr_loss=0.4195, over 20116.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1483, cr_loss=0.372, over 4088903.44 frames. 
], batch size: 80, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:21:26,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=575523.6666666666, ans=0.125 2024-09-17 08:21:54,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=575580.3333333334, ans=0.1 2024-09-17 08:22:08,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=575608.6666666666, ans=0.2 2024-09-17 08:22:20,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=575637.0, ans=0.2 2024-09-17 08:22:21,553 INFO [train.py:1198] (0/2) Epoch 32, batch 5050, loss[loss=0.2483, ctc_loss=0.1681, cr_loss=0.4009, over 20838.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.149, cr_loss=0.3728, over 4093557.22 frames. ], batch size: 65, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:22:36,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=575665.3333333334, ans=0.2 2024-09-17 08:22:42,575 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:22:56,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.02 vs. 
limit=15.0 2024-09-17 08:23:09,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=575722.0, ans=0.0 2024-09-17 08:23:12,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=575722.0, ans=0.0 2024-09-17 08:23:17,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=575722.0, ans=0.125 2024-09-17 08:23:28,219 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.163e+02 2.300e+02 2.480e+02 3.319e+02, threshold=4.599e+02, percent-clipped=0.0 2024-09-17 08:23:35,752 INFO [train.py:1198] (0/2) Epoch 32, batch 5100, loss[loss=0.2326, ctc_loss=0.154, cr_loss=0.3931, over 20868.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1497, cr_loss=0.3742, over 4084121.70 frames. ], batch size: 57, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:23:55,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=575807.0, ans=0.125 2024-09-17 08:23:58,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=575807.0, ans=0.125 2024-09-17 08:24:25,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=575863.6666666666, ans=0.125 2024-09-17 08:24:52,861 INFO [train.py:1198] (0/2) Epoch 32, batch 5150, loss[loss=0.2326, ctc_loss=0.1539, cr_loss=0.3933, over 20984.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1497, cr_loss=0.3744, over 4094599.82 frames. 
], batch size: 58, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:25:08,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=575948.6666666666, ans=0.0 2024-09-17 08:25:09,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=575948.6666666666, ans=0.0 2024-09-17 08:25:16,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=575948.6666666666, ans=0.125 2024-09-17 08:25:34,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=575977.0, ans=0.125 2024-09-17 08:25:59,940 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.218e+02 2.339e+02 2.528e+02 3.794e+02, threshold=4.679e+02, percent-clipped=0.0 2024-09-17 08:26:07,280 INFO [train.py:1198] (0/2) Epoch 32, batch 5200, loss[loss=0.1997, ctc_loss=0.1326, cr_loss=0.3356, over 21082.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1494, cr_loss=0.3737, over 4103616.01 frames. ], batch size: 59, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:26:23,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=576090.3333333334, ans=0.0 2024-09-17 08:26:32,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=576090.3333333334, ans=0.125 2024-09-17 08:27:21,611 INFO [train.py:1198] (0/2) Epoch 32, batch 5250, loss[loss=0.2262, ctc_loss=0.151, cr_loss=0.3761, over 20967.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1486, cr_loss=0.3723, over 4108911.03 frames. ], batch size: 58, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:27:43,049 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. 
limit=15.0 2024-09-17 08:28:28,531 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.124e+02 2.226e+02 2.349e+02 2.879e+02, threshold=4.453e+02, percent-clipped=0.0 2024-09-17 08:28:32,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=15.0 2024-09-17 08:28:35,957 INFO [train.py:1198] (0/2) Epoch 32, batch 5300, loss[loss=0.2433, ctc_loss=0.1645, cr_loss=0.3937, over 20818.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1496, cr_loss=0.3738, over 4103777.97 frames. ], batch size: 59, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:28:45,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=576345.3333333334, ans=0.2 2024-09-17 08:29:36,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=576458.6666666666, ans=0.125 2024-09-17 08:29:48,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=576487.0, ans=0.0 2024-09-17 08:29:50,220 INFO [train.py:1198] (0/2) Epoch 32, batch 5350, loss[loss=0.2388, ctc_loss=0.1624, cr_loss=0.3816, over 21039.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1491, cr_loss=0.3728, over 4111370.15 frames. ], batch size: 61, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:30:37,025 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.23 vs. 
limit=15.0 2024-09-17 08:30:52,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=576600.3333333334, ans=0.125 2024-09-17 08:30:54,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=576600.3333333334, ans=0.125 2024-09-17 08:30:59,970 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.153e+02 2.282e+02 2.445e+02 6.157e+02, threshold=4.565e+02, percent-clipped=1.0 2024-09-17 08:31:07,415 INFO [train.py:1198] (0/2) Epoch 32, batch 5400, loss[loss=0.1897, ctc_loss=0.1225, cr_loss=0.3357, over 21001.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1482, cr_loss=0.371, over 4104108.84 frames. ], batch size: 48, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:31:28,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=576657.0, ans=0.0 2024-09-17 08:32:21,570 INFO [train.py:1198] (0/2) Epoch 32, batch 5450, loss[loss=0.2041, ctc_loss=0.1375, cr_loss=0.3331, over 21072.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1479, cr_loss=0.3707, over 4110406.73 frames. ], batch size: 53, lr: 2.57e-03, grad_scale: 32.0 2024-09-17 08:32:38,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=576798.6666666666, ans=0.0 2024-09-17 08:32:53,780 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.39 vs. 
limit=22.5
2024-09-17 08:32:56,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=576827.0, ans=0.0
2024-09-17 08:33:25,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=576883.6666666666, ans=0.2
2024-09-17 08:33:28,648 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.140e+02 2.263e+02 2.431e+02 3.030e+02, threshold=4.527e+02, percent-clipped=0.0
2024-09-17 08:33:36,087 INFO [train.py:1198] (0/2) Epoch 32, batch 5500, loss[loss=0.2029, ctc_loss=0.1319, cr_loss=0.3551, over 20884.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1488, cr_loss=0.3722, over 4105855.66 frames. ], batch size: 57, lr: 2.57e-03, grad_scale: 32.0
2024-09-17 08:33:41,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.51 vs. limit=15.0
2024-09-17 08:34:08,679 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:34:53,238 INFO [train.py:1198] (0/2) Epoch 32, batch 5550, loss[loss=0.2427, ctc_loss=0.1601, cr_loss=0.4132, over 21005.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1482, cr_loss=0.3711, over 4099452.36 frames. ], batch size: 55, lr: 2.57e-03, grad_scale: 32.0
2024-09-17 08:35:08,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=12.0
2024-09-17 08:35:56,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.64 vs. limit=5.0
2024-09-17 08:36:00,071 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.138e+02 2.254e+02 2.526e+02 5.415e+02, threshold=4.507e+02, percent-clipped=1.0
2024-09-17 08:36:07,472 INFO [train.py:1198] (0/2) Epoch 32, batch 5600, loss[loss=0.2226, ctc_loss=0.1468, cr_loss=0.3789, over 20994.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1483, cr_loss=0.3719, over 4111557.75 frames. ], batch size: 55, lr: 2.57e-03, grad_scale: 32.0
2024-09-17 08:36:10,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=577195.3333333334, ans=0.1
2024-09-17 08:37:10,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=577308.6666666666, ans=0.0
2024-09-17 08:37:17,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.84 vs. limit=22.5
2024-09-17 08:37:22,026 INFO [train.py:1198] (0/2) Epoch 32, batch 5650, loss[loss=0.1997, ctc_loss=0.1302, cr_loss=0.3472, over 20975.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1491, cr_loss=0.3735, over 4110263.18 frames. ], batch size: 50, lr: 2.57e-03, grad_scale: 16.0
2024-09-17 08:37:44,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=577365.3333333334, ans=0.125
2024-09-17 08:37:48,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=577365.3333333334, ans=0.125
2024-09-17 08:38:30,296 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.140e+02 2.256e+02 2.414e+02 2.895e+02, threshold=4.511e+02, percent-clipped=0.0
2024-09-17 08:38:36,430 INFO [train.py:1198] (0/2) Epoch 32, batch 5700, loss[loss=0.2533, ctc_loss=0.1712, cr_loss=0.4103, over 20869.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1498, cr_loss=0.3747, over 4104878.61 frames. ], batch size: 65, lr: 2.57e-03, grad_scale: 16.0
2024-09-17 08:39:53,457 INFO [train.py:1198] (0/2) Epoch 32, batch 5750, loss[loss=0.197, ctc_loss=0.1299, cr_loss=0.3354, over 20979.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.15, cr_loss=0.3748, over 4096588.62 frames. ], batch size: 49, lr: 2.57e-03, grad_scale: 16.0
2024-09-17 08:39:59,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=577620.3333333334, ans=0.125
2024-09-17 08:40:56,693 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:40:57,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=577733.6666666666, ans=15.0
2024-09-17 08:41:02,142 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.147e+02 2.280e+02 2.436e+02 3.460e+02, threshold=4.560e+02, percent-clipped=0.0
2024-09-17 08:41:02,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=577733.6666666666, ans=0.0
2024-09-17 08:41:07,974 INFO [train.py:1198] (0/2) Epoch 32, batch 5800, loss[loss=0.2249, ctc_loss=0.1492, cr_loss=0.3786, over 20069.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1497, cr_loss=0.3747, over 4100131.34 frames. ], batch size: 80, lr: 2.57e-03, grad_scale: 16.0
2024-09-17 08:42:00,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=577847.0, ans=0.0
2024-09-17 08:42:04,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=577847.0, ans=0.2
2024-09-17 08:42:10,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0
2024-09-17 08:42:24,436 INFO [train.py:1198] (0/2) Epoch 32, batch 5850, loss[loss=0.2422, ctc_loss=0.1614, cr_loss=0.4039, over 20941.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1498, cr_loss=0.3742, over 4104878.73 frames. ], batch size: 60, lr: 2.57e-03, grad_scale: 16.0
2024-09-17 08:43:11,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=577988.6666666666, ans=0.125
2024-09-17 08:43:13,003 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-204000.pt
2024-09-17 08:43:18,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=577988.6666666666, ans=0.0
2024-09-17 08:43:32,792 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.221e+02 2.314e+02 2.487e+02 3.047e+02, threshold=4.628e+02, percent-clipped=0.0
2024-09-17 08:43:38,849 INFO [train.py:1198] (0/2) Epoch 32, batch 5900, loss[loss=0.2292, ctc_loss=0.1535, cr_loss=0.3785, over 20980.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1507, cr_loss=0.3755, over 4104204.50 frames. ], batch size: 58, lr: 2.57e-03, grad_scale: 16.0
2024-09-17 08:43:46,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=578045.3333333334, ans=0.1
2024-09-17 08:43:53,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=578073.6666666666, ans=0.1
2024-09-17 08:44:09,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0
2024-09-17 08:44:13,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=578102.0, ans=0.125
2024-09-17 08:44:16,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=578102.0, ans=0.125
2024-09-17 08:44:40,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=578158.6666666666, ans=0.0
2024-09-17 08:44:53,773 INFO [train.py:1198] (0/2) Epoch 32, batch 5950, loss[loss=0.181, ctc_loss=0.1183, cr_loss=0.313, over 19903.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1509, cr_loss=0.3757, over 4096332.57 frames. ], batch size: 44, lr: 2.57e-03, grad_scale: 16.0
2024-09-17 08:44:57,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=578187.0, ans=0.025
2024-09-17 08:44:58,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=578187.0, ans=0.125
2024-09-17 08:45:02,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=578187.0, ans=0.2
2024-09-17 08:45:08,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=12.0
2024-09-17 08:45:46,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=578272.0, ans=0.125
2024-09-17 08:45:48,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=578272.0, ans=0.5
2024-09-17 08:45:53,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0
2024-09-17 08:46:02,073 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.180e+02 2.291e+02 2.464e+02 3.593e+02, threshold=4.583e+02, percent-clipped=0.0
2024-09-17 08:46:05,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=578300.3333333334, ans=0.0
2024-09-17 08:46:08,002 INFO [train.py:1198] (0/2) Epoch 32, batch 6000, loss[loss=0.2444, ctc_loss=0.167, cr_loss=0.387, over 20874.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1513, cr_loss=0.3765, over 4102027.64 frames. ], batch size: 57, lr: 2.57e-03, grad_scale: 32.0
2024-09-17 08:46:08,003 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 08:46:34,019 INFO [train.py:1230] (0/2) Epoch 32, validation: loss=0.04093, ctc_loss=0.04093, cr_loss=1.254e-14, over 944034.00 frames.
2024-09-17 08:46:34,020 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-17 08:46:44,354 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.33 vs. limit=15.0
2024-09-17 08:47:30,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=578413.6666666666, ans=0.125
2024-09-17 08:47:38,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=578442.0, ans=0.025
2024-09-17 08:47:51,534 INFO [train.py:1198] (0/2) Epoch 32, batch 6050, loss[loss=0.2029, ctc_loss=0.1315, cr_loss=0.3568, over 20984.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1506, cr_loss=0.3754, over 4084785.07 frames. ], batch size: 51, lr: 2.57e-03, grad_scale: 32.0
2024-09-17 08:48:31,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=578527.0, ans=0.0
2024-09-17 08:48:48,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=578555.3333333334, ans=0.0
2024-09-17 08:49:01,020 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.099e+02 2.286e+02 2.447e+02 8.010e+02, threshold=4.571e+02, percent-clipped=1.0
2024-09-17 08:49:04,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=578583.6666666666, ans=0.125
2024-09-17 08:49:07,160 INFO [train.py:1198] (0/2) Epoch 32, batch 6100, loss[loss=0.2261, ctc_loss=0.1498, cr_loss=0.3813, over 21015.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1502, cr_loss=0.3748, over 4094268.17 frames. ], batch size: 61, lr: 2.57e-03, grad_scale: 32.0
2024-09-17 08:49:31,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.23 vs. limit=10.0
2024-09-17 08:49:50,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=578697.0, ans=0.125
2024-09-17 08:49:56,405 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=22.5
2024-09-17 08:49:58,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=578697.0, ans=0.2
2024-09-17 08:50:20,905 INFO [train.py:1198] (0/2) Epoch 32, batch 6150, loss[loss=0.2044, ctc_loss=0.1343, cr_loss=0.3507, over 20781.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1497, cr_loss=0.3738, over 4094707.83 frames. ], batch size: 53, lr: 2.57e-03, grad_scale: 32.0
2024-09-17 08:50:31,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=578753.6666666666, ans=0.1
2024-09-17 08:50:49,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=578782.0, ans=0.125
2024-09-17 08:51:30,352 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.191e+02 2.316e+02 2.492e+02 4.494e+02, threshold=4.632e+02, percent-clipped=0.0
2024-09-17 08:51:36,352 INFO [train.py:1198] (0/2) Epoch 32, batch 6200, loss[loss=0.2273, ctc_loss=0.1562, cr_loss=0.3557, over 18615.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1508, cr_loss=0.3747, over 4055903.74 frames. ], batch size: 108, lr: 2.57e-03, grad_scale: 32.0
2024-09-17 08:51:45,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=578895.3333333334, ans=0.1
2024-09-17 08:51:51,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=578923.6666666666, ans=0.1
2024-09-17 08:51:51,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=578923.6666666666, ans=0.0
2024-09-17 08:52:19,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=22.5
2024-09-17 08:52:49,571 INFO [train.py:1198] (0/2) Epoch 32, batch 6250, loss[loss=0.2026, ctc_loss=0.1347, cr_loss=0.3391, over 19888.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1511, cr_loss=0.3743, over 4042511.44 frames. ], batch size: 44, lr: 2.57e-03, grad_scale: 32.0
2024-09-17 08:52:52,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=579037.0, ans=0.125
2024-09-17 08:53:09,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=579065.3333333334, ans=0.125
2024-09-17 08:53:35,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=579122.0, ans=0.125
2024-09-17 08:53:57,225 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.209e+02 2.363e+02 2.577e+02 3.623e+02, threshold=4.726e+02, percent-clipped=0.0
2024-09-17 08:54:03,309 INFO [train.py:1198] (0/2) Epoch 32, batch 6300, loss[loss=0.2187, ctc_loss=0.1465, cr_loss=0.3608, over 20683.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1531, cr_loss=0.3763, over 3992679.73 frames. ], batch size: 71, lr: 2.57e-03, grad_scale: 32.0
2024-09-17 08:54:40,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.36 vs. limit=15.0
2024-09-17 08:54:56,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=579263.6666666666, ans=0.1
2024-09-17 08:55:07,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=22.5
2024-09-17 08:55:15,933 INFO [train.py:1198] (0/2) Epoch 32, batch 6350, loss[loss=0.2774, ctc_loss=0.1958, cr_loss=0.4083, over 14632.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1584, cr_loss=0.3807, over 3787512.69 frames. ], batch size: 150, lr: 2.57e-03, grad_scale: 32.0
2024-09-17 08:55:19,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=579320.3333333334, ans=0.125
2024-09-17 08:55:47,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=579377.0, ans=0.125
2024-09-17 08:55:53,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=579377.0, ans=0.2
2024-09-17 08:56:02,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=579405.3333333334, ans=0.125
2024-09-17 08:56:11,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=579405.3333333334, ans=0.0
2024-09-17 08:56:15,046 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-32.pt
2024-09-17 08:57:05,645 INFO [train.py:1198] (0/2) Epoch 33, batch 0, loss[loss=0.2428, ctc_loss=0.1668, cr_loss=0.3797, over 19291.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1668, cr_loss=0.3797, over 19291.00 frames. ], batch size: 90, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 08:57:05,646 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 08:57:24,036 INFO [train.py:1230] (0/2) Epoch 33, validation: loss=0.04005, ctc_loss=0.04005, cr_loss=1.295e-14, over 944034.00 frames.
2024-09-17 08:57:24,037 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-17 08:57:31,631 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.090e+02 2.630e+02 2.802e+02 3.019e+02 5.106e+02, threshold=5.605e+02, percent-clipped=1.0
2024-09-17 08:58:28,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=579549.8333333334, ans=0.125
2024-09-17 08:58:42,109 INFO [train.py:1198] (0/2) Epoch 33, batch 50, loss[loss=0.2555, ctc_loss=0.1693, cr_loss=0.4314, over 18105.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1509, cr_loss=0.3723, over 903430.46 frames. ], batch size: 108, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 08:58:51,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579578.1666666666, ans=0.1
2024-09-17 08:59:57,406 INFO [train.py:1198] (0/2) Epoch 33, batch 100, loss[loss=0.1585, ctc_loss=0.1021, cr_loss=0.282, over 20983.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1507, cr_loss=0.3732, over 1606634.62 frames. ], batch size: 48, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:00:02,194 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:00:04,940 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.123e+02 2.249e+02 2.450e+02 4.320e+02, threshold=4.499e+02, percent-clipped=0.0
2024-09-17 09:00:08,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=579719.8333333334, ans=0.1
2024-09-17 09:00:11,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=579748.1666666666, ans=0.07
2024-09-17 09:00:17,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=579748.1666666666, ans=0.125
2024-09-17 09:00:17,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=579748.1666666666, ans=0.025
2024-09-17 09:01:12,509 INFO [train.py:1198] (0/2) Epoch 33, batch 150, loss[loss=0.2551, ctc_loss=0.1736, cr_loss=0.4073, over 19356.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1521, cr_loss=0.3742, over 2138616.32 frames. ], batch size: 90, lr: 2.52e-03, grad_scale: 16.0
2024-09-17 09:02:09,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=579946.5, ans=0.125
2024-09-17 09:02:11,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=579946.5, ans=0.0
2024-09-17 09:02:11,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=579946.5, ans=10.0
2024-09-17 09:02:12,028 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0
2024-09-17 09:02:17,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=579974.8333333334, ans=0.125
2024-09-17 09:02:30,735 INFO [train.py:1198] (0/2) Epoch 33, batch 200, loss[loss=0.1936, ctc_loss=0.1258, cr_loss=0.339, over 20767.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1505, cr_loss=0.3745, over 2583789.63 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 16.0
2024-09-17 09:02:37,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=580003.1666666666, ans=0.125
2024-09-17 09:02:38,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=580003.1666666666, ans=0.125
2024-09-17 09:02:39,816 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.186e+02 2.332e+02 2.527e+02 4.428e+02, threshold=4.664e+02, percent-clipped=0.0
2024-09-17 09:02:46,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=580031.5, ans=0.0
2024-09-17 09:03:22,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=580088.1666666666, ans=0.0
2024-09-17 09:03:31,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=580116.5, ans=0.125
2024-09-17 09:03:38,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=580116.5, ans=0.05
2024-09-17 09:03:49,516 INFO [train.py:1198] (0/2) Epoch 33, batch 250, loss[loss=0.2499, ctc_loss=0.166, cr_loss=0.4199, over 20977.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1507, cr_loss=0.3759, over 2919038.18 frames. ], batch size: 67, lr: 2.52e-03, grad_scale: 16.0
2024-09-17 09:04:19,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=580201.5, ans=0.025
2024-09-17 09:04:38,032 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:05:04,525 INFO [train.py:1198] (0/2) Epoch 33, batch 300, loss[loss=0.2269, ctc_loss=0.1488, cr_loss=0.3905, over 20676.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1516, cr_loss=0.3775, over 3182383.66 frames. ], batch size: 68, lr: 2.52e-03, grad_scale: 16.0
2024-09-17 09:05:13,407 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.170e+02 2.282e+02 2.417e+02 3.185e+02, threshold=4.563e+02, percent-clipped=0.0
2024-09-17 09:05:13,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=580286.5, ans=0.1
2024-09-17 09:05:18,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=580314.8333333334, ans=0.04949747468305833
2024-09-17 09:05:25,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=580314.8333333334, ans=0.1
2024-09-17 09:05:31,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=580314.8333333334, ans=0.1
2024-09-17 09:05:43,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=580343.1666666666, ans=0.025
2024-09-17 09:06:19,106 INFO [train.py:1198] (0/2) Epoch 33, batch 350, loss[loss=0.1768, ctc_loss=0.1155, cr_loss=0.3063, over 20968.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1507, cr_loss=0.3765, over 3390940.30 frames. ], batch size: 50, lr: 2.52e-03, grad_scale: 16.0
2024-09-17 09:06:34,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=580456.5, ans=0.025
2024-09-17 09:06:57,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=580484.8333333334, ans=0.0
2024-09-17 09:07:07,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=580513.1666666666, ans=0.2
2024-09-17 09:07:24,888 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.29 vs. limit=15.0
2024-09-17 09:07:28,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=580541.5, ans=0.0
2024-09-17 09:07:37,529 INFO [train.py:1198] (0/2) Epoch 33, batch 400, loss[loss=0.2229, ctc_loss=0.1474, cr_loss=0.3774, over 20783.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1495, cr_loss=0.3751, over 3553472.30 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:07:43,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=580569.8333333334, ans=0.2
2024-09-17 09:07:46,302 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.156e+02 2.238e+02 2.437e+02 9.103e+02, threshold=4.477e+02, percent-clipped=1.0
2024-09-17 09:08:06,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=580626.5, ans=0.0
2024-09-17 09:08:07,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=580626.5, ans=0.2
2024-09-17 09:08:14,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0
2024-09-17 09:08:15,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=580626.5, ans=0.125
2024-09-17 09:08:35,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=580654.8333333334, ans=0.09899494936611666
2024-09-17 09:08:50,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=580683.1666666666, ans=0.0
2024-09-17 09:08:53,067 INFO [train.py:1198] (0/2) Epoch 33, batch 450, loss[loss=0.2213, ctc_loss=0.1459, cr_loss=0.3769, over 20987.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1495, cr_loss=0.3751, over 3676555.71 frames. ], batch size: 67, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:08:59,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=580711.5, ans=0.0
2024-09-17 09:09:37,630 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:09:58,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=580824.8333333334, ans=0.125
2024-09-17 09:10:05,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=580824.8333333334, ans=22.5
2024-09-17 09:10:12,198 INFO [train.py:1198] (0/2) Epoch 33, batch 500, loss[loss=0.2357, ctc_loss=0.1569, cr_loss=0.3942, over 20674.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1506, cr_loss=0.3765, over 3759242.12 frames. ], batch size: 66, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:10:12,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=580853.1666666666, ans=0.125
2024-09-17 09:10:12,577 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:10:14,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=580853.1666666666, ans=0.1
2024-09-17 09:10:20,993 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.185e+02 2.280e+02 2.478e+02 4.436e+02, threshold=4.559e+02, percent-clipped=0.0
2024-09-17 09:11:27,438 INFO [train.py:1198] (0/2) Epoch 33, batch 550, loss[loss=0.2449, ctc_loss=0.1667, cr_loss=0.3909, over 20653.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1501, cr_loss=0.3757, over 3836122.46 frames. ], batch size: 68, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:11:30,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=580994.8333333334, ans=0.125
2024-09-17 09:12:21,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=581079.8333333334, ans=0.2
2024-09-17 09:12:42,074 INFO [train.py:1198] (0/2) Epoch 33, batch 600, loss[loss=0.2372, ctc_loss=0.1611, cr_loss=0.3807, over 20248.00 frames. ], tot_loss[loss=0.2253, ctc_loss=0.1502, cr_loss=0.3752, over 3894137.30 frames. ], batch size: 74, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:12:49,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=581136.5, ans=0.0
2024-09-17 09:12:51,201 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.159e+02 2.286e+02 2.461e+02 2.995e+02, threshold=4.572e+02, percent-clipped=0.0
2024-09-17 09:13:05,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0
2024-09-17 09:13:14,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=581193.1666666666, ans=0.2
2024-09-17 09:13:20,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=581193.1666666666, ans=0.125
2024-09-17 09:14:00,983 INFO [train.py:1198] (0/2) Epoch 33, batch 650, loss[loss=0.2329, ctc_loss=0.1558, cr_loss=0.3853, over 21041.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1499, cr_loss=0.3747, over 3945521.09 frames. ], batch size: 61, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:14:18,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=581306.5, ans=0.025
2024-09-17 09:14:49,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=581363.1666666666, ans=0.0
2024-09-17 09:15:20,218 INFO [train.py:1198] (0/2) Epoch 33, batch 700, loss[loss=0.237, ctc_loss=0.1573, cr_loss=0.3986, over 21069.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1491, cr_loss=0.3738, over 3976263.84 frames. ], batch size: 59, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:15:23,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=581419.8333333334, ans=0.025
2024-09-17 09:15:29,320 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.140e+02 2.298e+02 2.476e+02 5.299e+02, threshold=4.596e+02, percent-clipped=1.0
2024-09-17 09:15:31,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=581419.8333333334, ans=0.125
2024-09-17 09:15:56,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=581476.5, ans=0.2
2024-09-17 09:16:19,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=581533.1666666666, ans=0.025
2024-09-17 09:16:35,769 INFO [train.py:1198] (0/2) Epoch 33, batch 750, loss[loss=0.2456, ctc_loss=0.1668, cr_loss=0.3945, over 14273.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1489, cr_loss=0.3733, over 3997801.84 frames. ], batch size: 149, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:16:39,309 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:16:48,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=581561.5, ans=0.125
2024-09-17 09:17:18,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=581618.1666666666, ans=0.2
2024-09-17 09:17:20,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=581646.5, ans=0.0
2024-09-17 09:17:51,367 INFO [train.py:1198] (0/2) Epoch 33, batch 800, loss[loss=0.2261, ctc_loss=0.152, cr_loss=0.3705, over 20831.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1486, cr_loss=0.3724, over 4020273.75 frames. ], batch size: 59, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:18:00,527 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.168e+02 2.266e+02 2.451e+02 3.548e+02, threshold=4.533e+02, percent-clipped=0.0
2024-09-17 09:18:52,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=581788.1666666666, ans=0.0
2024-09-17 09:19:10,030 INFO [train.py:1198] (0/2) Epoch 33, batch 850, loss[loss=0.2312, ctc_loss=0.1557, cr_loss=0.3779, over 20635.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1487, cr_loss=0.3728, over 4037659.63 frames. ], batch size: 68, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:19:19,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=581844.8333333334, ans=0.125
2024-09-17 09:19:44,825 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0
2024-09-17 09:19:50,899 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0
2024-09-17 09:20:00,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0
2024-09-17 09:20:07,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=581929.8333333334, ans=0.0
2024-09-17 09:20:10,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=581958.1666666666, ans=0.125
2024-09-17 09:20:16,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=581958.1666666666, ans=0.0
2024-09-17 09:20:25,247 INFO [train.py:1198] (0/2) Epoch 33, batch 900, loss[loss=0.2329, ctc_loss=0.1534, cr_loss=0.3976, over 20975.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1494, cr_loss=0.3735, over 4046837.75 frames. ], batch size: 55, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:20:34,513 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.172e+02 2.301e+02 2.463e+02 3.738e+02, threshold=4.603e+02, percent-clipped=0.0
2024-09-17 09:20:39,750 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=22.5
2024-09-17 09:20:49,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=582014.8333333334, ans=0.125
2024-09-17 09:20:57,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=582043.1666666666, ans=0.125
2024-09-17 09:21:32,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=582099.8333333334, ans=0.0
2024-09-17 09:21:32,715 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0
2024-09-17 09:21:43,792 INFO [train.py:1198] (0/2) Epoch 33, batch 950, loss[loss=0.246, ctc_loss=0.1654, cr_loss=0.4034, over 20849.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1494, cr_loss=0.3741, over 4061353.75 frames. ], batch size: 65, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:22:25,271 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0
2024-09-17 09:22:56,574 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:22:59,014 INFO [train.py:1198] (0/2) Epoch 33, batch 1000, loss[loss=0.25, ctc_loss=0.1695, cr_loss=0.4022, over 18198.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1497, cr_loss=0.3752, over 4066098.47 frames. ], batch size: 108, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:23:08,022 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.176e+02 2.318e+02 2.450e+02 2.982e+02, threshold=4.637e+02, percent-clipped=0.0
2024-09-17 09:23:40,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=582326.5, ans=0.125
2024-09-17 09:23:53,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=582354.8333333334, ans=0.125
2024-09-17 09:24:14,640 INFO [train.py:1198] (0/2) Epoch 33, batch 1050, loss[loss=0.2106, ctc_loss=0.1373, cr_loss=0.3666, over 21061.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1498, cr_loss=0.3746, over 4073245.30 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:24:14,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=582411.5, ans=0.09899494936611666
2024-09-17 09:25:33,468 INFO [train.py:1198] (0/2) Epoch 33, batch 1100, loss[loss=0.2562, ctc_loss=0.1815, cr_loss=0.3738, over 13933.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1497, cr_loss=0.3746, over 4071922.33 frames. ], batch size: 149, lr: 2.52e-03, grad_scale: 32.0
2024-09-17 09:25:42,163 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.126e+02 2.267e+02 2.389e+02 3.726e+02, threshold=4.534e+02, percent-clipped=0.0
2024-09-17 09:26:33,309 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=22.5
2024-09-17 09:26:35,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=582666.5, ans=0.125
2024-09-17 09:26:42,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=582666.5, ans=0.0
2024-09-17 09:26:51,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=582694.8333333334, ans=0.125
2024-09-17 09:26:52,408 INFO [train.py:1198] (0/2) Epoch 33, batch 1150, loss[loss=0.2307, ctc_loss=0.1562, cr_loss=0.3727, over 20957.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1492, cr_loss=0.3732, over 4070618.35 frames.
], batch size: 64, lr: 2.52e-03, grad_scale: 32.0 2024-09-17 09:27:15,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=582723.1666666666, ans=0.2 2024-09-17 09:27:26,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=582751.5, ans=0.2 2024-09-17 09:27:35,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=582751.5, ans=0.125 2024-09-17 09:27:49,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=582779.8333333334, ans=0.125 2024-09-17 09:27:50,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=582779.8333333334, ans=0.125 2024-09-17 09:28:08,365 INFO [train.py:1198] (0/2) Epoch 33, batch 1200, loss[loss=0.227, ctc_loss=0.1517, cr_loss=0.3764, over 20777.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1486, cr_loss=0.372, over 4087417.19 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 32.0 2024-09-17 09:28:14,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=582836.5, ans=0.0 2024-09-17 09:28:17,474 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.181e+02 2.326e+02 2.412e+02 3.196e+02, threshold=4.652e+02, percent-clipped=0.0 2024-09-17 09:28:20,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=582836.5, ans=0.0 2024-09-17 09:28:45,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.82 vs. 
limit=15.0 2024-09-17 09:28:49,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=582893.1666666666, ans=0.0 2024-09-17 09:29:01,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=582921.5, ans=0.125 2024-09-17 09:29:16,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=582949.8333333334, ans=0.125 2024-09-17 09:29:24,137 INFO [train.py:1198] (0/2) Epoch 33, batch 1250, loss[loss=0.2396, ctc_loss=0.16, cr_loss=0.3978, over 20937.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1481, cr_loss=0.3715, over 4101185.44 frames. ], batch size: 60, lr: 2.52e-03, grad_scale: 32.0 2024-09-17 09:29:30,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=582978.1666666666, ans=0.025 2024-09-17 09:30:17,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=583063.1666666666, ans=0.125 2024-09-17 09:30:43,127 INFO [train.py:1198] (0/2) Epoch 33, batch 1300, loss[loss=0.2359, ctc_loss=0.1557, cr_loss=0.4012, over 20678.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1486, cr_loss=0.3722, over 4100934.12 frames. 
], batch size: 66, lr: 2.52e-03, grad_scale: 32.0 2024-09-17 09:30:44,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=583119.8333333334, ans=0.125 2024-09-17 09:30:52,294 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.183e+02 2.292e+02 2.459e+02 3.044e+02, threshold=4.583e+02, percent-clipped=0.0 2024-09-17 09:31:01,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=583148.1666666666, ans=0.0 2024-09-17 09:31:17,197 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.13 vs. limit=6.0 2024-09-17 09:31:58,678 INFO [train.py:1198] (0/2) Epoch 33, batch 1350, loss[loss=0.2137, ctc_loss=0.1442, cr_loss=0.3476, over 20762.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1492, cr_loss=0.3733, over 4090022.91 frames. ], batch size: 56, lr: 2.52e-03, grad_scale: 32.0 2024-09-17 09:32:01,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=583261.5, ans=0.125 2024-09-17 09:32:02,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=583261.5, ans=0.07 2024-09-17 09:33:17,333 INFO [train.py:1198] (0/2) Epoch 33, batch 1400, loss[loss=0.242, ctc_loss=0.1601, cr_loss=0.4094, over 20668.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1486, cr_loss=0.3724, over 4099775.37 frames. ], batch size: 66, lr: 2.52e-03, grad_scale: 32.0 2024-09-17 09:33:21,007 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.07 vs. 
limit=15.0 2024-09-17 09:33:25,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=583403.1666666666, ans=0.125 2024-09-17 09:33:26,442 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.133e+02 2.260e+02 2.453e+02 4.298e+02, threshold=4.520e+02, percent-clipped=0.0 2024-09-17 09:33:26,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583403.1666666666, ans=0.1 2024-09-17 09:33:55,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=583459.8333333334, ans=0.2 2024-09-17 09:34:03,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=583488.1666666666, ans=0.5 2024-09-17 09:34:17,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=583516.5, ans=0.025 2024-09-17 09:34:33,339 INFO [train.py:1198] (0/2) Epoch 33, batch 1450, loss[loss=0.2558, ctc_loss=0.1744, cr_loss=0.4074, over 20706.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1494, cr_loss=0.3734, over 4093837.94 frames. ], batch size: 71, lr: 2.52e-03, grad_scale: 32.0 2024-09-17 09:34:37,061 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.03 vs. 
limit=15.0 2024-09-17 09:35:10,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583601.5, ans=0.1 2024-09-17 09:35:14,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=583601.5, ans=0.05 2024-09-17 09:35:23,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=583629.8333333334, ans=15.0 2024-09-17 09:35:41,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=583658.1666666666, ans=0.125 2024-09-17 09:35:49,331 INFO [train.py:1198] (0/2) Epoch 33, batch 1500, loss[loss=0.211, ctc_loss=0.1405, cr_loss=0.3527, over 20789.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1497, cr_loss=0.3734, over 4104876.49 frames. ], batch size: 53, lr: 2.52e-03, grad_scale: 32.0 2024-09-17 09:35:59,949 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.174e+02 2.285e+02 2.430e+02 3.782e+02, threshold=4.571e+02, percent-clipped=0.0 2024-09-17 09:36:43,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-09-17 09:37:04,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583799.8333333334, ans=0.1 2024-09-17 09:37:08,465 INFO [train.py:1198] (0/2) Epoch 33, batch 1550, loss[loss=0.2513, ctc_loss=0.1667, cr_loss=0.4229, over 20853.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1494, cr_loss=0.373, over 4091535.95 frames. 
], batch size: 65, lr: 2.52e-03, grad_scale: 16.0 2024-09-17 09:37:16,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=583828.1666666666, ans=0.125 2024-09-17 09:37:39,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=583884.8333333334, ans=0.04949747468305833 2024-09-17 09:37:39,656 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0 2024-09-17 09:37:52,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=583913.1666666666, ans=0.0 2024-09-17 09:38:04,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=583913.1666666666, ans=0.0 2024-09-17 09:38:13,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.47 vs. limit=5.0 2024-09-17 09:38:25,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=583969.8333333334, ans=0.125 2024-09-17 09:38:27,006 INFO [train.py:1198] (0/2) Epoch 33, batch 1600, loss[loss=0.1906, ctc_loss=0.1252, cr_loss=0.3271, over 19889.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1483, cr_loss=0.3705, over 4080629.69 frames. 
], batch size: 44, lr: 2.52e-03, grad_scale: 32.0 2024-09-17 09:38:37,365 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.139e+02 2.280e+02 2.404e+02 3.218e+02, threshold=4.560e+02, percent-clipped=0.0 2024-09-17 09:38:44,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=583998.1666666666, ans=0.0 2024-09-17 09:38:58,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=584026.5, ans=0.125 2024-09-17 09:39:06,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=584026.5, ans=0.0 2024-09-17 09:39:36,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. limit=10.0 2024-09-17 09:39:42,803 INFO [train.py:1198] (0/2) Epoch 33, batch 1650, loss[loss=0.2255, ctc_loss=0.1496, cr_loss=0.3797, over 20976.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1485, cr_loss=0.3713, over 4082305.55 frames. ], batch size: 58, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 09:39:53,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=584111.5, ans=0.025 2024-09-17 09:40:01,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=584139.8333333334, ans=0.025 2024-09-17 09:40:28,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=584196.5, ans=0.125 2024-09-17 09:40:44,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=584224.8333333334, ans=0.125 2024-09-17 09:40:59,021 INFO [train.py:1198] (0/2) Epoch 33, batch 1700, loss[loss=0.1947, ctc_loss=0.1271, cr_loss=0.3382, over 20983.00 frames. 
], tot_loss[loss=0.2231, ctc_loss=0.1486, cr_loss=0.3724, over 4092894.04 frames. ], batch size: 52, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 09:41:09,452 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.145e+02 2.250e+02 2.407e+02 6.829e+02, threshold=4.500e+02, percent-clipped=1.0 2024-09-17 09:41:21,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584281.5, ans=0.1 2024-09-17 09:41:24,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584281.5, ans=0.1 2024-09-17 09:41:30,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=584309.8333333334, ans=0.1 2024-09-17 09:41:53,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=584338.1666666666, ans=0.0 2024-09-17 09:42:17,533 INFO [train.py:1198] (0/2) Epoch 33, batch 1750, loss[loss=0.254, ctc_loss=0.1702, cr_loss=0.419, over 20823.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1493, cr_loss=0.3735, over 4098932.24 frames. ], batch size: 65, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 09:42:40,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=584423.1666666666, ans=0.0 2024-09-17 09:43:01,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584479.8333333334, ans=0.1 2024-09-17 09:43:06,654 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:43:33,245 INFO [train.py:1198] (0/2) Epoch 33, batch 1800, loss[loss=0.2366, ctc_loss=0.1601, cr_loss=0.3825, over 20836.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1496, cr_loss=0.3744, over 4099318.22 frames. 
], batch size: 65, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 09:43:43,737 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.196e+02 2.310e+02 2.481e+02 3.292e+02, threshold=4.620e+02, percent-clipped=0.0 2024-09-17 09:43:54,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584564.8333333334, ans=0.1 2024-09-17 09:43:56,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=584564.8333333334, ans=0.125 2024-09-17 09:44:10,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=584593.1666666666, ans=0.07 2024-09-17 09:44:27,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=584621.5, ans=0.0 2024-09-17 09:44:28,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=584621.5, ans=0.025 2024-09-17 09:44:33,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=584621.5, ans=0.1 2024-09-17 09:44:33,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=584621.5, ans=0.125 2024-09-17 09:44:52,564 INFO [train.py:1198] (0/2) Epoch 33, batch 1850, loss[loss=0.1957, ctc_loss=0.1294, cr_loss=0.3312, over 20864.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1491, cr_loss=0.374, over 4111849.24 frames. ], batch size: 65, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 09:45:11,681 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.83 vs. 
limit=15.0 2024-09-17 09:45:41,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=584763.1666666666, ans=0.125 2024-09-17 09:45:49,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=584763.1666666666, ans=0.09899494936611666 2024-09-17 09:46:08,661 INFO [train.py:1198] (0/2) Epoch 33, batch 1900, loss[loss=0.1877, ctc_loss=0.1226, cr_loss=0.3252, over 19889.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3726, over 4110188.29 frames. ], batch size: 44, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 09:46:19,422 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.126e+02 2.319e+02 2.432e+02 3.467e+02, threshold=4.638e+02, percent-clipped=0.0 2024-09-17 09:46:36,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=584848.1666666666, ans=0.0 2024-09-17 09:46:52,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=584904.8333333334, ans=0.0 2024-09-17 09:47:02,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=584904.8333333334, ans=0.125 2024-09-17 09:47:23,712 INFO [train.py:1198] (0/2) Epoch 33, batch 1950, loss[loss=0.2081, ctc_loss=0.1367, cr_loss=0.3575, over 21056.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1489, cr_loss=0.3732, over 4112025.76 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 09:48:03,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.76 vs. 
limit=15.0 2024-09-17 09:48:28,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=585074.8333333334, ans=0.0 2024-09-17 09:48:42,417 INFO [train.py:1198] (0/2) Epoch 33, batch 2000, loss[loss=0.2753, ctc_loss=0.1863, cr_loss=0.4448, over 18375.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1488, cr_loss=0.373, over 4109822.12 frames. ], batch size: 108, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 09:48:52,646 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.26 vs. limit=15.0 2024-09-17 09:48:53,068 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.842e+02 2.192e+02 2.288e+02 2.478e+02 4.432e+02, threshold=4.575e+02, percent-clipped=0.0 2024-09-17 09:49:07,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=585131.5, ans=0.1 2024-09-17 09:50:01,300 INFO [train.py:1198] (0/2) Epoch 33, batch 2050, loss[loss=0.218, ctc_loss=0.1446, cr_loss=0.367, over 21066.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1483, cr_loss=0.3727, over 4112819.00 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 8.0 2024-09-17 09:50:30,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=585301.5, ans=0.2 2024-09-17 09:50:34,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.09 vs. 
limit=12.0 2024-09-17 09:50:43,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=585301.5, ans=0.025 2024-09-17 09:51:00,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=585358.1666666666, ans=0.125 2024-09-17 09:51:03,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=585358.1666666666, ans=0.125 2024-09-17 09:51:16,726 INFO [train.py:1198] (0/2) Epoch 33, batch 2100, loss[loss=0.2369, ctc_loss=0.1591, cr_loss=0.3891, over 20661.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3725, over 4118876.55 frames. ], batch size: 66, lr: 2.51e-03, grad_scale: 8.0 2024-09-17 09:51:26,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2024-09-17 09:51:27,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=585386.5, ans=0.025 2024-09-17 09:51:30,379 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.808e+02 2.177e+02 2.299e+02 2.464e+02 4.987e+02, threshold=4.598e+02, percent-clipped=1.0 2024-09-17 09:51:40,373 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=12.0 2024-09-17 09:51:40,553 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. 
limit=10.0 2024-09-17 09:52:27,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=585499.8333333334, ans=0.0 2024-09-17 09:52:33,007 INFO [train.py:1198] (0/2) Epoch 33, batch 2150, loss[loss=0.1845, ctc_loss=0.1196, cr_loss=0.3242, over 20972.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1486, cr_loss=0.3731, over 4120876.97 frames. ], batch size: 49, lr: 2.51e-03, grad_scale: 8.0 2024-09-17 09:52:38,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=585528.1666666666, ans=0.0 2024-09-17 09:53:11,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=585584.8333333334, ans=0.125 2024-09-17 09:53:23,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=585613.1666666666, ans=0.0 2024-09-17 09:53:38,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=585641.5, ans=0.125 2024-09-17 09:53:51,971 INFO [train.py:1198] (0/2) Epoch 33, batch 2200, loss[loss=0.2076, ctc_loss=0.1374, cr_loss=0.3509, over 20671.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1491, cr_loss=0.3735, over 4116381.51 frames. 
], batch size: 71, lr: 2.51e-03, grad_scale: 8.0 2024-09-17 09:54:05,435 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.158e+02 2.266e+02 2.414e+02 3.128e+02, threshold=4.532e+02, percent-clipped=0.0 2024-09-17 09:54:21,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=585726.5, ans=0.0 2024-09-17 09:54:22,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=585726.5, ans=0.0 2024-09-17 09:54:40,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=585754.8333333334, ans=0.0 2024-09-17 09:55:07,624 INFO [train.py:1198] (0/2) Epoch 33, batch 2250, loss[loss=0.2256, ctc_loss=0.1489, cr_loss=0.3835, over 21070.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1482, cr_loss=0.3723, over 4120178.46 frames. ], batch size: 59, lr: 2.51e-03, grad_scale: 8.0 2024-09-17 09:56:22,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=585924.8333333334, ans=0.125 2024-09-17 09:56:24,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2024-09-17 09:56:25,609 INFO [train.py:1198] (0/2) Epoch 33, batch 2300, loss[loss=0.2569, ctc_loss=0.1722, cr_loss=0.4233, over 20645.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1485, cr_loss=0.3724, over 4109410.39 frames. 
], batch size: 68, lr: 2.51e-03, grad_scale: 8.0 2024-09-17 09:56:36,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=585953.1666666666, ans=0.125 2024-09-17 09:56:39,175 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.154e+02 2.293e+02 2.449e+02 3.027e+02, threshold=4.585e+02, percent-clipped=0.0 2024-09-17 09:56:51,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=585981.5, ans=0.0 2024-09-17 09:56:57,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=586009.8333333334, ans=0.2 2024-09-17 09:57:13,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=586038.1666666666, ans=0.2 2024-09-17 09:57:14,964 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-17 09:57:41,094 INFO [train.py:1198] (0/2) Epoch 33, batch 2350, loss[loss=0.2763, ctc_loss=0.1963, cr_loss=0.4003, over 13643.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1493, cr_loss=0.3741, over 4091635.99 frames. 
], batch size: 152, lr: 2.51e-03, grad_scale: 8.0 2024-09-17 09:57:46,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=586094.8333333334, ans=0.125 2024-09-17 09:58:37,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=586179.8333333334, ans=0.1 2024-09-17 09:58:46,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=586208.1666666666, ans=0.0 2024-09-17 09:58:57,217 INFO [train.py:1198] (0/2) Epoch 33, batch 2400, loss[loss=0.1985, ctc_loss=0.1334, cr_loss=0.3253, over 21072.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1486, cr_loss=0.3729, over 4096634.68 frames. ], batch size: 53, lr: 2.51e-03, grad_scale: 16.0 2024-09-17 09:59:11,677 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0 2024-09-17 09:59:14,026 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.718e+02 2.113e+02 2.273e+02 2.441e+02 1.109e+03, threshold=4.547e+02, percent-clipped=1.0 2024-09-17 09:59:21,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=586264.8333333334, ans=0.0 2024-09-17 10:00:02,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=586349.8333333334, ans=0.0 2024-09-17 10:00:11,103 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=8.0 2024-09-17 10:00:15,753 INFO [train.py:1198] (0/2) Epoch 33, batch 2450, loss[loss=0.1829, ctc_loss=0.1193, cr_loss=0.3179, over 20997.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3726, over 4091558.54 frames. 
], batch size: 50, lr: 2.51e-03, grad_scale: 16.0 2024-09-17 10:00:16,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=586378.1666666666, ans=0.125 2024-09-17 10:00:46,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-09-17 10:00:47,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2024-09-17 10:00:48,743 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.78 vs. limit=6.0 2024-09-17 10:01:06,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2024-09-17 10:01:34,411 INFO [train.py:1198] (0/2) Epoch 33, batch 2500, loss[loss=0.2309, ctc_loss=0.1559, cr_loss=0.3749, over 21038.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1491, cr_loss=0.3734, over 4074849.21 frames. 
], batch size: 62, lr: 2.51e-03, grad_scale: 16.0 2024-09-17 10:01:46,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=586519.8333333334, ans=0.0 2024-09-17 10:01:48,131 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.145e+02 2.313e+02 2.499e+02 4.124e+02, threshold=4.626e+02, percent-clipped=0.0 2024-09-17 10:01:51,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=586548.1666666666, ans=0.0 2024-09-17 10:02:00,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=586548.1666666666, ans=0.125 2024-09-17 10:02:31,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=586604.8333333334, ans=0.0 2024-09-17 10:02:36,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=586633.1666666666, ans=0.1 2024-09-17 10:02:49,586 INFO [train.py:1198] (0/2) Epoch 33, batch 2550, loss[loss=0.2409, ctc_loss=0.1611, cr_loss=0.399, over 21016.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1486, cr_loss=0.3729, over 4080884.51 frames. ], batch size: 63, lr: 2.51e-03, grad_scale: 16.0 2024-09-17 10:02:56,534 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.46 vs. 
limit=22.5 2024-09-17 10:03:13,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=586689.8333333334, ans=0.2 2024-09-17 10:03:29,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=586718.1666666666, ans=0.125 2024-09-17 10:03:52,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=22.5 2024-09-17 10:04:01,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=586774.8333333334, ans=0.2 2024-09-17 10:04:05,432 INFO [train.py:1198] (0/2) Epoch 33, batch 2600, loss[loss=0.2518, ctc_loss=0.1698, cr_loss=0.4098, over 19960.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1487, cr_loss=0.373, over 4081747.96 frames. ], batch size: 80, lr: 2.51e-03, grad_scale: 16.0 2024-09-17 10:04:19,167 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.162e+02 2.322e+02 2.472e+02 3.448e+02, threshold=4.643e+02, percent-clipped=0.0 2024-09-17 10:04:24,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=586831.5, ans=0.0 2024-09-17 10:04:28,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=586831.5, ans=0.2 2024-09-17 10:04:34,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=586859.8333333334, ans=0.2 2024-09-17 10:05:24,511 INFO [train.py:1198] (0/2) Epoch 33, batch 2650, loss[loss=0.2164, ctc_loss=0.1424, cr_loss=0.3699, over 21052.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1488, cr_loss=0.3728, over 4078738.44 frames. 
], batch size: 56, lr: 2.51e-03, grad_scale: 16.0 2024-09-17 10:05:34,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=586944.8333333334, ans=0.125 2024-09-17 10:05:58,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=22.5 2024-09-17 10:06:09,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=587029.8333333334, ans=0.2 2024-09-17 10:06:30,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0 2024-09-17 10:06:40,564 INFO [train.py:1198] (0/2) Epoch 33, batch 2700, loss[loss=0.2265, ctc_loss=0.1517, cr_loss=0.374, over 19369.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1487, cr_loss=0.3733, over 4088301.17 frames. ], batch size: 90, lr: 2.51e-03, grad_scale: 16.0 2024-09-17 10:06:49,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=587086.5, ans=0.2 2024-09-17 10:06:54,189 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.152e+02 2.291e+02 2.421e+02 3.428e+02, threshold=4.582e+02, percent-clipped=0.0 2024-09-17 10:07:27,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=587171.5, ans=0.0 2024-09-17 10:07:52,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=587199.8333333334, ans=0.125 2024-09-17 10:07:59,770 INFO [train.py:1198] (0/2) Epoch 33, batch 2750, loss[loss=0.2041, ctc_loss=0.1359, cr_loss=0.3408, over 20882.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1481, cr_loss=0.3729, over 4097332.58 frames. 
], batch size: 54, lr: 2.51e-03, grad_scale: 16.0 2024-09-17 10:08:36,442 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0 2024-09-17 10:08:40,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587284.8333333334, ans=0.1 2024-09-17 10:09:15,226 INFO [train.py:1198] (0/2) Epoch 33, batch 2800, loss[loss=0.2462, ctc_loss=0.1637, cr_loss=0.4122, over 20841.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1487, cr_loss=0.3726, over 4083512.62 frames. ], batch size: 65, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 10:09:22,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=587369.8333333334, ans=0.1 2024-09-17 10:09:28,816 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.127e+02 2.313e+02 2.493e+02 3.541e+02, threshold=4.627e+02, percent-clipped=0.0 2024-09-17 10:09:58,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=587426.5, ans=0.2 2024-09-17 10:10:02,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=587454.8333333334, ans=0.125 2024-09-17 10:10:04,513 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2024-09-17 10:10:14,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=587483.1666666666, ans=0.0 2024-09-17 10:10:34,141 INFO [train.py:1198] (0/2) Epoch 33, batch 2850, loss[loss=0.2264, ctc_loss=0.1508, cr_loss=0.3782, over 20934.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1495, cr_loss=0.3739, over 4088722.25 frames. 
], batch size: 60, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 10:11:16,902 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.84 vs. limit=10.0 2024-09-17 10:11:49,464 INFO [train.py:1198] (0/2) Epoch 33, batch 2900, loss[loss=0.1755, ctc_loss=0.1145, cr_loss=0.3052, over 21003.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1495, cr_loss=0.3742, over 4091456.36 frames. ], batch size: 51, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 10:11:52,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=587653.1666666666, ans=0.2 2024-09-17 10:12:03,140 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.160e+02 2.292e+02 2.490e+02 6.948e+02, threshold=4.584e+02, percent-clipped=1.0 2024-09-17 10:12:39,680 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:12:41,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=587738.1666666666, ans=0.0 2024-09-17 10:12:41,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=587738.1666666666, ans=0.125 2024-09-17 10:13:04,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587766.5, ans=0.1 2024-09-17 10:13:07,677 INFO [train.py:1198] (0/2) Epoch 33, batch 2950, loss[loss=0.2484, ctc_loss=0.1699, cr_loss=0.3927, over 20845.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.149, cr_loss=0.3738, over 4103206.15 frames. 
], batch size: 65, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 10:13:24,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=587823.1666666666, ans=0.5 2024-09-17 10:14:02,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=587879.8333333334, ans=0.025 2024-09-17 10:14:12,122 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=22.5 2024-09-17 10:14:14,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2024-09-17 10:14:23,079 INFO [train.py:1198] (0/2) Epoch 33, batch 3000, loss[loss=0.2166, ctc_loss=0.1411, cr_loss=0.3771, over 20893.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1488, cr_loss=0.3737, over 4104889.61 frames. ], batch size: 54, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 10:14:23,080 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 10:14:47,778 INFO [train.py:1230] (0/2) Epoch 33, validation: loss=0.03983, ctc_loss=0.03983, cr_loss=1.313e-14, over 944034.00 frames. 
2024-09-17 10:14:47,779 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 10:15:00,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=587936.5, ans=0.0 2024-09-17 10:15:01,328 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.751e+02 2.169e+02 2.293e+02 2.458e+02 5.072e+02, threshold=4.586e+02, percent-clipped=1.0 2024-09-17 10:15:10,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=587964.8333333334, ans=0.1 2024-09-17 10:15:30,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=587993.1666666666, ans=0.0 2024-09-17 10:15:35,750 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.91 vs. limit=15.0 2024-09-17 10:15:56,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=588049.8333333334, ans=0.0 2024-09-17 10:16:00,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=588049.8333333334, ans=0.0 2024-09-17 10:16:06,180 INFO [train.py:1198] (0/2) Epoch 33, batch 3050, loss[loss=0.2163, ctc_loss=0.1419, cr_loss=0.3718, over 21017.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.1493, cr_loss=0.3734, over 4092138.25 frames. ], batch size: 61, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 10:16:20,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.95 vs. 
limit=15.0 2024-09-17 10:16:57,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=588163.1666666666, ans=0.0 2024-09-17 10:16:58,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=588163.1666666666, ans=0.0 2024-09-17 10:17:15,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=588191.5, ans=0.125 2024-09-17 10:17:22,628 INFO [train.py:1198] (0/2) Epoch 33, batch 3100, loss[loss=0.242, ctc_loss=0.1612, cr_loss=0.4039, over 20969.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1481, cr_loss=0.3718, over 4098576.38 frames. ], batch size: 58, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 10:17:36,256 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.138e+02 2.243e+02 2.404e+02 3.026e+02, threshold=4.485e+02, percent-clipped=0.0 2024-09-17 10:17:53,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=588276.5, ans=0.125 2024-09-17 10:18:12,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2024-09-17 10:18:17,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=588304.8333333334, ans=0.125 2024-09-17 10:18:29,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=588333.1666666666, ans=0.0 2024-09-17 10:18:41,260 INFO [train.py:1198] (0/2) Epoch 33, batch 3150, loss[loss=0.2066, ctc_loss=0.1357, cr_loss=0.3547, over 20878.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.148, cr_loss=0.3719, over 4099036.87 frames. 
], batch size: 57, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 10:18:43,199 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:18:55,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=588389.8333333334, ans=0.0 2024-09-17 10:19:01,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=588389.8333333334, ans=10.0 2024-09-17 10:19:03,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-09-17 10:19:54,234 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:19:56,716 INFO [train.py:1198] (0/2) Epoch 33, batch 3200, loss[loss=0.231, ctc_loss=0.1549, cr_loss=0.3807, over 20969.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1489, cr_loss=0.3734, over 4099647.50 frames. ], batch size: 58, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 10:20:10,673 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.222e+02 2.327e+02 2.511e+02 3.661e+02, threshold=4.653e+02, percent-clipped=0.0 2024-09-17 10:20:11,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. 
limit=6.0 2024-09-17 10:20:36,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=588559.8333333334, ans=0.0 2024-09-17 10:20:43,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=588588.1666666666, ans=0.125 2024-09-17 10:20:51,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2024-09-17 10:20:52,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=588588.1666666666, ans=0.125 2024-09-17 10:20:55,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=588616.5, ans=0.125 2024-09-17 10:21:11,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=588644.8333333334, ans=0.04949747468305833 2024-09-17 10:21:12,114 INFO [train.py:1198] (0/2) Epoch 33, batch 3250, loss[loss=0.2372, ctc_loss=0.1587, cr_loss=0.3923, over 21062.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1491, cr_loss=0.3733, over 4097746.27 frames. ], batch size: 59, lr: 2.51e-03, grad_scale: 32.0 2024-09-17 10:21:18,905 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=22.5 2024-09-17 10:21:38,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=588673.1666666666, ans=0.0 2024-09-17 10:21:49,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.55 vs. 
limit=12.0 2024-09-17 10:21:53,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=588701.5, ans=0.0 2024-09-17 10:21:53,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=588701.5, ans=0.125 2024-09-17 10:21:56,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=588701.5, ans=0.125 2024-09-17 10:22:01,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=588729.8333333334, ans=0.0 2024-09-17 10:22:02,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=588729.8333333334, ans=0.125 2024-09-17 10:22:21,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0 2024-09-17 10:22:28,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=588758.1666666666, ans=0.0 2024-09-17 10:22:31,079 INFO [train.py:1198] (0/2) Epoch 33, batch 3300, loss[loss=0.2409, ctc_loss=0.1609, cr_loss=0.3998, over 20663.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1485, cr_loss=0.3726, over 4097585.18 frames. 
], batch size: 66, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:22:44,565 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.213e+02 2.316e+02 2.471e+02 4.382e+02, threshold=4.632e+02, percent-clipped=0.0 2024-09-17 10:22:44,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=588814.8333333334, ans=0.0 2024-09-17 10:22:49,994 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0 2024-09-17 10:22:59,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=588843.1666666666, ans=0.125 2024-09-17 10:23:02,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.64 vs. limit=15.0 2024-09-17 10:23:20,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=588871.5, ans=0.05 2024-09-17 10:23:46,653 INFO [train.py:1198] (0/2) Epoch 33, batch 3350, loss[loss=0.2452, ctc_loss=0.166, cr_loss=0.3961, over 20860.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1471, cr_loss=0.37, over 4097988.21 frames. 
], batch size: 65, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:24:43,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=589013.1666666666, ans=0.125 2024-09-17 10:24:46,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589013.1666666666, ans=0.1 2024-09-17 10:25:00,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=589041.5, ans=0.02 2024-09-17 10:25:01,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=589041.5, ans=0.125 2024-09-17 10:25:04,448 INFO [train.py:1198] (0/2) Epoch 33, batch 3400, loss[loss=0.254, ctc_loss=0.1714, cr_loss=0.4129, over 18269.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1473, cr_loss=0.37, over 4093078.78 frames. ], batch size: 108, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:25:17,787 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.148e+02 2.304e+02 2.436e+02 5.081e+02, threshold=4.607e+02, percent-clipped=1.0 2024-09-17 10:25:24,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589098.1666666666, ans=0.1 2024-09-17 10:25:51,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589154.8333333334, ans=0.1 2024-09-17 10:25:52,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=589154.8333333334, ans=0.2 2024-09-17 10:25:53,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.49 vs. 
limit=22.5 2024-09-17 10:26:12,701 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:26:19,974 INFO [train.py:1198] (0/2) Epoch 33, batch 3450, loss[loss=0.2142, ctc_loss=0.1432, cr_loss=0.3552, over 20800.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1467, cr_loss=0.369, over 4100496.18 frames. ], batch size: 56, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:26:34,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=589239.8333333334, ans=0.0 2024-09-17 10:26:41,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=589239.8333333334, ans=0.2 2024-09-17 10:26:43,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=589239.8333333334, ans=10.0 2024-09-17 10:26:44,001 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. 
limit=6.0 2024-09-17 10:26:57,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=589268.1666666666, ans=0.125 2024-09-17 10:27:07,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=589296.5, ans=0.95 2024-09-17 10:27:09,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=589296.5, ans=0.0 2024-09-17 10:27:10,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=589296.5, ans=0.125 2024-09-17 10:27:24,303 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-208000.pt 2024-09-17 10:27:40,138 INFO [train.py:1198] (0/2) Epoch 33, batch 3500, loss[loss=0.1742, ctc_loss=0.1117, cr_loss=0.3122, over 20977.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1474, cr_loss=0.3707, over 4109059.45 frames. ], batch size: 49, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:27:46,717 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:27:54,134 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.160e+02 2.311e+02 2.466e+02 3.491e+02, threshold=4.622e+02, percent-clipped=0.0 2024-09-17 10:28:25,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-17 10:28:29,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589438.1666666666, ans=0.1 2024-09-17 10:28:56,400 INFO [train.py:1198] (0/2) Epoch 33, batch 3550, loss[loss=0.2078, ctc_loss=0.1381, cr_loss=0.3481, over 20968.00 frames. 
], tot_loss[loss=0.2222, ctc_loss=0.1478, cr_loss=0.3716, over 4115944.26 frames. ], batch size: 49, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:29:17,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=589523.1666666666, ans=0.2 2024-09-17 10:29:18,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=12.0 2024-09-17 10:29:26,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=589551.5, ans=0.025 2024-09-17 10:29:41,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=589579.8333333334, ans=0.0 2024-09-17 10:29:43,459 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2024-09-17 10:29:44,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=589579.8333333334, ans=0.0 2024-09-17 10:30:07,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=589608.1666666666, ans=0.125 2024-09-17 10:30:13,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=589636.5, ans=0.0 2024-09-17 10:30:14,409 INFO [train.py:1198] (0/2) Epoch 33, batch 3600, loss[loss=0.1758, ctc_loss=0.1161, cr_loss=0.2984, over 20958.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1483, cr_loss=0.372, over 4104365.69 frames. 
], batch size: 50, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:30:28,007 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.194e+02 2.309e+02 2.464e+02 3.603e+02, threshold=4.618e+02, percent-clipped=0.0 2024-09-17 10:30:32,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=589664.8333333334, ans=0.07 2024-09-17 10:30:37,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589664.8333333334, ans=0.1 2024-09-17 10:30:47,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=589693.1666666666, ans=0.125 2024-09-17 10:30:49,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=589693.1666666666, ans=0.125 2024-09-17 10:31:03,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=589721.5, ans=0.2 2024-09-17 10:31:07,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=589721.5, ans=0.0 2024-09-17 10:31:27,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589749.8333333334, ans=0.1 2024-09-17 10:31:29,639 INFO [train.py:1198] (0/2) Epoch 33, batch 3650, loss[loss=0.2684, ctc_loss=0.1844, cr_loss=0.4196, over 18144.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1487, cr_loss=0.3729, over 4101650.62 frames. 
], batch size: 108, lr: 2.50e-03, grad_scale: 32.0 2024-09-17 10:32:18,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=589863.1666666666, ans=0.125 2024-09-17 10:32:45,744 INFO [train.py:1198] (0/2) Epoch 33, batch 3700, loss[loss=0.2431, ctc_loss=0.1591, cr_loss=0.42, over 20655.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1487, cr_loss=0.3725, over 4096819.42 frames. ], batch size: 71, lr: 2.50e-03, grad_scale: 16.0 2024-09-17 10:32:47,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589919.8333333334, ans=0.1 2024-09-17 10:32:58,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.69 vs. limit=10.0 2024-09-17 10:33:01,131 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.167e+02 2.352e+02 2.549e+02 4.743e+02, threshold=4.705e+02, percent-clipped=1.0 2024-09-17 10:33:06,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=589948.1666666666, ans=0.0 2024-09-17 10:33:51,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=590033.1666666666, ans=0.07 2024-09-17 10:34:05,138 INFO [train.py:1198] (0/2) Epoch 33, batch 3750, loss[loss=0.2088, ctc_loss=0.138, cr_loss=0.3537, over 21019.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1474, cr_loss=0.371, over 4104738.90 frames. 
], batch size: 61, lr: 2.50e-03, grad_scale: 16.0 2024-09-17 10:34:14,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=590061.5, ans=0.05 2024-09-17 10:34:21,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=590089.8333333334, ans=0.125 2024-09-17 10:34:51,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590146.5, ans=0.1 2024-09-17 10:35:13,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=590174.8333333334, ans=0.1 2024-09-17 10:35:20,480 INFO [train.py:1198] (0/2) Epoch 33, batch 3800, loss[loss=0.2136, ctc_loss=0.141, cr_loss=0.3631, over 20987.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1469, cr_loss=0.3704, over 4114108.87 frames. ], batch size: 52, lr: 2.50e-03, grad_scale: 16.0 2024-09-17 10:35:27,447 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.05 vs. 
limit=15.0
2024-09-17 10:35:32,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=590203.1666666666, ans=0.0
2024-09-17 10:35:38,348 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.140e+02 2.270e+02 2.410e+02 3.531e+02, threshold=4.541e+02, percent-clipped=0.0
2024-09-17 10:36:16,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=590288.1666666666, ans=0.0
2024-09-17 10:36:23,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=590316.5, ans=0.2
2024-09-17 10:36:38,808 INFO [train.py:1198] (0/2) Epoch 33, batch 3850, loss[loss=0.1953, ctc_loss=0.1288, cr_loss=0.3326, over 20968.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1472, cr_loss=0.3707, over 4121760.55 frames. ], batch size: 51, lr: 2.50e-03, grad_scale: 16.0
2024-09-17 10:36:55,896 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:37:39,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=590458.1666666666, ans=0.125
2024-09-17 10:37:48,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=590458.1666666666, ans=0.0
2024-09-17 10:37:54,773 INFO [train.py:1198] (0/2) Epoch 33, batch 3900, loss[loss=0.2084, ctc_loss=0.137, cr_loss=0.3571, over 21077.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1468, cr_loss=0.3702, over 4128318.58 frames. ], batch size: 53, lr: 2.50e-03, grad_scale: 16.0
2024-09-17 10:38:09,945 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.174e+02 2.267e+02 2.471e+02 3.008e+02, threshold=4.535e+02, percent-clipped=0.0
2024-09-17 10:38:22,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=22.5
2024-09-17 10:38:28,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=590543.1666666666, ans=0.125
2024-09-17 10:38:37,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=590543.1666666666, ans=0.5
2024-09-17 10:38:43,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=590571.5, ans=0.125
2024-09-17 10:39:00,917 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=15.0
2024-09-17 10:39:13,710 INFO [train.py:1198] (0/2) Epoch 33, batch 3950, loss[loss=0.1904, ctc_loss=0.1225, cr_loss=0.3398, over 20963.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.3709, over 4119745.80 frames. ], batch size: 48, lr: 2.50e-03, grad_scale: 16.0
2024-09-17 10:39:36,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=590656.5, ans=0.125
2024-09-17 10:39:39,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=590656.5, ans=0.125
2024-09-17 10:39:42,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=590684.8333333334, ans=0.2
2024-09-17 10:39:48,808 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=22.5
2024-09-17 10:39:49,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=590684.8333333334, ans=0.0
2024-09-17 10:39:54,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=590684.8333333334, ans=0.125
2024-09-17 10:39:54,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=590684.8333333334, ans=0.2
2024-09-17 10:40:29,086 INFO [train.py:1198] (0/2) Epoch 33, batch 4000, loss[loss=0.2279, ctc_loss=0.1533, cr_loss=0.3733, over 20019.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1483, cr_loss=0.3726, over 4113040.03 frames. ], batch size: 80, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:40:43,787 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.235e+02 2.348e+02 2.541e+02 3.717e+02, threshold=4.697e+02, percent-clipped=0.0
2024-09-17 10:40:53,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=590798.1666666666, ans=0.2
2024-09-17 10:41:17,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590854.8333333334, ans=0.1
2024-09-17 10:41:19,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=590854.8333333334, ans=0.125
2024-09-17 10:41:47,666 INFO [train.py:1198] (0/2) Epoch 33, batch 4050, loss[loss=0.1875, ctc_loss=0.1218, cr_loss=0.3284, over 20973.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1474, cr_loss=0.3711, over 4111928.47 frames. ], batch size: 49, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:42:01,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=590939.8333333334, ans=0.125
2024-09-17 10:42:24,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=590968.1666666666, ans=0.0
2024-09-17 10:42:31,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=22.5
2024-09-17 10:42:38,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=590996.5, ans=0.125
2024-09-17 10:42:41,810 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:43:03,928 INFO [train.py:1198] (0/2) Epoch 33, batch 4100, loss[loss=0.2098, ctc_loss=0.1376, cr_loss=0.3611, over 20976.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1475, cr_loss=0.3717, over 4109757.05 frames. ], batch size: 55, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:43:18,825 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.171e+02 2.292e+02 2.440e+02 3.120e+02, threshold=4.584e+02, percent-clipped=0.0
2024-09-17 10:43:52,645 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=15.0
2024-09-17 10:44:16,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0
2024-09-17 10:44:19,227 INFO [train.py:1198] (0/2) Epoch 33, batch 4150, loss[loss=0.2315, ctc_loss=0.1541, cr_loss=0.3871, over 21063.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1486, cr_loss=0.3734, over 4097565.70 frames. ], batch size: 62, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:44:44,158 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0
2024-09-17 10:45:03,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=591251.5, ans=0.05
2024-09-17 10:45:13,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0
2024-09-17 10:45:17,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=591279.8333333334, ans=0.125
2024-09-17 10:45:17,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=591279.8333333334, ans=0.125
2024-09-17 10:45:37,778 INFO [train.py:1198] (0/2) Epoch 33, batch 4200, loss[loss=0.2029, ctc_loss=0.1345, cr_loss=0.3422, over 20871.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1489, cr_loss=0.3735, over 4099881.22 frames. ], batch size: 54, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:45:52,929 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.187e+02 2.279e+02 2.480e+02 3.393e+02, threshold=4.558e+02, percent-clipped=0.0
2024-09-17 10:46:08,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=591393.1666666666, ans=0.0
2024-09-17 10:46:40,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=591449.8333333334, ans=0.125
2024-09-17 10:46:44,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=591449.8333333334, ans=0.125
2024-09-17 10:46:46,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=591449.8333333334, ans=0.0
2024-09-17 10:46:49,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=591449.8333333334, ans=0.0
2024-09-17 10:46:56,860 INFO [train.py:1198] (0/2) Epoch 33, batch 4250, loss[loss=0.1962, ctc_loss=0.1277, cr_loss=0.3428, over 20966.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1491, cr_loss=0.3736, over 4093526.44 frames. ], batch size: 50, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:47:29,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=591534.8333333334, ans=0.0
2024-09-17 10:48:12,873 INFO [train.py:1198] (0/2) Epoch 33, batch 4300, loss[loss=0.2053, ctc_loss=0.1351, cr_loss=0.3514, over 21061.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1495, cr_loss=0.3739, over 4082812.38 frames. ], batch size: 56, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:48:28,190 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.143e+02 2.274e+02 2.457e+02 3.121e+02, threshold=4.548e+02, percent-clipped=0.0
2024-09-17 10:49:09,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=591704.8333333334, ans=0.0
2024-09-17 10:49:28,736 INFO [train.py:1198] (0/2) Epoch 33, batch 4350, loss[loss=0.1872, ctc_loss=0.121, cr_loss=0.3309, over 20948.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1494, cr_loss=0.3744, over 4098499.14 frames. ], batch size: 51, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:50:00,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=591818.1666666666, ans=0.0
2024-09-17 10:50:01,311 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=22.5
2024-09-17 10:50:18,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=591846.5, ans=0.07
2024-09-17 10:50:22,320 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0
2024-09-17 10:50:29,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=591846.5, ans=0.125
2024-09-17 10:50:34,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=591874.8333333334, ans=0.125
2024-09-17 10:50:47,197 INFO [train.py:1198] (0/2) Epoch 33, batch 4400, loss[loss=0.2449, ctc_loss=0.1649, cr_loss=0.3996, over 21026.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1497, cr_loss=0.3745, over 4092450.07 frames. ], batch size: 61, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:51:02,378 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.166e+02 2.312e+02 2.520e+02 4.444e+02, threshold=4.624e+02, percent-clipped=0.0
2024-09-17 10:51:19,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=591959.8333333334, ans=0.125
2024-09-17 10:52:02,414 INFO [train.py:1198] (0/2) Epoch 33, batch 4450, loss[loss=0.2367, ctc_loss=0.1575, cr_loss=0.3963, over 20986.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1493, cr_loss=0.3735, over 4082323.21 frames. ], batch size: 67, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:52:10,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=592044.8333333334, ans=0.125
2024-09-17 10:52:41,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=592101.5, ans=0.125
2024-09-17 10:52:42,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=592101.5, ans=0.125
2024-09-17 10:53:04,264 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.00 vs. limit=15.0
2024-09-17 10:53:08,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=592158.1666666666, ans=0.07
2024-09-17 10:53:21,553 INFO [train.py:1198] (0/2) Epoch 33, batch 4500, loss[loss=0.2189, ctc_loss=0.1469, cr_loss=0.3599, over 20775.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1498, cr_loss=0.3743, over 4075806.72 frames. ], batch size: 56, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:53:36,874 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.218e+02 2.378e+02 2.567e+02 3.438e+02, threshold=4.756e+02, percent-clipped=0.0
2024-09-17 10:53:52,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=592243.1666666666, ans=0.0
2024-09-17 10:54:37,118 INFO [train.py:1198] (0/2) Epoch 33, batch 4550, loss[loss=0.2437, ctc_loss=0.1614, cr_loss=0.4113, over 20999.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1488, cr_loss=0.3726, over 4070744.44 frames. ], batch size: 61, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:55:12,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=592384.8333333334, ans=0.0
2024-09-17 10:55:41,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=592441.5, ans=0.0
2024-09-17 10:55:42,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.91 vs. limit=12.0
2024-09-17 10:55:53,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=592441.5, ans=0.0
2024-09-17 10:55:56,012 INFO [train.py:1198] (0/2) Epoch 33, batch 4600, loss[loss=0.2344, ctc_loss=0.1537, cr_loss=0.4032, over 20858.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1484, cr_loss=0.3723, over 4088853.00 frames. ], batch size: 65, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:56:08,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0
2024-09-17 10:56:11,057 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 2.160e+02 2.269e+02 2.458e+02 3.109e+02, threshold=4.539e+02, percent-clipped=0.0
2024-09-17 10:56:23,587 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.29 vs. limit=15.0
2024-09-17 10:56:49,884 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.37 vs. limit=22.5
2024-09-17 10:57:09,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0
2024-09-17 10:57:11,851 INFO [train.py:1198] (0/2) Epoch 33, batch 4650, loss[loss=0.2455, ctc_loss=0.164, cr_loss=0.4077, over 20640.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1483, cr_loss=0.3722, over 4094530.42 frames. ], batch size: 71, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:57:16,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=592611.5, ans=0.2
2024-09-17 10:57:54,706 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=22.5
2024-09-17 10:58:02,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=592696.5, ans=0.025
2024-09-17 10:58:08,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=592696.5, ans=0.0
2024-09-17 10:58:13,304 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0
2024-09-17 10:58:28,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=592724.8333333334, ans=0.035
2024-09-17 10:58:30,784 INFO [train.py:1198] (0/2) Epoch 33, batch 4700, loss[loss=0.1998, ctc_loss=0.1307, cr_loss=0.3455, over 19886.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1493, cr_loss=0.3735, over 4082317.70 frames. ], batch size: 44, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:58:32,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=592753.1666666666, ans=0.1
2024-09-17 10:58:44,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=592781.5, ans=0.125
2024-09-17 10:58:45,759 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.187e+02 2.314e+02 2.497e+02 7.068e+02, threshold=4.628e+02, percent-clipped=1.0
2024-09-17 10:58:49,200 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:59:07,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=592809.8333333334, ans=0.2
2024-09-17 10:59:11,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=592809.8333333334, ans=0.125
2024-09-17 10:59:33,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=592866.5, ans=0.125
2024-09-17 10:59:46,429 INFO [train.py:1198] (0/2) Epoch 33, batch 4750, loss[loss=0.2267, ctc_loss=0.1502, cr_loss=0.3821, over 21011.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1492, cr_loss=0.3738, over 4084894.10 frames. ], batch size: 61, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 10:59:55,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=592894.8333333334, ans=0.015
2024-09-17 11:00:57,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=593008.1666666666, ans=0.07
2024-09-17 11:01:02,126 INFO [train.py:1198] (0/2) Epoch 33, batch 4800, loss[loss=0.2231, ctc_loss=0.1471, cr_loss=0.3798, over 21085.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1484, cr_loss=0.3727, over 4083612.97 frames. ], batch size: 59, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 11:01:16,781 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.129e+02 2.272e+02 2.423e+02 3.028e+02, threshold=4.544e+02, percent-clipped=0.0
2024-09-17 11:02:19,493 INFO [train.py:1198] (0/2) Epoch 33, batch 4850, loss[loss=0.1789, ctc_loss=0.1173, cr_loss=0.3079, over 20941.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3728, over 4078192.13 frames. ], batch size: 48, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 11:02:45,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=593206.5, ans=0.0
2024-09-17 11:02:50,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0
2024-09-17 11:03:03,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=593263.1666666666, ans=12.0
2024-09-17 11:03:34,553 INFO [train.py:1198] (0/2) Epoch 33, batch 4900, loss[loss=0.1931, ctc_loss=0.1281, cr_loss=0.325, over 20967.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1492, cr_loss=0.374, over 4088649.82 frames. ], batch size: 50, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 11:03:49,426 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.200e+02 2.295e+02 2.472e+02 3.096e+02, threshold=4.590e+02, percent-clipped=0.0
2024-09-17 11:04:01,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=593348.1666666666, ans=0.0
2024-09-17 11:04:26,587 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=22.5
2024-09-17 11:04:36,852 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=12.0
2024-09-17 11:04:52,633 INFO [train.py:1198] (0/2) Epoch 33, batch 4950, loss[loss=0.2506, ctc_loss=0.1691, cr_loss=0.4076, over 19473.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1496, cr_loss=0.3739, over 4088569.84 frames. ], batch size: 90, lr: 2.50e-03, grad_scale: 32.0
2024-09-17 11:05:38,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=593546.5, ans=0.125
2024-09-17 11:05:50,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=593546.5, ans=0.1
2024-09-17 11:05:57,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=593574.8333333334, ans=0.125
2024-09-17 11:06:06,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=593603.1666666666, ans=0.125
2024-09-17 11:06:07,817 INFO [train.py:1198] (0/2) Epoch 33, batch 5000, loss[loss=0.2403, ctc_loss=0.1584, cr_loss=0.4097, over 20964.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1491, cr_loss=0.3734, over 4088594.20 frames. ], batch size: 64, lr: 2.49e-03, grad_scale: 32.0
2024-09-17 11:06:11,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=593603.1666666666, ans=0.025
2024-09-17 11:06:20,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=22.5
2024-09-17 11:06:22,640 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.228e+02 2.357e+02 2.496e+02 5.510e+02, threshold=4.715e+02, percent-clipped=1.0
2024-09-17 11:06:27,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=593631.5, ans=0.035
2024-09-17 11:07:22,147 INFO [train.py:1198] (0/2) Epoch 33, batch 5050, loss[loss=0.254, ctc_loss=0.171, cr_loss=0.4148, over 20649.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1495, cr_loss=0.3734, over 4080330.68 frames. ], batch size: 71, lr: 2.49e-03, grad_scale: 32.0
2024-09-17 11:08:29,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=593858.1666666666, ans=0.025
2024-09-17 11:08:36,412 INFO [train.py:1198] (0/2) Epoch 33, batch 5100, loss[loss=0.2047, ctc_loss=0.1348, cr_loss=0.3495, over 20673.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1494, cr_loss=0.3732, over 4077846.67 frames. ], batch size: 68, lr: 2.49e-03, grad_scale: 16.0
2024-09-17 11:08:42,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=593886.5, ans=0.125
2024-09-17 11:08:45,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=593886.5, ans=0.1
2024-09-17 11:08:52,872 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.186e+02 2.310e+02 2.516e+02 3.575e+02, threshold=4.619e+02, percent-clipped=0.0
2024-09-17 11:09:04,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=593943.1666666666, ans=0.125
2024-09-17 11:09:42,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=593999.8333333334, ans=0.125
2024-09-17 11:09:50,595 INFO [train.py:1198] (0/2) Epoch 33, batch 5150, loss[loss=0.2076, ctc_loss=0.1374, cr_loss=0.3514, over 20768.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1487, cr_loss=0.3718, over 4073511.73 frames. ], batch size: 53, lr: 2.49e-03, grad_scale: 16.0
2024-09-17 11:10:08,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=22.5
2024-09-17 11:10:11,022 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0
2024-09-17 11:10:26,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=594084.8333333334, ans=0.04949747468305833
2024-09-17 11:10:36,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.16 vs. limit=6.0
2024-09-17 11:10:47,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=594113.1666666666, ans=0.5
2024-09-17 11:11:06,414 INFO [train.py:1198] (0/2) Epoch 33, batch 5200, loss[loss=0.1952, ctc_loss=0.1273, cr_loss=0.3399, over 20977.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.148, cr_loss=0.371, over 4084265.30 frames. ], batch size: 49, lr: 2.49e-03, grad_scale: 32.0
2024-09-17 11:11:18,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=594169.8333333334, ans=0.0
2024-09-17 11:11:22,607 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.138e+02 2.278e+02 2.479e+02 3.410e+02, threshold=4.556e+02, percent-clipped=0.0
2024-09-17 11:11:28,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=594198.1666666666, ans=0.125
2024-09-17 11:11:39,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=594226.5, ans=0.125
2024-09-17 11:11:51,365 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.45 vs. limit=10.0
2024-09-17 11:12:13,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594283.1666666666, ans=0.1
2024-09-17 11:12:16,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=594283.1666666666, ans=0.125
2024-09-17 11:12:20,591 INFO [train.py:1198] (0/2) Epoch 33, batch 5250, loss[loss=0.2328, ctc_loss=0.1556, cr_loss=0.3861, over 20851.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1488, cr_loss=0.3725, over 4080882.07 frames. ], batch size: 65, lr: 2.49e-03, grad_scale: 32.0
2024-09-17 11:12:29,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=594311.5, ans=0.0
2024-09-17 11:13:02,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=594368.1666666666, ans=0.125
2024-09-17 11:13:05,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=594396.5, ans=0.125
2024-09-17 11:13:06,167 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=22.5
2024-09-17 11:13:26,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=594424.8333333334, ans=0.0
2024-09-17 11:13:28,532 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=12.0
2024-09-17 11:13:31,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=594424.8333333334, ans=0.125
2024-09-17 11:13:35,063 INFO [train.py:1198] (0/2) Epoch 33, batch 5300, loss[loss=0.2491, ctc_loss=0.1696, cr_loss=0.3973, over 19543.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1484, cr_loss=0.3721, over 4086758.26 frames. ], batch size: 90, lr: 2.49e-03, grad_scale: 32.0
2024-09-17 11:13:51,320 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.200e+02 2.302e+02 2.433e+02 3.289e+02, threshold=4.604e+02, percent-clipped=0.0
2024-09-17 11:13:59,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=594481.5, ans=0.0
2024-09-17 11:14:11,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=594509.8333333334, ans=0.125
2024-09-17 11:14:25,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=594538.1666666666, ans=0.1
2024-09-17 11:14:46,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=594566.5, ans=0.0
2024-09-17 11:14:51,767 INFO [train.py:1198] (0/2) Epoch 33, batch 5350, loss[loss=0.2232, ctc_loss=0.1478, cr_loss=0.3773, over 21041.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3725, over 4088951.72 frames. ], batch size: 56, lr: 2.49e-03, grad_scale: 32.0
2024-09-17 11:14:54,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0
2024-09-17 11:15:38,635 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0
2024-09-17 11:15:56,665 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 11:16:00,043 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0
2024-09-17 11:16:06,814 INFO [train.py:1198] (0/2) Epoch 33, batch 5400, loss[loss=0.194, ctc_loss=0.1245, cr_loss=0.3475, over 20878.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1483, cr_loss=0.3726, over 4090026.39 frames. ], batch size: 54, lr: 2.49e-03, grad_scale: 32.0
2024-09-17 11:16:20,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=594764.8333333334, ans=0.0
2024-09-17 11:16:22,961 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.196e+02 2.326e+02 2.483e+02 3.555e+02, threshold=4.652e+02, percent-clipped=0.0
2024-09-17 11:17:04,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=594849.8333333334, ans=0.1
2024-09-17 11:17:06,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=594849.8333333334, ans=0.0
2024-09-17 11:17:20,892 INFO [train.py:1198] (0/2) Epoch 33, batch 5450, loss[loss=0.28, ctc_loss=0.1886, cr_loss=0.457, over 20006.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1489, cr_loss=0.3732, over 4085625.24 frames. ], batch size: 80, lr: 2.49e-03, grad_scale: 32.0
2024-09-17 11:17:24,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=594878.1666666666, ans=0.125
2024-09-17 11:18:22,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=594991.5, ans=0.5
2024-09-17 11:18:35,396 INFO [train.py:1198] (0/2) Epoch 33, batch 5500, loss[loss=0.2633, ctc_loss=0.1814, cr_loss=0.4093, over 20073.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1491, cr_loss=0.3731, over 4091361.83 frames. ], batch size: 80, lr: 2.49e-03, grad_scale: 32.0
2024-09-17 11:18:41,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=595019.8333333334, ans=0.0
2024-09-17 11:18:44,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=595019.8333333334, ans=0.2
2024-09-17 11:18:51,725 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.209e+02 2.354e+02 2.512e+02 7.886e+02, threshold=4.708e+02, percent-clipped=2.0
2024-09-17 11:18:57,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595048.1666666666, ans=0.1
2024-09-17 11:19:19,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=595076.5, ans=0.035
2024-09-17 11:19:51,998 INFO [train.py:1198] (0/2) Epoch 33, batch 5550, loss[loss=0.2174, ctc_loss=0.1441, cr_loss=0.3665, over 21068.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1492, cr_loss=0.3731, over 4094838.51 frames. ], batch size: 56, lr: 2.49e-03, grad_scale: 32.0
2024-09-17 11:20:20,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595218.1666666666, ans=0.1
2024-09-17 11:20:50,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=595274.8333333334, ans=0.5
2024-09-17 11:20:50,672 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=22.5
2024-09-17 11:20:51,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=595274.8333333334, ans=0.09899494936611666
2024-09-17 11:21:06,303 INFO [train.py:1198] (0/2) Epoch 33, batch 5600, loss[loss=0.2155, ctc_loss=0.1436, cr_loss=0.3595, over 20941.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1485, cr_loss=0.3717, over 4083936.08 frames. ], batch size: 60, lr: 2.49e-03, grad_scale: 32.0
2024-09-17 11:21:09,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=595303.1666666666, ans=0.0
2024-09-17 11:21:22,684 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.150e+02 2.319e+02 2.523e+02 3.734e+02, threshold=4.639e+02, percent-clipped=0.0
2024-09-17 11:21:30,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=595331.5, ans=0.125
2024-09-17 11:21:37,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=595359.8333333334, ans=0.0
2024-09-17 11:21:51,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=595388.1666666666, ans=0.125
2024-09-17 11:21:57,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=12.0
2024-09-17 11:22:08,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=595416.5, ans=0.125
2024-09-17 11:22:21,066 INFO [train.py:1198] (0/2) Epoch 33, batch 5650, loss[loss=0.2544, ctc_loss=0.17, cr_loss=0.4218, over 20934.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1492, cr_loss=0.3733, over 4092001.03 frames. ], batch size: 67, lr: 2.49e-03, grad_scale: 16.0
2024-09-17 11:23:05,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=595501.5, ans=0.125
2024-09-17 11:23:05,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=595501.5, ans=0.125
2024-09-17 11:23:37,925 INFO [train.py:1198] (0/2) Epoch 33, batch 5700, loss[loss=0.2346, ctc_loss=0.1564, cr_loss=0.3912, over 21029.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.149, cr_loss=0.3729, over 4106572.12 frames. ], batch size: 62, lr: 2.49e-03, grad_scale: 16.0
2024-09-17 11:23:47,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0
2024-09-17 11:23:47,661 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0
2024-09-17 11:23:55,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.14 vs.
limit=6.0 2024-09-17 11:23:55,774 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.209e+02 2.343e+02 2.527e+02 3.731e+02, threshold=4.685e+02, percent-clipped=0.0 2024-09-17 11:23:57,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=595614.8333333334, ans=0.025 2024-09-17 11:24:03,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595614.8333333334, ans=0.1 2024-09-17 11:24:23,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=595671.5, ans=0.125 2024-09-17 11:24:44,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=595699.8333333334, ans=0.2 2024-09-17 11:24:52,046 INFO [train.py:1198] (0/2) Epoch 33, batch 5750, loss[loss=0.2032, ctc_loss=0.1333, cr_loss=0.3493, over 20979.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.149, cr_loss=0.3733, over 4111125.76 frames. ], batch size: 52, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:25:01,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-09-17 11:25:02,841 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.59 vs. limit=10.0 2024-09-17 11:25:27,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.16 vs. limit=10.0 2024-09-17 11:25:54,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=595841.5, ans=0.0 2024-09-17 11:26:06,166 INFO [train.py:1198] (0/2) Epoch 33, batch 5800, loss[loss=0.1731, ctc_loss=0.1139, cr_loss=0.2962, over 19796.00 frames. 
], tot_loss[loss=0.2222, ctc_loss=0.1479, cr_loss=0.3715, over 4108910.10 frames. ], batch size: 44, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:26:15,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=595869.8333333334, ans=0.125 2024-09-17 11:26:24,192 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.134e+02 2.304e+02 2.444e+02 4.548e+02, threshold=4.608e+02, percent-clipped=0.0 2024-09-17 11:26:24,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.61 vs. limit=15.0 2024-09-17 11:26:29,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=595898.1666666666, ans=0.125 2024-09-17 11:26:41,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0 2024-09-17 11:27:12,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=595983.1666666666, ans=0.015 2024-09-17 11:27:22,644 INFO [train.py:1198] (0/2) Epoch 33, batch 5850, loss[loss=0.2152, ctc_loss=0.1456, cr_loss=0.348, over 20962.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1478, cr_loss=0.3713, over 4102064.90 frames. 
], batch size: 58, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:27:58,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=596068.1666666666, ans=0.125 2024-09-17 11:28:28,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=596124.8333333334, ans=0.125 2024-09-17 11:28:31,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=596124.8333333334, ans=0.125 2024-09-17 11:28:37,379 INFO [train.py:1198] (0/2) Epoch 33, batch 5900, loss[loss=0.1715, ctc_loss=0.1121, cr_loss=0.2969, over 20291.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1474, cr_loss=0.3707, over 4102775.84 frames. ], batch size: 45, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:28:40,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=596153.1666666666, ans=0.95 2024-09-17 11:28:40,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=596153.1666666666, ans=0.125 2024-09-17 11:28:55,045 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.186e+02 2.299e+02 2.506e+02 6.411e+02, threshold=4.599e+02, percent-clipped=1.0 2024-09-17 11:29:13,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=596209.8333333334, ans=0.125 2024-09-17 11:29:19,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=596209.8333333334, ans=0.125 2024-09-17 11:29:38,942 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.42 vs. 
limit=22.5 2024-09-17 11:29:43,429 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0 2024-09-17 11:29:51,591 INFO [train.py:1198] (0/2) Epoch 33, batch 5950, loss[loss=0.2376, ctc_loss=0.1589, cr_loss=0.3936, over 21024.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.147, cr_loss=0.3701, over 4094584.90 frames. ], batch size: 63, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:30:06,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=596323.1666666666, ans=0.125 2024-09-17 11:30:11,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=596323.1666666666, ans=0.0 2024-09-17 11:30:54,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596408.1666666666, ans=0.1 2024-09-17 11:31:01,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=596408.1666666666, ans=0.1 2024-09-17 11:31:05,513 INFO [train.py:1198] (0/2) Epoch 33, batch 6000, loss[loss=0.2332, ctc_loss=0.1559, cr_loss=0.3863, over 21034.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1476, cr_loss=0.3708, over 4080719.98 frames. ], batch size: 62, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:31:05,514 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 11:31:27,785 INFO [train.py:1230] (0/2) Epoch 33, validation: loss=0.04003, ctc_loss=0.04003, cr_loss=1.289e-14, over 944034.00 frames. 2024-09-17 11:31:27,786 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 11:31:28,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.77 vs. 
limit=15.0 2024-09-17 11:31:32,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596436.5, ans=0.1 2024-09-17 11:31:44,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=596464.8333333334, ans=0.2 2024-09-17 11:31:45,820 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 2.162e+02 2.293e+02 2.434e+02 5.495e+02, threshold=4.587e+02, percent-clipped=1.0 2024-09-17 11:31:59,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=596493.1666666666, ans=0.05 2024-09-17 11:32:18,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=596521.5, ans=0.2 2024-09-17 11:32:22,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=596521.5, ans=0.0 2024-09-17 11:32:27,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=596549.8333333334, ans=10.0 2024-09-17 11:32:32,837 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:32:42,841 INFO [train.py:1198] (0/2) Epoch 33, batch 6050, loss[loss=0.2329, ctc_loss=0.1575, cr_loss=0.3769, over 20674.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1481, cr_loss=0.3718, over 4077511.98 frames. 
], batch size: 68, lr: 2.49e-03, grad_scale: 32.0 2024-09-17 11:32:59,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596606.5, ans=0.1 2024-09-17 11:33:01,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=596606.5, ans=0.025 2024-09-17 11:33:16,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-17 11:33:32,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=596663.1666666666, ans=0.025 2024-09-17 11:33:57,061 INFO [train.py:1198] (0/2) Epoch 33, batch 6100, loss[loss=0.2352, ctc_loss=0.1558, cr_loss=0.3969, over 20856.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1488, cr_loss=0.3737, over 4075454.41 frames. ], batch size: 65, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:34:16,624 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.193e+02 2.300e+02 2.454e+02 3.475e+02, threshold=4.600e+02, percent-clipped=0.0 2024-09-17 11:34:22,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=596748.1666666666, ans=0.125 2024-09-17 11:35:01,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=596833.1666666666, ans=0.0 2024-09-17 11:35:05,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=596833.1666666666, ans=0.2 2024-09-17 11:35:12,498 INFO [train.py:1198] (0/2) Epoch 33, batch 6150, loss[loss=0.2346, ctc_loss=0.1565, cr_loss=0.3906, over 21017.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1495, cr_loss=0.3743, over 4073940.71 frames. 
], batch size: 61, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:35:38,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=596889.8333333334, ans=0.125 2024-09-17 11:35:44,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=596918.1666666666, ans=0.0 2024-09-17 11:35:53,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=596918.1666666666, ans=0.05 2024-09-17 11:36:09,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=596946.5, ans=0.0 2024-09-17 11:36:26,607 INFO [train.py:1198] (0/2) Epoch 33, batch 6200, loss[loss=0.2402, ctc_loss=0.1585, cr_loss=0.4081, over 20841.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1502, cr_loss=0.3749, over 4074005.39 frames. ], batch size: 59, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:36:45,826 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.172e+02 2.361e+02 2.503e+02 4.032e+02, threshold=4.722e+02, percent-clipped=0.0 2024-09-17 11:36:46,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=597031.5, ans=0.0 2024-09-17 11:37:01,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=597059.8333333334, ans=0.125 2024-09-17 11:37:19,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=597088.1666666666, ans=0.1 2024-09-17 11:37:33,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.98 vs. 
limit=15.0 2024-09-17 11:37:40,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=597144.8333333334, ans=0.0 2024-09-17 11:37:41,375 INFO [train.py:1198] (0/2) Epoch 33, batch 6250, loss[loss=0.2731, ctc_loss=0.1919, cr_loss=0.406, over 14405.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1501, cr_loss=0.3738, over 4056179.68 frames. ], batch size: 149, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:37:58,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=597173.1666666666, ans=0.125 2024-09-17 11:38:43,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=597258.1666666666, ans=0.125 2024-09-17 11:38:44,245 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0 2024-09-17 11:38:56,977 INFO [train.py:1198] (0/2) Epoch 33, batch 6300, loss[loss=0.1773, ctc_loss=0.1161, cr_loss=0.3062, over 19848.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1506, cr_loss=0.3745, over 4036728.11 frames. ], batch size: 44, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:39:16,369 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.162e+02 2.295e+02 2.523e+02 3.244e+02, threshold=4.590e+02, percent-clipped=0.0 2024-09-17 11:39:26,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=597343.1666666666, ans=0.125 2024-09-17 11:39:53,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=597371.5, ans=0.125 2024-09-17 11:40:10,061 INFO [train.py:1198] (0/2) Epoch 33, batch 6350, loss[loss=0.2686, ctc_loss=0.1884, cr_loss=0.401, over 14522.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1521, cr_loss=0.3736, over 3917316.34 frames. 
], batch size: 150, lr: 2.49e-03, grad_scale: 16.0 2024-09-17 11:40:16,274 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2024-09-17 11:40:52,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=597513.1666666666, ans=0.125 2024-09-17 11:41:06,914 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-33.pt 2024-09-17 11:41:57,293 INFO [train.py:1198] (0/2) Epoch 34, batch 0, loss[loss=0.2406, ctc_loss=0.1613, cr_loss=0.3965, over 20768.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1613, cr_loss=0.3965, over 20768.00 frames. ], batch size: 56, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:41:57,294 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 11:42:15,792 INFO [train.py:1230] (0/2) Epoch 34, validation: loss=0.04044, ctc_loss=0.04044, cr_loss=1.35e-14, over 944034.00 frames. 2024-09-17 11:42:15,793 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 11:42:20,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=597541.5, ans=0.125 2024-09-17 11:42:48,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=597598.1666666666, ans=0.0 2024-09-17 11:42:52,372 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.346e+02 2.669e+02 2.868e+02 6.385e+02, threshold=5.337e+02, percent-clipped=1.0 2024-09-17 11:43:07,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=597626.5, ans=0.0 2024-09-17 11:43:32,916 INFO [train.py:1198] (0/2) Epoch 34, batch 50, loss[loss=0.2312, ctc_loss=0.151, cr_loss=0.4009, over 21040.00 frames. 
], tot_loss[loss=0.2224, ctc_loss=0.1477, cr_loss=0.3736, over 926388.58 frames. ], batch size: 62, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:44:03,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=597739.8333333334, ans=0.1 2024-09-17 11:44:14,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=597739.8333333334, ans=0.2 2024-09-17 11:44:15,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2024-09-17 11:44:26,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=597768.1666666666, ans=0.125 2024-09-17 11:44:49,238 INFO [train.py:1198] (0/2) Epoch 34, batch 100, loss[loss=0.1807, ctc_loss=0.1183, cr_loss=0.3122, over 20959.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1451, cr_loss=0.3706, over 1640854.06 frames. ], batch size: 49, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:44:57,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=597824.8333333334, ans=0.125 2024-09-17 11:45:00,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=597824.8333333334, ans=0.0 2024-09-17 11:45:04,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=7.84 vs. limit=22.5 2024-09-17 11:45:11,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=597853.1666666666, ans=0.0 2024-09-17 11:45:17,623 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.74 vs. 
limit=15.0 2024-09-17 11:45:21,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=597881.5, ans=0.2 2024-09-17 11:45:24,287 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.127e+02 2.241e+02 2.411e+02 3.379e+02, threshold=4.483e+02, percent-clipped=0.0 2024-09-17 11:45:38,573 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=22.5 2024-09-17 11:45:39,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=597909.8333333334, ans=0.125 2024-09-17 11:45:59,874 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:46:05,619 INFO [train.py:1198] (0/2) Epoch 34, batch 150, loss[loss=0.2323, ctc_loss=0.1586, cr_loss=0.3684, over 20860.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1453, cr_loss=0.3692, over 2177791.00 frames. ], batch size: 65, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:46:43,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=598023.1666666666, ans=0.0 2024-09-17 11:47:10,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=598079.8333333334, ans=0.0 2024-09-17 11:47:21,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=22.5 2024-09-17 11:47:23,787 INFO [train.py:1198] (0/2) Epoch 34, batch 200, loss[loss=0.2306, ctc_loss=0.1528, cr_loss=0.389, over 20818.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.146, cr_loss=0.3692, over 2596263.97 frames. 
], batch size: 59, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:47:31,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=598108.1666666666, ans=0.0 2024-09-17 11:47:34,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0 2024-09-17 11:47:49,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=598136.5, ans=0.0 2024-09-17 11:47:58,281 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.160e+02 2.334e+02 2.503e+02 3.108e+02, threshold=4.668e+02, percent-clipped=0.0 2024-09-17 11:48:33,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598221.5, ans=0.1 2024-09-17 11:48:34,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598221.5, ans=0.1 2024-09-17 11:48:41,965 INFO [train.py:1198] (0/2) Epoch 34, batch 250, loss[loss=0.2361, ctc_loss=0.1566, cr_loss=0.3977, over 20687.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1459, cr_loss=0.369, over 2934064.61 frames. 
], batch size: 68, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:48:45,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598249.8333333334, ans=0.1 2024-09-17 11:49:06,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=598278.1666666666, ans=0.025 2024-09-17 11:49:06,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=598278.1666666666, ans=0.125 2024-09-17 11:49:41,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=598363.1666666666, ans=0.125 2024-09-17 11:49:45,084 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.75 vs. limit=15.0 2024-09-17 11:49:50,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=598363.1666666666, ans=0.125 2024-09-17 11:49:57,316 INFO [train.py:1198] (0/2) Epoch 34, batch 300, loss[loss=0.2555, ctc_loss=0.1741, cr_loss=0.4071, over 20673.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1474, cr_loss=0.3712, over 3196453.89 frames. 
], batch size: 71, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:50:32,128 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.223e+02 2.344e+02 2.528e+02 4.380e+02, threshold=4.687e+02, percent-clipped=0.0 2024-09-17 11:50:37,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598448.1666666666, ans=0.1 2024-09-17 11:51:02,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=598504.8333333334, ans=0.125 2024-09-17 11:51:12,863 INFO [train.py:1198] (0/2) Epoch 34, batch 350, loss[loss=0.1932, ctc_loss=0.1272, cr_loss=0.3299, over 20962.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1464, cr_loss=0.3693, over 3405554.71 frames. ], batch size: 50, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:51:37,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0 2024-09-17 11:52:17,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=598646.5, ans=0.125 2024-09-17 11:52:28,317 INFO [train.py:1198] (0/2) Epoch 34, batch 400, loss[loss=0.1741, ctc_loss=0.1117, cr_loss=0.312, over 21001.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1477, cr_loss=0.3709, over 3550556.91 frames. 
], batch size: 52, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:52:47,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=598703.1666666666, ans=0.0 2024-09-17 11:53:06,273 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.144e+02 2.252e+02 2.429e+02 3.054e+02, threshold=4.504e+02, percent-clipped=0.0 2024-09-17 11:53:11,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=598731.5, ans=0.125 2024-09-17 11:53:11,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=598731.5, ans=0.2 2024-09-17 11:53:14,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=598731.5, ans=0.0 2024-09-17 11:53:47,065 INFO [train.py:1198] (0/2) Epoch 34, batch 450, loss[loss=0.2549, ctc_loss=0.1791, cr_loss=0.3793, over 14292.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1466, cr_loss=0.3698, over 3678570.66 frames. ], batch size: 149, lr: 2.45e-03, grad_scale: 32.0 2024-09-17 11:53:50,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=598816.5, ans=0.125 2024-09-17 11:54:21,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=598873.1666666666, ans=0.125 2024-09-17 11:54:26,066 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=12.0 2024-09-17 11:55:06,493 INFO [train.py:1198] (0/2) Epoch 34, batch 500, loss[loss=0.2329, ctc_loss=0.1564, cr_loss=0.3822, over 20807.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1467, cr_loss=0.3687, over 3753966.96 frames. 
], batch size: 56, lr: 2.45e-03, grad_scale: 32.0
2024-09-17 11:55:22,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0
2024-09-17 11:55:27,020 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=22.5
2024-09-17 11:55:41,590 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.164e+02 2.301e+02 2.454e+02 3.791e+02, threshold=4.601e+02, percent-clipped=0.0
2024-09-17 11:56:06,598 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0
2024-09-17 11:56:19,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=599071.5, ans=0.0
2024-09-17 11:56:22,227 INFO [train.py:1198] (0/2) Epoch 34, batch 550, loss[loss=0.2317, ctc_loss=0.1562, cr_loss=0.3773, over 18226.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1473, cr_loss=0.3691, over 3823017.79 frames. ], batch size: 108, lr: 2.45e-03, grad_scale: 32.0
2024-09-17 11:56:32,102 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0
2024-09-17 11:56:49,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=599128.1666666666, ans=0.125
2024-09-17 11:56:57,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=599156.5, ans=0.125
2024-09-17 11:57:00,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=599156.5, ans=0.1
2024-09-17 11:57:12,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0
2024-09-17 11:57:18,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=599184.8333333334, ans=0.0
2024-09-17 11:57:19,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=599184.8333333334, ans=0.125
2024-09-17 11:57:24,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=599213.1666666666, ans=0.2
2024-09-17 11:57:28,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=599213.1666666666, ans=0.0
2024-09-17 11:57:37,684 INFO [train.py:1198] (0/2) Epoch 34, batch 600, loss[loss=0.2362, ctc_loss=0.1567, cr_loss=0.3975, over 20707.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1478, cr_loss=0.3702, over 3884144.39 frames. ], batch size: 66, lr: 2.45e-03, grad_scale: 32.0
2024-09-17 11:57:51,785 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0
2024-09-17 11:57:57,899 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.91 vs. limit=15.0
2024-09-17 11:58:12,105 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.172e+02 2.291e+02 2.524e+02 3.924e+02, threshold=4.581e+02, percent-clipped=0.0
2024-09-17 11:58:18,876 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.87 vs. limit=10.0
2024-09-17 11:58:42,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=599354.8333333334, ans=0.2
2024-09-17 11:58:45,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=15.0
2024-09-17 11:58:56,077 INFO [train.py:1198] (0/2) Epoch 34, batch 650, loss[loss=0.2274, ctc_loss=0.1527, cr_loss=0.3736, over 20348.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1479, cr_loss=0.3702, over 3922649.98 frames. ], batch size: 74, lr: 2.45e-03, grad_scale: 32.0
2024-09-17 11:58:59,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=599383.1666666666, ans=0.0
2024-09-17 11:59:05,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=599383.1666666666, ans=0.0
2024-09-17 11:59:57,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0
2024-09-17 12:00:14,813 INFO [train.py:1198] (0/2) Epoch 34, batch 700, loss[loss=0.2143, ctc_loss=0.1401, cr_loss=0.3707, over 20963.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1478, cr_loss=0.3705, over 3965090.04 frames. ], batch size: 55, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:00:36,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=599553.1666666666, ans=0.125
2024-09-17 12:00:45,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=599581.5, ans=0.05
2024-09-17 12:00:49,599 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.162e+02 2.272e+02 2.432e+02 7.544e+02, threshold=4.544e+02, percent-clipped=1.0
2024-09-17 12:00:53,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.80 vs. limit=15.0
2024-09-17 12:01:09,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=599609.8333333334, ans=0.125
2024-09-17 12:01:27,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0
2024-09-17 12:01:29,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=599666.5, ans=0.2
2024-09-17 12:01:30,756 INFO [train.py:1198] (0/2) Epoch 34, batch 750, loss[loss=0.2404, ctc_loss=0.1593, cr_loss=0.4052, over 20863.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1477, cr_loss=0.3706, over 4003244.95 frames. ], batch size: 65, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:01:41,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=599666.5, ans=0.2
2024-09-17 12:02:27,555 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=12.0
2024-09-17 12:02:39,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=599779.8333333334, ans=0.5
2024-09-17 12:02:45,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=599808.1666666666, ans=0.0
2024-09-17 12:02:46,498 INFO [train.py:1198] (0/2) Epoch 34, batch 800, loss[loss=0.2149, ctc_loss=0.145, cr_loss=0.3497, over 20901.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1479, cr_loss=0.3704, over 4023131.94 frames. ], batch size: 57, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:02:59,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=12.0
2024-09-17 12:03:12,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=599836.5, ans=0.1
2024-09-17 12:03:20,911 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.189e+02 2.313e+02 2.479e+02 6.233e+02, threshold=4.625e+02, percent-clipped=1.0
2024-09-17 12:03:33,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=599893.1666666666, ans=0.0
2024-09-17 12:03:41,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0
2024-09-17 12:04:01,527 INFO [train.py:1198] (0/2) Epoch 34, batch 850, loss[loss=0.2683, ctc_loss=0.1791, cr_loss=0.4464, over 20869.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.149, cr_loss=0.373, over 4052374.59 frames. ], batch size: 65, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:05:19,423 INFO [train.py:1198] (0/2) Epoch 34, batch 900, loss[loss=0.1713, ctc_loss=0.1099, cr_loss=0.3072, over 20949.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.149, cr_loss=0.3726, over 4049168.65 frames. ], batch size: 49, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:05:47,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=15.0
2024-09-17 12:05:57,384 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.140e+02 2.258e+02 2.425e+02 4.265e+02, threshold=4.516e+02, percent-clipped=0.0
2024-09-17 12:06:22,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=22.5
2024-09-17 12:06:34,682 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0
2024-09-17 12:06:35,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=600204.8333333334, ans=0.125
2024-09-17 12:06:37,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=600233.1666666666, ans=0.2
2024-09-17 12:06:38,366 INFO [train.py:1198] (0/2) Epoch 34, batch 950, loss[loss=0.2618, ctc_loss=0.1774, cr_loss=0.422, over 19405.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1492, cr_loss=0.3723, over 4040886.34 frames. ], batch size: 90, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:06:53,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=600261.5, ans=0.0
2024-09-17 12:06:59,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=600261.5, ans=0.015
2024-09-17 12:07:41,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=600346.5, ans=0.125
2024-09-17 12:07:53,684 INFO [train.py:1198] (0/2) Epoch 34, batch 1000, loss[loss=0.1896, ctc_loss=0.1232, cr_loss=0.332, over 20968.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1489, cr_loss=0.3725, over 4057149.97 frames. ], batch size: 51, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:08:07,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=600403.1666666666, ans=0.125
2024-09-17 12:08:09,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=600403.1666666666, ans=0.0
2024-09-17 12:08:28,494 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.189e+02 2.311e+02 2.508e+02 3.416e+02, threshold=4.623e+02, percent-clipped=0.0
2024-09-17 12:08:35,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=600431.5, ans=0.125
2024-09-17 12:08:49,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=600459.8333333334, ans=0.125
2024-09-17 12:09:08,404 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:09:09,394 INFO [train.py:1198] (0/2) Epoch 34, batch 1050, loss[loss=0.2709, ctc_loss=0.1837, cr_loss=0.4356, over 20631.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1486, cr_loss=0.3718, over 4066651.06 frames. ], batch size: 66, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:09:16,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=22.5
2024-09-17 12:09:48,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=600573.1666666666, ans=0.0
2024-09-17 12:10:28,483 INFO [train.py:1198] (0/2) Epoch 34, batch 1100, loss[loss=0.238, ctc_loss=0.1585, cr_loss=0.3976, over 20946.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1488, cr_loss=0.3718, over 4082656.32 frames. ], batch size: 60, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:10:31,632 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-212000.pt
2024-09-17 12:11:04,574 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.160e+02 2.290e+02 2.475e+02 4.873e+02, threshold=4.579e+02, percent-clipped=1.0
2024-09-17 12:11:21,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=600743.1666666666, ans=0.125
2024-09-17 12:11:23,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=600743.1666666666, ans=0.125
2024-09-17 12:11:37,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.40 vs. limit=6.0
2024-09-17 12:11:38,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=600771.5, ans=0.125
2024-09-17 12:11:43,401 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=12.0
2024-09-17 12:11:48,672 INFO [train.py:1198] (0/2) Epoch 34, batch 1150, loss[loss=0.2067, ctc_loss=0.1376, cr_loss=0.3458, over 21060.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1492, cr_loss=0.3729, over 4086378.12 frames. ], batch size: 56, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:12:04,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=600828.1666666666, ans=0.125
2024-09-17 12:12:14,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0
2024-09-17 12:12:17,217 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.78 vs. limit=15.0
2024-09-17 12:12:44,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=600884.8333333334, ans=0.1
2024-09-17 12:13:05,171 INFO [train.py:1198] (0/2) Epoch 34, batch 1200, loss[loss=0.2536, ctc_loss=0.1695, cr_loss=0.4202, over 20870.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1485, cr_loss=0.3727, over 4095484.67 frames. ], batch size: 65, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:13:16,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=600941.5, ans=0.125
2024-09-17 12:13:39,620 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.188e+02 2.327e+02 2.488e+02 4.386e+02, threshold=4.655e+02, percent-clipped=0.0
2024-09-17 12:14:20,377 INFO [train.py:1198] (0/2) Epoch 34, batch 1250, loss[loss=0.2003, ctc_loss=0.1341, cr_loss=0.3309, over 20988.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1495, cr_loss=0.3737, over 4084210.32 frames. ], batch size: 48, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:14:23,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=601083.1666666666, ans=0.0
2024-09-17 12:14:45,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=601111.5, ans=0.0
2024-09-17 12:15:36,370 INFO [train.py:1198] (0/2) Epoch 34, batch 1300, loss[loss=0.2042, ctc_loss=0.137, cr_loss=0.3358, over 20791.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1491, cr_loss=0.3729, over 4073404.04 frames. ], batch size: 53, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:15:56,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=601253.1666666666, ans=0.2
2024-09-17 12:15:59,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=601253.1666666666, ans=0.2
2024-09-17 12:16:01,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=601253.1666666666, ans=0.125
2024-09-17 12:16:01,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=601253.1666666666, ans=0.125
2024-09-17 12:16:14,817 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.131e+02 2.259e+02 2.447e+02 8.802e+02, threshold=4.518e+02, percent-clipped=1.0
2024-09-17 12:16:18,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=601281.5, ans=0.125
2024-09-17 12:16:23,504 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.72 vs. limit=15.0
2024-09-17 12:16:36,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=601309.8333333334, ans=0.125
2024-09-17 12:16:55,644 INFO [train.py:1198] (0/2) Epoch 34, batch 1350, loss[loss=0.1995, ctc_loss=0.129, cr_loss=0.3522, over 20990.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3721, over 4090460.16 frames. ], batch size: 52, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:17:12,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=601394.8333333334, ans=0.0
2024-09-17 12:17:18,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=601394.8333333334, ans=0.125
2024-09-17 12:17:19,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0
2024-09-17 12:17:32,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=601423.1666666666, ans=0.125
2024-09-17 12:17:37,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=22.5
2024-09-17 12:18:01,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=601479.8333333334, ans=0.0
2024-09-17 12:18:14,635 INFO [train.py:1198] (0/2) Epoch 34, batch 1400, loss[loss=0.1874, ctc_loss=0.1209, cr_loss=0.3328, over 20969.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1489, cr_loss=0.3731, over 4100422.69 frames. ], batch size: 51, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:18:25,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=601508.1666666666, ans=0.0
2024-09-17 12:18:28,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=601536.5, ans=0.125
2024-09-17 12:18:45,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=601564.8333333334, ans=0.0
2024-09-17 12:18:49,350 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.171e+02 2.295e+02 2.442e+02 3.235e+02, threshold=4.590e+02, percent-clipped=0.0
2024-09-17 12:19:29,939 INFO [train.py:1198] (0/2) Epoch 34, batch 1450, loss[loss=0.223, ctc_loss=0.1503, cr_loss=0.3639, over 21091.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1475, cr_loss=0.3711, over 4113281.94 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:19:51,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=15.0
2024-09-17 12:19:58,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=601706.5, ans=0.125
2024-09-17 12:20:12,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=601706.5, ans=0.125
2024-09-17 12:20:34,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=22.5
2024-09-17 12:20:42,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=601763.1666666666, ans=0.125
2024-09-17 12:20:42,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=601763.1666666666, ans=0.125
2024-09-17 12:20:45,510 INFO [train.py:1198] (0/2) Epoch 34, batch 1500, loss[loss=0.2034, ctc_loss=0.1331, cr_loss=0.3516, over 20887.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1469, cr_loss=0.3702, over 4109485.82 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:21:05,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=601819.8333333334, ans=0.0
2024-09-17 12:21:19,688 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.192e+02 2.293e+02 2.474e+02 3.224e+02, threshold=4.586e+02, percent-clipped=0.0
2024-09-17 12:21:47,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=601904.8333333334, ans=0.125
2024-09-17 12:22:02,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=601933.1666666666, ans=0.125
2024-09-17 12:22:03,832 INFO [train.py:1198] (0/2) Epoch 34, batch 1550, loss[loss=0.1941, ctc_loss=0.1253, cr_loss=0.3442, over 20203.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.3716, over 4115935.56 frames. ], batch size: 45, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:22:15,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0
2024-09-17 12:23:12,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=602046.5, ans=0.0
2024-09-17 12:23:22,664 INFO [train.py:1198] (0/2) Epoch 34, batch 1600, loss[loss=0.2383, ctc_loss=0.1573, cr_loss=0.4051, over 19459.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1469, cr_loss=0.3708, over 4110021.47 frames. ], batch size: 90, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:23:26,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=602074.8333333334, ans=0.0
2024-09-17 12:23:27,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=602074.8333333334, ans=0.0
2024-09-17 12:23:29,472 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.39 vs. limit=6.0
2024-09-17 12:23:50,521 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:23:57,769 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.176e+02 2.326e+02 2.475e+02 3.570e+02, threshold=4.653e+02, percent-clipped=0.0
2024-09-17 12:24:20,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=602159.8333333334, ans=0.07
2024-09-17 12:24:34,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=602188.1666666666, ans=0.0
2024-09-17 12:24:38,685 INFO [train.py:1198] (0/2) Epoch 34, batch 1650, loss[loss=0.2157, ctc_loss=0.1435, cr_loss=0.3609, over 20964.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1465, cr_loss=0.3694, over 4110384.79 frames. ], batch size: 49, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:24:44,389 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=22.5
2024-09-17 12:25:04,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=602244.8333333334, ans=0.0
2024-09-17 12:25:54,735 INFO [train.py:1198] (0/2) Epoch 34, batch 1700, loss[loss=0.215, ctc_loss=0.1413, cr_loss=0.3684, over 21054.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1471, cr_loss=0.3708, over 4109111.38 frames. ], batch size: 53, lr: 2.44e-03, grad_scale: 64.0
2024-09-17 12:25:55,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=602358.1666666666, ans=0.125
2024-09-17 12:26:05,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=602358.1666666666, ans=0.05
2024-09-17 12:26:19,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=602386.5, ans=0.0
2024-09-17 12:26:28,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=602414.8333333334, ans=0.025
2024-09-17 12:26:29,700 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.136e+02 2.268e+02 2.431e+02 3.384e+02, threshold=4.536e+02, percent-clipped=0.0
2024-09-17 12:26:32,020 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0
2024-09-17 12:26:46,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=602443.1666666666, ans=0.1
2024-09-17 12:27:13,579 INFO [train.py:1198] (0/2) Epoch 34, batch 1750, loss[loss=0.2235, ctc_loss=0.1497, cr_loss=0.3692, over 20888.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1474, cr_loss=0.3711, over 4106421.75 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 64.0
2024-09-17 12:27:47,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=602556.5, ans=0.2
2024-09-17 12:27:48,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=602556.5, ans=0.125
2024-09-17 12:28:26,816 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0
2024-09-17 12:28:28,931 INFO [train.py:1198] (0/2) Epoch 34, batch 1800, loss[loss=0.2473, ctc_loss=0.1691, cr_loss=0.391, over 18131.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.148, cr_loss=0.3724, over 4107464.82 frames. ], batch size: 108, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:28:38,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=602641.5, ans=10.0
2024-09-17 12:29:08,456 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.178e+02 2.311e+02 2.479e+02 3.589e+02, threshold=4.622e+02, percent-clipped=0.0
2024-09-17 12:29:40,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=602754.8333333334, ans=0.1
2024-09-17 12:29:46,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=602783.1666666666, ans=0.125
2024-09-17 12:29:47,779 INFO [train.py:1198] (0/2) Epoch 34, batch 1850, loss[loss=0.2221, ctc_loss=0.147, cr_loss=0.3755, over 21088.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1478, cr_loss=0.3721, over 4106036.70 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:29:50,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=602783.1666666666, ans=0.1
2024-09-17 12:29:51,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602783.1666666666, ans=0.1
2024-09-17 12:30:01,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=602811.5, ans=0.09899494936611666
2024-09-17 12:30:24,899 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0
2024-09-17 12:30:28,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=602839.8333333334, ans=0.125
2024-09-17 12:30:30,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602839.8333333334, ans=0.1
2024-09-17 12:30:32,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=602868.1666666666, ans=0.1
2024-09-17 12:30:33,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=602868.1666666666, ans=0.0
2024-09-17 12:30:48,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=602896.5, ans=0.0
2024-09-17 12:31:03,595 INFO [train.py:1198] (0/2) Epoch 34, batch 1900, loss[loss=0.2163, ctc_loss=0.1424, cr_loss=0.3693, over 20978.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1478, cr_loss=0.3725, over 4110417.88 frames. ], batch size: 55, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:31:15,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=602924.8333333334, ans=0.95
2024-09-17 12:31:40,422 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.188e+02 2.308e+02 2.457e+02 4.132e+02, threshold=4.616e+02, percent-clipped=0.0
2024-09-17 12:32:01,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=603009.8333333334, ans=0.125
2024-09-17 12:32:05,381 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0
2024-09-17 12:32:09,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=603038.1666666666, ans=0.0
2024-09-17 12:32:19,632 INFO [train.py:1198] (0/2) Epoch 34, batch 1950, loss[loss=0.2318, ctc_loss=0.155, cr_loss=0.3842, over 21013.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1478, cr_loss=0.3724, over 4114364.86 frames. ], batch size: 61, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:32:50,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=603123.1666666666, ans=0.0
2024-09-17 12:33:18,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=15.0
2024-09-17 12:33:37,972 INFO [train.py:1198] (0/2) Epoch 34, batch 2000, loss[loss=0.2093, ctc_loss=0.1353, cr_loss=0.3695, over 20900.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.148, cr_loss=0.3727, over 4103345.53 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:33:43,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0
2024-09-17 12:34:14,686 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.186e+02 2.309e+02 2.461e+02 3.400e+02, threshold=4.618e+02, percent-clipped=0.0
2024-09-17 12:34:42,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=603321.5, ans=0.125
2024-09-17 12:34:57,200 INFO [train.py:1198] (0/2) Epoch 34, batch 2050, loss[loss=0.185, ctc_loss=0.1205, cr_loss=0.3221, over 20946.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1477, cr_loss=0.372, over 4105874.63 frames. ], batch size: 49, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:35:02,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=603349.8333333334, ans=0.2
2024-09-17 12:35:20,788 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=22.5
2024-09-17 12:35:26,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=603406.5, ans=0.125
2024-09-17 12:35:35,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=603406.5, ans=0.125
2024-09-17 12:35:42,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=603434.8333333334, ans=0.125
2024-09-17 12:35:56,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=603463.1666666666, ans=0.2
2024-09-17 12:36:12,648 INFO [train.py:1198] (0/2) Epoch 34, batch 2100, loss[loss=0.2085, ctc_loss=0.1363, cr_loss=0.3609, over 20961.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1482, cr_loss=0.3726, over 4097324.53 frames. ], batch size: 51, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:36:14,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=603491.5, ans=0.125
2024-09-17 12:36:17,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=603491.5, ans=0.0
2024-09-17 12:36:22,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=603491.5, ans=0.125
2024-09-17 12:36:48,797 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.178e+02 2.294e+02 2.445e+02 3.418e+02, threshold=4.588e+02, percent-clipped=0.0
2024-09-17 12:37:28,131 INFO [train.py:1198] (0/2) Epoch 34, batch 2150, loss[loss=0.2346, ctc_loss=0.1548, cr_loss=0.399, over 19348.00 frames. ], tot_loss[loss=0.2239, ctc_loss=0.149, cr_loss=0.3743, over 4098565.53 frames. ], batch size: 90, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:37:28,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=603633.1666666666, ans=0.125
2024-09-17 12:37:42,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=603661.5, ans=0.2
2024-09-17 12:38:26,252 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:38:45,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=603774.8333333334, ans=0.0
2024-09-17 12:38:47,030 INFO [train.py:1198] (0/2) Epoch 34, batch 2200, loss[loss=0.2334, ctc_loss=0.1561, cr_loss=0.3864, over 21077.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1489, cr_loss=0.374, over 4102919.08 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:38:50,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=603774.8333333334, ans=0.0
2024-09-17 12:39:05,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=603803.1666666666, ans=0.025
2024-09-17 12:39:22,896 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.165e+02 2.320e+02 2.433e+02 4.912e+02, threshold=4.641e+02, percent-clipped=1.0
2024-09-17 12:39:44,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=603859.8333333334, ans=0.0
2024-09-17 12:40:02,360 INFO [train.py:1198] (0/2) Epoch 34, batch 2250, loss[loss=0.2296, ctc_loss=0.156, cr_loss=0.368, over 20949.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.149, cr_loss=0.3735, over 4104472.19 frames. ], batch size: 67, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:40:07,292 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.65 vs. limit=12.0
2024-09-17 12:41:20,844 INFO [train.py:1198] (0/2) Epoch 34, batch 2300, loss[loss=0.2353, ctc_loss=0.1538, cr_loss=0.4072, over 20896.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.149, cr_loss=0.3735, over 4095313.40 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:41:21,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=604058.1666666666, ans=0.025
2024-09-17 12:41:57,104 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.174e+02 2.294e+02 2.464e+02 3.145e+02, threshold=4.587e+02, percent-clipped=0.0
2024-09-17 12:42:10,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=604143.1666666666, ans=0.125
2024-09-17 12:42:19,157 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.15 vs. limit=6.0
2024-09-17 12:42:26,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=604171.5, ans=0.05
2024-09-17 12:42:36,738 INFO [train.py:1198] (0/2) Epoch 34, batch 2350, loss[loss=0.2319, ctc_loss=0.1546, cr_loss=0.3865, over 20868.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1478, cr_loss=0.3713, over 4097474.23 frames. ], batch size: 65, lr: 2.44e-03, grad_scale: 32.0
2024-09-17 12:42:44,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=604199.8333333334, ans=0.1
2024-09-17 12:43:00,280 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.20 vs. 
limit=10.0 2024-09-17 12:43:05,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=604256.5, ans=0.125 2024-09-17 12:43:24,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=604284.8333333334, ans=0.125 2024-09-17 12:43:33,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=604284.8333333334, ans=0.025 2024-09-17 12:43:53,065 INFO [train.py:1198] (0/2) Epoch 34, batch 2400, loss[loss=0.2232, ctc_loss=0.1486, cr_loss=0.3729, over 21047.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1474, cr_loss=0.3706, over 4099529.50 frames. ], batch size: 56, lr: 2.44e-03, grad_scale: 32.0 2024-09-17 12:44:19,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=604369.8333333334, ans=0.1 2024-09-17 12:44:32,771 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.181e+02 2.329e+02 2.467e+02 3.409e+02, threshold=4.659e+02, percent-clipped=0.0 2024-09-17 12:44:51,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=12.0 2024-09-17 12:45:05,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=604454.8333333334, ans=0.125 2024-09-17 12:45:12,262 INFO [train.py:1198] (0/2) Epoch 34, batch 2450, loss[loss=0.2698, ctc_loss=0.1813, cr_loss=0.4424, over 20954.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1469, cr_loss=0.3705, over 4104855.32 frames. 
], batch size: 58, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:45:48,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=604539.8333333334, ans=22.5 2024-09-17 12:46:31,332 INFO [train.py:1198] (0/2) Epoch 34, batch 2500, loss[loss=0.2194, ctc_loss=0.1457, cr_loss=0.3686, over 21030.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1458, cr_loss=0.3685, over 4104797.91 frames. ], batch size: 62, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:47:07,821 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.142e+02 2.289e+02 2.463e+02 3.544e+02, threshold=4.578e+02, percent-clipped=0.0 2024-09-17 12:47:20,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=604709.8333333334, ans=0.0 2024-09-17 12:47:46,872 INFO [train.py:1198] (0/2) Epoch 34, batch 2550, loss[loss=0.192, ctc_loss=0.1251, cr_loss=0.3346, over 20975.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1461, cr_loss=0.3684, over 4100549.77 frames. ], batch size: 50, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:48:09,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.62 vs. 
limit=22.5 2024-09-17 12:48:30,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=604823.1666666666, ans=0.0 2024-09-17 12:48:30,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=604823.1666666666, ans=0.07 2024-09-17 12:48:36,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=604851.5, ans=0.0 2024-09-17 12:48:53,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=604879.8333333334, ans=0.125 2024-09-17 12:48:59,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=604879.8333333334, ans=0.0 2024-09-17 12:49:02,457 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:49:03,435 INFO [train.py:1198] (0/2) Epoch 34, batch 2600, loss[loss=0.1726, ctc_loss=0.1121, cr_loss=0.3026, over 20977.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1454, cr_loss=0.3674, over 4108792.68 frames. 
], batch size: 48, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:49:09,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=604908.1666666666, ans=0.1 2024-09-17 12:49:11,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=604908.1666666666, ans=0.0 2024-09-17 12:49:16,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=604908.1666666666, ans=0.2 2024-09-17 12:49:25,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=604936.5, ans=0.0 2024-09-17 12:49:40,078 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.105e+02 2.240e+02 2.428e+02 3.435e+02, threshold=4.480e+02, percent-clipped=0.0 2024-09-17 12:50:05,198 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=12.0 2024-09-17 12:50:22,782 INFO [train.py:1198] (0/2) Epoch 34, batch 2650, loss[loss=0.2471, ctc_loss=0.1701, cr_loss=0.3851, over 18493.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1462, cr_loss=0.3685, over 4102294.29 frames. ], batch size: 108, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:50:38,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=605078.1666666666, ans=0.0 2024-09-17 12:51:00,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=605106.5, ans=0.125 2024-09-17 12:51:32,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.97 vs. 
limit=15.0 2024-09-17 12:51:34,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=605163.1666666666, ans=0.125 2024-09-17 12:51:38,876 INFO [train.py:1198] (0/2) Epoch 34, batch 2700, loss[loss=0.2383, ctc_loss=0.1595, cr_loss=0.394, over 21047.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1465, cr_loss=0.368, over 4093147.79 frames. ], batch size: 59, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:51:40,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=605191.5, ans=0.125 2024-09-17 12:51:57,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=605219.8333333334, ans=0.125 2024-09-17 12:52:03,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=605219.8333333334, ans=0.0 2024-09-17 12:52:08,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-09-17 12:52:13,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=605248.1666666666, ans=0.125 2024-09-17 12:52:18,258 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.223e+02 2.340e+02 2.514e+02 4.321e+02, threshold=4.679e+02, percent-clipped=0.0 2024-09-17 12:52:35,507 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2024-09-17 12:52:57,113 INFO [train.py:1198] (0/2) Epoch 34, batch 2750, loss[loss=0.2263, ctc_loss=0.1476, cr_loss=0.393, over 20978.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1476, cr_loss=0.3708, over 4095642.59 frames. 
], batch size: 64, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:53:41,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=605418.1666666666, ans=0.125 2024-09-17 12:54:12,988 INFO [train.py:1198] (0/2) Epoch 34, batch 2800, loss[loss=0.1938, ctc_loss=0.1292, cr_loss=0.3227, over 21043.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1476, cr_loss=0.3708, over 4094722.65 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:54:22,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=605474.8333333334, ans=0.2 2024-09-17 12:54:33,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=605503.1666666666, ans=0.125 2024-09-17 12:54:49,556 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.122e+02 2.280e+02 2.445e+02 4.334e+02, threshold=4.560e+02, percent-clipped=0.0 2024-09-17 12:55:20,193 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2024-09-17 12:55:28,710 INFO [train.py:1198] (0/2) Epoch 34, batch 2850, loss[loss=0.1928, ctc_loss=0.1264, cr_loss=0.3322, over 20982.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1478, cr_loss=0.3712, over 4103253.12 frames. 
], batch size: 49, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:55:29,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=605616.5, ans=0.125 2024-09-17 12:55:41,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=605616.5, ans=0.0 2024-09-17 12:55:56,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=605644.8333333334, ans=0.09899494936611666 2024-09-17 12:56:19,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=605701.5, ans=0.1 2024-09-17 12:56:47,700 INFO [train.py:1198] (0/2) Epoch 34, batch 2900, loss[loss=0.2038, ctc_loss=0.1352, cr_loss=0.343, over 21019.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1474, cr_loss=0.3709, over 4099116.82 frames. ], batch size: 48, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:57:16,142 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=12.0 2024-09-17 12:57:18,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=605814.8333333334, ans=0.125 2024-09-17 12:57:21,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=605814.8333333334, ans=0.2 2024-09-17 12:57:23,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.12 vs. 
limit=15.0 2024-09-17 12:57:24,336 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.176e+02 2.315e+02 2.536e+02 3.273e+02, threshold=4.629e+02, percent-clipped=0.0 2024-09-17 12:57:37,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=605843.1666666666, ans=0.025 2024-09-17 12:57:41,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=605843.1666666666, ans=0.025 2024-09-17 12:57:45,084 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=15.0 2024-09-17 12:57:53,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=605871.5, ans=0.2 2024-09-17 12:57:59,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=605871.5, ans=0.2 2024-09-17 12:58:07,469 INFO [train.py:1198] (0/2) Epoch 34, batch 2950, loss[loss=0.2134, ctc_loss=0.1441, cr_loss=0.3465, over 20834.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1473, cr_loss=0.3708, over 4107231.04 frames. ], batch size: 59, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:58:20,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=605899.8333333334, ans=0.125 2024-09-17 12:58:30,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=605928.1666666666, ans=0.125 2024-09-17 12:58:53,698 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. 
limit=12.0 2024-09-17 12:58:56,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=605984.8333333334, ans=0.025 2024-09-17 12:59:00,497 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.87 vs. limit=22.5 2024-09-17 12:59:23,957 INFO [train.py:1198] (0/2) Epoch 34, batch 3000, loss[loss=0.2473, ctc_loss=0.1686, cr_loss=0.3936, over 19402.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1479, cr_loss=0.3721, over 4112804.67 frames. ], batch size: 90, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 12:59:23,958 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 12:59:45,047 INFO [train.py:1230] (0/2) Epoch 34, validation: loss=0.04044, ctc_loss=0.04044, cr_loss=1.35e-14, over 944034.00 frames. 2024-09-17 12:59:45,047 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 13:00:01,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=606069.8333333334, ans=0.125 2024-09-17 13:00:09,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=606069.8333333334, ans=0.125 2024-09-17 13:00:10,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=606069.8333333334, ans=0.2 2024-09-17 13:00:21,294 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.196e+02 2.327e+02 2.458e+02 4.975e+02, threshold=4.653e+02, percent-clipped=1.0 2024-09-17 13:00:26,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=606098.1666666666, ans=0.0 2024-09-17 13:00:42,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, 
batch_count=606126.5, ans=0.125 2024-09-17 13:01:00,627 INFO [train.py:1198] (0/2) Epoch 34, batch 3050, loss[loss=0.2027, ctc_loss=0.1327, cr_loss=0.3504, over 20958.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1478, cr_loss=0.3719, over 4114629.35 frames. ], batch size: 58, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:01:13,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=606183.1666666666, ans=0.2 2024-09-17 13:01:50,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0 2024-09-17 13:01:59,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=606268.1666666666, ans=0.125 2024-09-17 13:02:16,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=606296.5, ans=0.025 2024-09-17 13:02:19,238 INFO [train.py:1198] (0/2) Epoch 34, batch 3100, loss[loss=0.2194, ctc_loss=0.1433, cr_loss=0.3808, over 20974.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.3719, over 4108321.42 frames. ], batch size: 64, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:02:26,401 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. 
limit=6.0 2024-09-17 13:02:55,827 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.167e+02 2.286e+02 2.394e+02 3.278e+02, threshold=4.572e+02, percent-clipped=0.0 2024-09-17 13:03:06,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=606409.8333333334, ans=0.0 2024-09-17 13:03:21,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=606438.1666666666, ans=0.125 2024-09-17 13:03:22,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=606438.1666666666, ans=0.125 2024-09-17 13:03:38,052 INFO [train.py:1198] (0/2) Epoch 34, batch 3150, loss[loss=0.211, ctc_loss=0.1414, cr_loss=0.3479, over 21033.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1473, cr_loss=0.3711, over 4109635.13 frames. ], batch size: 62, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:03:52,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=606494.8333333334, ans=0.5 2024-09-17 13:04:54,101 INFO [train.py:1198] (0/2) Epoch 34, batch 3200, loss[loss=0.1876, ctc_loss=0.1231, cr_loss=0.3225, over 20778.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1476, cr_loss=0.3715, over 4112471.32 frames. 
], batch size: 53, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:05:21,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=606636.5, ans=0.125 2024-09-17 13:05:30,470 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.113e+02 2.249e+02 2.408e+02 5.715e+02, threshold=4.498e+02, percent-clipped=1.0 2024-09-17 13:05:40,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=606693.1666666666, ans=0.125 2024-09-17 13:05:44,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=606693.1666666666, ans=0.125 2024-09-17 13:05:47,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=606693.1666666666, ans=0.0 2024-09-17 13:05:52,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=22.5 2024-09-17 13:06:01,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=606721.5, ans=10.0 2024-09-17 13:06:09,854 INFO [train.py:1198] (0/2) Epoch 34, batch 3250, loss[loss=0.1734, ctc_loss=0.1117, cr_loss=0.3082, over 20987.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1468, cr_loss=0.3707, over 4119047.63 frames. 
], batch size: 51, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:06:13,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=606749.8333333334, ans=0.125 2024-09-17 13:06:26,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=606778.1666666666, ans=0.1 2024-09-17 13:06:26,973 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0 2024-09-17 13:06:33,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=606778.1666666666, ans=0.0 2024-09-17 13:06:38,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=606806.5, ans=0.125 2024-09-17 13:07:27,822 INFO [train.py:1198] (0/2) Epoch 34, batch 3300, loss[loss=0.2366, ctc_loss=0.1586, cr_loss=0.3904, over 19574.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1478, cr_loss=0.3728, over 4111153.96 frames. 
], batch size: 90, lr: 2.43e-03, grad_scale: 32.0 2024-09-17 13:07:40,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=606891.5, ans=0.1 2024-09-17 13:07:40,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=606891.5, ans=0.125 2024-09-17 13:07:43,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=606919.8333333334, ans=0.125 2024-09-17 13:08:05,535 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.213e+02 2.360e+02 2.548e+02 3.998e+02, threshold=4.719e+02, percent-clipped=0.0 2024-09-17 13:08:22,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=606976.5, ans=0.2 2024-09-17 13:08:25,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=606976.5, ans=0.125 2024-09-17 13:08:43,882 INFO [train.py:1198] (0/2) Epoch 34, batch 3350, loss[loss=0.1943, ctc_loss=0.1268, cr_loss=0.3378, over 20963.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1482, cr_loss=0.3734, over 4108175.40 frames. ], batch size: 49, lr: 2.43e-03, grad_scale: 16.0 2024-09-17 13:09:01,835 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.19 vs. limit=15.0 2024-09-17 13:09:04,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=607061.5, ans=0.2 2024-09-17 13:09:13,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=607061.5, ans=0.125 2024-09-17 13:10:03,119 INFO [train.py:1198] (0/2) Epoch 34, batch 3400, loss[loss=0.2237, ctc_loss=0.1515, cr_loss=0.3607, over 20897.00 frames. 
], tot_loss[loss=0.2218, ctc_loss=0.1475, cr_loss=0.3714, over 4097923.35 frames. ], batch size: 57, lr: 2.43e-03, grad_scale: 16.0 2024-09-17 13:10:30,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=607203.1666666666, ans=0.05 2024-09-17 13:10:41,086 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.194e+02 2.342e+02 2.524e+02 4.486e+02, threshold=4.685e+02, percent-clipped=0.0 2024-09-17 13:11:04,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=607288.1666666666, ans=0.125 2024-09-17 13:11:17,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=607316.5, ans=0.125 2024-09-17 13:11:18,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=607316.5, ans=0.1 2024-09-17 13:11:19,185 INFO [train.py:1198] (0/2) Epoch 34, batch 3450, loss[loss=0.2417, ctc_loss=0.1592, cr_loss=0.4125, over 21030.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1465, cr_loss=0.3701, over 4107522.73 frames. ], batch size: 62, lr: 2.43e-03, grad_scale: 16.0 2024-09-17 13:11:28,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.07 vs. limit=22.5 2024-09-17 13:11:37,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=607344.8333333334, ans=0.125 2024-09-17 13:11:51,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=607373.1666666666, ans=0.1 2024-09-17 13:11:51,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.01 vs. 
limit=12.0 2024-09-17 13:12:00,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=607373.1666666666, ans=0.2 2024-09-17 13:12:34,736 INFO [train.py:1198] (0/2) Epoch 34, batch 3500, loss[loss=0.24, ctc_loss=0.1604, cr_loss=0.3979, over 20689.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1474, cr_loss=0.3712, over 4110501.96 frames. ], batch size: 71, lr: 2.43e-03, grad_scale: 16.0 2024-09-17 13:12:44,542 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.66 vs. limit=12.0 2024-09-17 13:12:46,182 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.12 vs. limit=10.0 2024-09-17 13:13:07,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=607514.8333333334, ans=0.125 2024-09-17 13:13:15,989 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.189e+02 2.290e+02 2.442e+02 5.551e+02, threshold=4.579e+02, percent-clipped=1.0 2024-09-17 13:13:20,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=607514.8333333334, ans=0.1 2024-09-17 13:13:34,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=607543.1666666666, ans=0.0 2024-09-17 13:13:47,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=607571.5, ans=0.125 2024-09-17 13:13:53,712 INFO [train.py:1198] (0/2) Epoch 34, batch 3550, loss[loss=0.2429, ctc_loss=0.1646, cr_loss=0.3915, over 21068.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.3711, over 4117905.18 frames. 
], batch size: 59, lr: 2.43e-03, grad_scale: 16.0
2024-09-17 13:14:21,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=607628.1666666666, ans=0.1
2024-09-17 13:14:36,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=607656.5, ans=0.125
2024-09-17 13:14:45,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=607684.8333333334, ans=0.125
2024-09-17 13:14:58,457 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.28 vs. limit=15.0
2024-09-17 13:14:59,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=607713.1666666666, ans=0.95
2024-09-17 13:15:01,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=15.0
2024-09-17 13:15:11,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=607741.5, ans=0.0
2024-09-17 13:15:12,599 INFO [train.py:1198] (0/2) Epoch 34, batch 3600, loss[loss=0.2271, ctc_loss=0.1487, cr_loss=0.392, over 20793.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1473, cr_loss=0.3708, over 4113444.74 frames. ], batch size: 53, lr: 2.43e-03, grad_scale: 32.0
2024-09-17 13:15:26,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=607769.8333333334, ans=0.125
2024-09-17 13:15:50,526 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.946e+02 2.214e+02 2.345e+02 2.506e+02 3.046e+02, threshold=4.691e+02, percent-clipped=0.0
2024-09-17 13:15:55,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=607798.1666666666, ans=0.125
2024-09-17 13:16:09,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=607826.5, ans=0.125
2024-09-17 13:16:28,399 INFO [train.py:1198] (0/2) Epoch 34, batch 3650, loss[loss=0.1735, ctc_loss=0.1138, cr_loss=0.2986, over 19832.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1482, cr_loss=0.3722, over 4111480.98 frames. ], batch size: 44, lr: 2.43e-03, grad_scale: 32.0
2024-09-17 13:16:32,280 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.12 vs. limit=22.5
2024-09-17 13:16:34,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=607883.1666666666, ans=0.0
2024-09-17 13:17:18,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=607968.1666666666, ans=0.125
2024-09-17 13:17:20,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=607968.1666666666, ans=0.125
2024-09-17 13:17:44,283 INFO [train.py:1198] (0/2) Epoch 34, batch 3700, loss[loss=0.183, ctc_loss=0.1228, cr_loss=0.3012, over 20945.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1478, cr_loss=0.371, over 4107883.24 frames. ], batch size: 50, lr: 2.43e-03, grad_scale: 32.0
2024-09-17 13:17:47,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=608024.8333333334, ans=0.0
2024-09-17 13:18:03,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.61 vs. limit=15.0
2024-09-17 13:18:24,948 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.180e+02 2.320e+02 2.477e+02 4.347e+02, threshold=4.640e+02, percent-clipped=0.0
2024-09-17 13:18:31,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=608109.8333333334, ans=0.125
2024-09-17 13:18:51,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=608138.1666666666, ans=0.2
2024-09-17 13:19:02,808 INFO [train.py:1198] (0/2) Epoch 34, batch 3750, loss[loss=0.247, ctc_loss=0.1647, cr_loss=0.4116, over 20656.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1464, cr_loss=0.3693, over 4115993.11 frames. ], batch size: 68, lr: 2.43e-03, grad_scale: 32.0
2024-09-17 13:19:03,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=608166.5, ans=0.125
2024-09-17 13:19:16,944 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 13:19:28,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=608194.8333333334, ans=0.0
2024-09-17 13:19:53,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=608251.5, ans=0.125
2024-09-17 13:19:58,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=608251.5, ans=0.125
2024-09-17 13:20:08,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=608279.8333333334, ans=0.2
2024-09-17 13:20:18,286 INFO [train.py:1198] (0/2) Epoch 34, batch 3800, loss[loss=0.1836, ctc_loss=0.1192, cr_loss=0.3219, over 21004.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1467, cr_loss=0.3703, over 4123419.75 frames. ], batch size: 52, lr: 2.43e-03, grad_scale: 32.0
2024-09-17 13:20:34,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=608308.1666666666, ans=0.0
2024-09-17 13:20:43,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=608336.5, ans=0.0
2024-09-17 13:20:59,695 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.150e+02 2.272e+02 2.403e+02 2.988e+02, threshold=4.544e+02, percent-clipped=0.0
2024-09-17 13:21:11,094 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 13:21:37,794 INFO [train.py:1198] (0/2) Epoch 34, batch 3850, loss[loss=0.2057, ctc_loss=0.1371, cr_loss=0.343, over 20998.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1468, cr_loss=0.3701, over 4120909.38 frames. ], batch size: 52, lr: 2.43e-03, grad_scale: 32.0
2024-09-17 13:21:58,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=608478.1666666666, ans=0.2
2024-09-17 13:22:01,561 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0
2024-09-17 13:22:19,289 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5
2024-09-17 13:22:52,513 INFO [train.py:1198] (0/2) Epoch 34, batch 3900, loss[loss=0.2318, ctc_loss=0.1544, cr_loss=0.387, over 21048.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1475, cr_loss=0.3712, over 4114277.41 frames. ], batch size: 63, lr: 2.43e-03, grad_scale: 32.0
2024-09-17 13:23:03,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=608591.5, ans=0.125
2024-09-17 13:23:28,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=608648.1666666666, ans=0.125
2024-09-17 13:23:28,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.16 vs. limit=10.0
2024-09-17 13:23:30,558 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.185e+02 2.295e+02 2.500e+02 3.064e+02, threshold=4.591e+02, percent-clipped=0.0
2024-09-17 13:23:44,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=608676.5, ans=10.0
2024-09-17 13:24:11,289 INFO [train.py:1198] (0/2) Epoch 34, batch 3950, loss[loss=0.2419, ctc_loss=0.1622, cr_loss=0.3983, over 20304.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1478, cr_loss=0.3714, over 4108630.79 frames. ], batch size: 74, lr: 2.43e-03, grad_scale: 32.0
2024-09-17 13:24:12,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0
2024-09-17 13:24:19,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=608733.1666666666, ans=0.0
2024-09-17 13:24:49,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0
2024-09-17 13:24:53,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0
2024-09-17 13:25:27,189 INFO [train.py:1198] (0/2) Epoch 34, batch 4000, loss[loss=0.2396, ctc_loss=0.1588, cr_loss=0.4039, over 20614.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1479, cr_loss=0.372, over 4107819.41 frames. ], batch size: 68, lr: 2.43e-03, grad_scale: 32.0
2024-09-17 13:25:38,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=22.5
2024-09-17 13:26:05,415 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.156e+02 2.298e+02 2.524e+02 3.297e+02, threshold=4.596e+02, percent-clipped=0.0
2024-09-17 13:26:11,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=608931.5, ans=0.125
2024-09-17 13:26:35,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=608988.1666666666, ans=0.0
2024-09-17 13:26:43,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=22.5
2024-09-17 13:26:45,810 INFO [train.py:1198] (0/2) Epoch 34, batch 4050, loss[loss=0.2409, ctc_loss=0.16, cr_loss=0.4044, over 20964.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1472, cr_loss=0.3707, over 4103939.11 frames. ], batch size: 58, lr: 2.43e-03, grad_scale: 32.0
2024-09-17 13:26:58,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=609016.5, ans=0.0
2024-09-17 13:27:19,327 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 13:27:58,218 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 13:28:00,778 INFO [train.py:1198] (0/2) Epoch 34, batch 4100, loss[loss=0.2208, ctc_loss=0.1431, cr_loss=0.3886, over 21082.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1486, cr_loss=0.3727, over 4098513.84 frames. ], batch size: 59, lr: 2.43e-03, grad_scale: 32.0
2024-09-17 13:28:20,000 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0
2024-09-17 13:28:28,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=609186.5, ans=0.125
2024-09-17 13:28:38,261 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.179e+02 2.331e+02 2.481e+02 7.163e+02, threshold=4.661e+02, percent-clipped=1.0
2024-09-17 13:29:15,887 INFO [train.py:1198] (0/2) Epoch 34, batch 4150, loss[loss=0.2442, ctc_loss=0.1637, cr_loss=0.4025, over 20110.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1488, cr_loss=0.373, over 4081003.73 frames. ], batch size: 80, lr: 2.43e-03, grad_scale: 32.0
2024-09-17 13:29:36,633 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0
2024-09-17 13:30:01,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=609356.5, ans=0.04949747468305833
2024-09-17 13:30:08,373 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0
2024-09-17 13:30:17,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=609384.8333333334, ans=0.125
2024-09-17 13:30:35,016 INFO [train.py:1198] (0/2) Epoch 34, batch 4200, loss[loss=0.2393, ctc_loss=0.1606, cr_loss=0.3936, over 20876.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1488, cr_loss=0.3734, over 4074646.82 frames. ], batch size: 57, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:31:13,062 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.775e+02 2.207e+02 2.340e+02 2.535e+02 3.723e+02, threshold=4.680e+02, percent-clipped=0.0
2024-09-17 13:31:20,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=609526.5, ans=0.125
2024-09-17 13:31:36,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=609554.8333333334, ans=0.125
2024-09-17 13:31:53,655 INFO [train.py:1198] (0/2) Epoch 34, batch 4250, loss[loss=0.2735, ctc_loss=0.1933, cr_loss=0.4009, over 14601.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1493, cr_loss=0.374, over 4075866.19 frames. ], batch size: 149, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:32:41,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=609668.1666666666, ans=0.04949747468305833
2024-09-17 13:32:44,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=609668.1666666666, ans=0.0
2024-09-17 13:32:54,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=609696.5, ans=0.2
2024-09-17 13:33:09,311 INFO [train.py:1198] (0/2) Epoch 34, batch 4300, loss[loss=0.2452, ctc_loss=0.1672, cr_loss=0.39, over 19440.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1487, cr_loss=0.373, over 4093867.76 frames. ], batch size: 90, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:33:37,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=15.0
2024-09-17 13:33:43,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=609781.5, ans=0.1
2024-09-17 13:33:46,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.91 vs. limit=15.0
2024-09-17 13:33:47,297 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.170e+02 2.319e+02 2.498e+02 3.591e+02, threshold=4.639e+02, percent-clipped=0.0
2024-09-17 13:33:49,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=609781.5, ans=0.125
2024-09-17 13:33:50,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=609781.5, ans=0.2
2024-09-17 13:33:58,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=609809.8333333334, ans=0.0
2024-09-17 13:34:14,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.23 vs. limit=15.0
2024-09-17 13:34:25,508 INFO [train.py:1198] (0/2) Epoch 34, batch 4350, loss[loss=0.2014, ctc_loss=0.1309, cr_loss=0.3529, over 20959.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1485, cr_loss=0.3728, over 4085947.45 frames. ], batch size: 49, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:34:45,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=609894.8333333334, ans=0.125
2024-09-17 13:34:49,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=609894.8333333334, ans=0.125
2024-09-17 13:34:56,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=609923.1666666666, ans=0.125
2024-09-17 13:35:30,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=609979.8333333334, ans=0.0
2024-09-17 13:35:42,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=609979.8333333334, ans=0.2
2024-09-17 13:35:44,793 INFO [train.py:1198] (0/2) Epoch 34, batch 4400, loss[loss=0.2327, ctc_loss=0.155, cr_loss=0.3887, over 20818.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1479, cr_loss=0.3722, over 4093086.28 frames. ], batch size: 59, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:35:59,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=610036.5, ans=0.2
2024-09-17 13:36:08,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=610036.5, ans=0.1
2024-09-17 13:36:22,960 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.182e+02 2.345e+02 2.533e+02 5.369e+02, threshold=4.690e+02, percent-clipped=1.0
2024-09-17 13:36:32,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=610093.1666666666, ans=0.2
2024-09-17 13:36:37,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=610093.1666666666, ans=0.0
2024-09-17 13:36:52,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=610121.5, ans=0.125
2024-09-17 13:37:00,569 INFO [train.py:1198] (0/2) Epoch 34, batch 4450, loss[loss=0.2462, ctc_loss=0.1689, cr_loss=0.3864, over 19319.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.3719, over 4099238.55 frames. ], batch size: 90, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:37:35,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=610206.5, ans=0.125
2024-09-17 13:38:14,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=610263.1666666666, ans=0.125
2024-09-17 13:38:16,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=610263.1666666666, ans=0.125
2024-09-17 13:38:19,086 INFO [train.py:1198] (0/2) Epoch 34, batch 4500, loss[loss=0.2425, ctc_loss=0.1629, cr_loss=0.3977, over 21077.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1479, cr_loss=0.3724, over 4111400.88 frames. ], batch size: 59, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:38:34,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=610319.8333333334, ans=0.125
2024-09-17 13:38:44,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=610319.8333333334, ans=0.05
2024-09-17 13:38:57,296 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.222e+02 2.306e+02 2.536e+02 4.017e+02, threshold=4.612e+02, percent-clipped=0.0
2024-09-17 13:39:34,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=610433.1666666666, ans=0.0
2024-09-17 13:39:35,275 INFO [train.py:1198] (0/2) Epoch 34, batch 4550, loss[loss=0.2556, ctc_loss=0.172, cr_loss=0.4178, over 20026.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.3711, over 4105488.05 frames. ], batch size: 80, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:39:35,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=610433.1666666666, ans=0.125
2024-09-17 13:39:38,786 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 13:40:07,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=610489.8333333334, ans=0.1
2024-09-17 13:40:43,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=610546.5, ans=0.0
2024-09-17 13:40:50,477 INFO [train.py:1198] (0/2) Epoch 34, batch 4600, loss[loss=0.2179, ctc_loss=0.1448, cr_loss=0.3655, over 20930.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1474, cr_loss=0.3713, over 4098590.32 frames. ], batch size: 60, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:41:24,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=610631.5, ans=0.0
2024-09-17 13:41:31,860 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.138e+02 2.254e+02 2.371e+02 3.030e+02, threshold=4.507e+02, percent-clipped=0.0
2024-09-17 13:41:53,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=610688.1666666666, ans=0.125
2024-09-17 13:42:10,412 INFO [train.py:1198] (0/2) Epoch 34, batch 4650, loss[loss=0.2196, ctc_loss=0.1459, cr_loss=0.3689, over 20652.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1467, cr_loss=0.3705, over 4106668.66 frames. ], batch size: 66, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:42:29,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.94 vs. limit=15.0
2024-09-17 13:43:08,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=610801.5, ans=0.125
2024-09-17 13:43:17,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0
2024-09-17 13:43:22,293 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0
2024-09-17 13:43:28,840 INFO [train.py:1198] (0/2) Epoch 34, batch 4700, loss[loss=0.2189, ctc_loss=0.1427, cr_loss=0.3811, over 20927.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1468, cr_loss=0.3705, over 4100650.60 frames. ], batch size: 60, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:44:06,442 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.173e+02 2.272e+02 2.414e+02 3.059e+02, threshold=4.543e+02, percent-clipped=0.0
2024-09-17 13:44:18,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=610943.1666666666, ans=0.125
2024-09-17 13:44:21,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=610943.1666666666, ans=0.125
2024-09-17 13:44:26,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=610943.1666666666, ans=0.125
2024-09-17 13:44:32,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=610971.5, ans=0.0
2024-09-17 13:44:32,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=610971.5, ans=0.2
2024-09-17 13:44:33,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=610971.5, ans=0.0
2024-09-17 13:44:38,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=610971.5, ans=0.0
2024-09-17 13:44:39,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=610971.5, ans=0.125
2024-09-17 13:44:43,838 INFO [train.py:1198] (0/2) Epoch 34, batch 4750, loss[loss=0.2528, ctc_loss=0.1712, cr_loss=0.4078, over 20849.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1473, cr_loss=0.3712, over 4100488.15 frames. ], batch size: 65, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:45:34,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=611084.8333333334, ans=0.125
2024-09-17 13:45:51,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.55 vs. limit=10.0
2024-09-17 13:45:53,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=611113.1666666666, ans=0.1
2024-09-17 13:45:59,809 INFO [train.py:1198] (0/2) Epoch 34, batch 4800, loss[loss=0.2288, ctc_loss=0.1527, cr_loss=0.3805, over 20070.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1477, cr_loss=0.3715, over 4100836.19 frames. ], batch size: 80, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:46:27,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=611169.8333333334, ans=0.035
2024-09-17 13:46:34,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=611198.1666666666, ans=0.125
2024-09-17 13:46:37,621 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.186e+02 2.296e+02 2.536e+02 6.072e+02, threshold=4.591e+02, percent-clipped=1.0
2024-09-17 13:47:00,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=611254.8333333334, ans=0.125
2024-09-17 13:47:11,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=611254.8333333334, ans=0.125
2024-09-17 13:47:18,191 INFO [train.py:1198] (0/2) Epoch 34, batch 4850, loss[loss=0.2533, ctc_loss=0.1686, cr_loss=0.4236, over 18118.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1481, cr_loss=0.3724, over 4093989.72 frames. ], batch size: 108, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:47:36,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=611311.5, ans=0.125
2024-09-17 13:48:33,879 INFO [train.py:1198] (0/2) Epoch 34, batch 4900, loss[loss=0.1824, ctc_loss=0.1186, cr_loss=0.3192, over 20922.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1483, cr_loss=0.3722, over 4073519.49 frames. ], batch size: 49, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:48:34,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=611424.8333333334, ans=0.2
2024-09-17 13:49:01,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=611453.1666666666, ans=0.125
2024-09-17 13:49:10,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=611481.5, ans=0.2
2024-09-17 13:49:11,597 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.145e+02 2.313e+02 2.528e+02 4.796e+02, threshold=4.626e+02, percent-clipped=1.0
2024-09-17 13:49:36,042 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0
2024-09-17 13:49:50,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=611566.5, ans=0.125
2024-09-17 13:49:50,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=611566.5, ans=0.2
2024-09-17 13:49:51,935 INFO [train.py:1198] (0/2) Epoch 34, batch 4950, loss[loss=0.2063, ctc_loss=0.1347, cr_loss=0.3578, over 20932.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1483, cr_loss=0.3731, over 4093265.99 frames. ], batch size: 49, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:49:53,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=611566.5, ans=0.1
2024-09-17 13:49:55,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=611566.5, ans=0.125
2024-09-17 13:50:01,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=611566.5, ans=0.0
2024-09-17 13:50:41,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=611651.5, ans=0.0
2024-09-17 13:50:43,632 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0
2024-09-17 13:50:46,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=611651.5, ans=0.125
2024-09-17 13:51:00,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=611679.8333333334, ans=0.125
2024-09-17 13:51:06,969 INFO [train.py:1198] (0/2) Epoch 34, batch 5000, loss[loss=0.2728, ctc_loss=0.1822, cr_loss=0.453, over 18556.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1485, cr_loss=0.3735, over 4099704.33 frames. ], batch size: 108, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:51:39,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=611764.8333333334, ans=0.125
2024-09-17 13:51:43,488 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.204e+02 2.316e+02 2.456e+02 4.302e+02, threshold=4.633e+02, percent-clipped=0.0
2024-09-17 13:51:43,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=611764.8333333334, ans=0.125
2024-09-17 13:52:06,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=611821.5, ans=0.0
2024-09-17 13:52:20,766 INFO [train.py:1198] (0/2) Epoch 34, batch 5050, loss[loss=0.2671, ctc_loss=0.1852, cr_loss=0.4096, over 18457.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1486, cr_loss=0.3736, over 4107387.21 frames. ], batch size: 108, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:52:31,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0
2024-09-17 13:53:29,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=611963.1666666666, ans=0.0
2024-09-17 13:53:32,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=611963.1666666666, ans=0.125
2024-09-17 13:53:34,838 INFO [train.py:1198] (0/2) Epoch 34, batch 5100, loss[loss=0.217, ctc_loss=0.1437, cr_loss=0.3668, over 21049.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1494, cr_loss=0.3747, over 4097696.38 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:53:35,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611991.5, ans=0.1
2024-09-17 13:53:38,012 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-216000.pt
2024-09-17 13:54:13,203 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.198e+02 2.370e+02 2.602e+02 4.625e+02, threshold=4.741e+02, percent-clipped=0.0
2024-09-17 13:54:17,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=612048.1666666666, ans=0.0
2024-09-17 13:54:20,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=612076.5, ans=0.0
2024-09-17 13:54:50,297 INFO [train.py:1198] (0/2) Epoch 34, batch 5150, loss[loss=0.2705, ctc_loss=0.1895, cr_loss=0.405, over 18359.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1491, cr_loss=0.3736, over 4097230.23 frames. ], batch size: 109, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:55:14,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.53 vs. limit=5.0
2024-09-17 13:55:20,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=612189.8333333334, ans=0.125
2024-09-17 13:55:23,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=612189.8333333334, ans=0.125
2024-09-17 13:55:54,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=612246.5, ans=0.125
2024-09-17 13:56:07,613 INFO [train.py:1198] (0/2) Epoch 34, batch 5200, loss[loss=0.2283, ctc_loss=0.1508, cr_loss=0.3874, over 20698.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1483, cr_loss=0.3725, over 4100494.15 frames. ], batch size: 68, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:56:24,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=612303.1666666666, ans=0.125
2024-09-17 13:56:27,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=612303.1666666666, ans=0.1
2024-09-17 13:56:30,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=612303.1666666666, ans=0.125
2024-09-17 13:56:45,568 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.166e+02 2.330e+02 2.489e+02 8.863e+02, threshold=4.659e+02, percent-clipped=1.0
2024-09-17 13:57:16,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=612388.1666666666, ans=0.2
2024-09-17 13:57:22,370 INFO [train.py:1198] (0/2) Epoch 34, batch 5250, loss[loss=0.2407, ctc_loss=0.164, cr_loss=0.3834, over 20668.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1485, cr_loss=0.3731, over 4094951.24 frames. ], batch size: 66, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:57:30,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0
2024-09-17 13:58:19,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0
2024-09-17 13:58:32,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=612529.8333333334, ans=0.125
2024-09-17 13:58:36,687 INFO [train.py:1198] (0/2) Epoch 34, batch 5300, loss[loss=0.2315, ctc_loss=0.1533, cr_loss=0.391, over 20863.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1488, cr_loss=0.373, over 4076406.56 frames. ], batch size: 57, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 13:58:56,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=612586.5, ans=10.0
2024-09-17 13:58:56,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=612586.5, ans=15.0
2024-09-17 13:58:58,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=612586.5, ans=0.125
2024-09-17 13:59:02,459 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0
2024-09-17 13:59:08,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2024-09-17 13:59:16,372 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.156e+02 2.275e+02 2.449e+02 5.228e+02, threshold=4.550e+02, percent-clipped=1.0
2024-09-17 13:59:22,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=612643.1666666666, ans=0.0
2024-09-17 13:59:39,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=612671.5, ans=0.125
2024-09-17 13:59:44,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=612671.5, ans=0.1
2024-09-17 13:59:48,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=612671.5, ans=0.125
2024-09-17 13:59:53,790 INFO [train.py:1198] (0/2) Epoch 34, batch 5350, loss[loss=0.2225, ctc_loss=0.1472, cr_loss=0.3766, over 20935.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1486, cr_loss=0.3727, over 4085994.30 frames. ], batch size: 60, lr: 2.42e-03, grad_scale: 32.0
2024-09-17 14:00:19,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=612728.1666666666, ans=0.025
2024-09-17 14:00:37,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=612784.8333333334, ans=0.2
2024-09-17 14:00:39,935 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.37 vs. limit=15.0
2024-09-17 14:00:45,472 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.80 vs.
limit=15.0 2024-09-17 14:01:08,329 INFO [train.py:1198] (0/2) Epoch 34, batch 5400, loss[loss=0.2338, ctc_loss=0.1558, cr_loss=0.3902, over 19916.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1486, cr_loss=0.3727, over 4069318.32 frames. ], batch size: 80, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:01:13,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=612841.5, ans=0.125 2024-09-17 14:01:15,260 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-09-17 14:01:32,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=612869.8333333334, ans=0.025 2024-09-17 14:01:47,609 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 2.167e+02 2.303e+02 2.469e+02 5.351e+02, threshold=4.605e+02, percent-clipped=1.0 2024-09-17 14:02:13,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=612954.8333333334, ans=0.0 2024-09-17 14:02:20,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=612954.8333333334, ans=0.125 2024-09-17 14:02:23,183 INFO [train.py:1198] (0/2) Epoch 34, batch 5450, loss[loss=0.252, ctc_loss=0.1698, cr_loss=0.4107, over 20962.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1479, cr_loss=0.3724, over 4083094.40 frames. ], batch size: 64, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:03:12,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=613068.1666666666, ans=0.1 2024-09-17 14:03:37,786 INFO [train.py:1198] (0/2) Epoch 34, batch 5500, loss[loss=0.2484, ctc_loss=0.1636, cr_loss=0.4241, over 20957.00 frames. 
], tot_loss[loss=0.2236, ctc_loss=0.1488, cr_loss=0.3742, over 4082937.68 frames. ], batch size: 64, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:04:09,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=613181.5, ans=0.2 2024-09-17 14:04:16,823 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.199e+02 2.307e+02 2.476e+02 3.144e+02, threshold=4.613e+02, percent-clipped=0.0 2024-09-17 14:04:27,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=22.5 2024-09-17 14:04:36,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2024-09-17 14:04:49,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=613238.1666666666, ans=0.2 2024-09-17 14:04:55,190 INFO [train.py:1198] (0/2) Epoch 34, batch 5550, loss[loss=0.2246, ctc_loss=0.1485, cr_loss=0.3807, over 20972.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1494, cr_loss=0.3744, over 4097147.91 frames. ], batch size: 58, lr: 2.42e-03, grad_scale: 16.0 2024-09-17 14:05:08,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=613294.8333333334, ans=0.1 2024-09-17 14:05:28,605 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-09-17 14:06:09,900 INFO [train.py:1198] (0/2) Epoch 34, batch 5600, loss[loss=0.2136, ctc_loss=0.1388, cr_loss=0.3741, over 20932.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1487, cr_loss=0.3733, over 4089348.95 frames. 
], batch size: 49, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:06:19,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=613408.1666666666, ans=0.125 2024-09-17 14:06:29,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=613436.5, ans=0.035 2024-09-17 14:06:43,119 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:06:46,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-09-17 14:06:50,083 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.188e+02 2.330e+02 2.477e+02 4.047e+02, threshold=4.661e+02, percent-clipped=0.0 2024-09-17 14:07:23,950 INFO [train.py:1198] (0/2) Epoch 34, batch 5650, loss[loss=0.2223, ctc_loss=0.1498, cr_loss=0.3625, over 20630.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.149, cr_loss=0.3735, over 4070344.55 frames. ], batch size: 68, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:07:33,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. 
limit=15.0 2024-09-17 14:07:42,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=613578.1666666666, ans=0.125 2024-09-17 14:07:59,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=613606.5, ans=0.125 2024-09-17 14:08:09,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=613634.8333333334, ans=0.1 2024-09-17 14:08:39,065 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. limit=10.0 2024-09-17 14:08:41,334 INFO [train.py:1198] (0/2) Epoch 34, batch 5700, loss[loss=0.2051, ctc_loss=0.1337, cr_loss=0.3569, over 20973.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1481, cr_loss=0.3721, over 4074661.03 frames. ], batch size: 58, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:08:41,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=613691.5, ans=0.0 2024-09-17 14:09:21,038 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.163e+02 2.283e+02 2.479e+02 3.080e+02, threshold=4.566e+02, percent-clipped=0.0 2024-09-17 14:09:41,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=613804.8333333334, ans=0.0 2024-09-17 14:09:45,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=613804.8333333334, ans=0.0 2024-09-17 14:09:46,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=613804.8333333334, ans=0.1 2024-09-17 14:09:50,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, 
batch_count=613804.8333333334, ans=0.04949747468305833 2024-09-17 14:09:55,692 INFO [train.py:1198] (0/2) Epoch 34, batch 5750, loss[loss=0.241, ctc_loss=0.1592, cr_loss=0.4093, over 20643.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1485, cr_loss=0.373, over 4069886.78 frames. ], batch size: 68, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:10:00,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=613833.1666666666, ans=0.1 2024-09-17 14:10:10,856 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:10:19,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=613861.5, ans=0.05 2024-09-17 14:10:46,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=613918.1666666666, ans=0.125 2024-09-17 14:11:10,043 INFO [train.py:1198] (0/2) Epoch 34, batch 5800, loss[loss=0.2441, ctc_loss=0.1647, cr_loss=0.397, over 20970.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1483, cr_loss=0.3721, over 4075309.28 frames. 
], batch size: 64, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:11:37,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=614003.1666666666, ans=0.0 2024-09-17 14:11:41,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=614031.5, ans=0.125 2024-09-17 14:11:45,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=614031.5, ans=0.0 2024-09-17 14:11:50,078 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.219e+02 2.320e+02 2.534e+02 4.392e+02, threshold=4.641e+02, percent-clipped=0.0 2024-09-17 14:12:24,580 INFO [train.py:1198] (0/2) Epoch 34, batch 5850, loss[loss=0.2263, ctc_loss=0.1526, cr_loss=0.3687, over 21021.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1489, cr_loss=0.3727, over 4078891.99 frames. ], batch size: 62, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:12:31,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614116.5, ans=0.1 2024-09-17 14:12:32,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=614116.5, ans=0.0 2024-09-17 14:12:34,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=614116.5, ans=0.0 2024-09-17 14:12:46,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=614144.8333333334, ans=0.025 2024-09-17 14:13:23,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=614201.5, ans=10.0 2024-09-17 14:13:28,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, 
batch_count=614229.8333333334, ans=0.125 2024-09-17 14:13:31,605 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0 2024-09-17 14:13:32,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=614229.8333333334, ans=0.05 2024-09-17 14:13:41,568 INFO [train.py:1198] (0/2) Epoch 34, batch 5900, loss[loss=0.204, ctc_loss=0.1334, cr_loss=0.3531, over 20943.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1496, cr_loss=0.3739, over 4069452.08 frames. ], batch size: 49, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:13:49,348 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:14:08,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=614286.5, ans=0.0 2024-09-17 14:14:19,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=614314.8333333334, ans=0.2 2024-09-17 14:14:21,636 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.170e+02 2.303e+02 2.431e+02 5.111e+02, threshold=4.605e+02, percent-clipped=1.0 2024-09-17 14:14:50,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=614371.5, ans=0.125 2024-09-17 14:14:55,685 INFO [train.py:1198] (0/2) Epoch 34, batch 5950, loss[loss=0.2782, ctc_loss=0.2, cr_loss=0.3909, over 13934.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1494, cr_loss=0.3736, over 4063964.13 frames. 
], batch size: 150, lr: 2.42e-03, grad_scale: 32.0 2024-09-17 14:15:05,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614399.8333333334, ans=0.1 2024-09-17 14:15:08,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=614399.8333333334, ans=0.125 2024-09-17 14:15:34,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=614456.5, ans=0.125 2024-09-17 14:15:34,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=614456.5, ans=0.125 2024-09-17 14:15:49,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=614484.8333333334, ans=0.0 2024-09-17 14:16:04,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.24 vs. limit=15.0 2024-09-17 14:16:05,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=614513.1666666666, ans=0.2 2024-09-17 14:16:11,993 INFO [train.py:1198] (0/2) Epoch 34, batch 6000, loss[loss=0.251, ctc_loss=0.1696, cr_loss=0.4072, over 20667.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1495, cr_loss=0.3739, over 4064311.37 frames. ], batch size: 68, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:16:11,993 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 14:16:33,499 INFO [train.py:1230] (0/2) Epoch 34, validation: loss=0.04046, ctc_loss=0.04046, cr_loss=1.343e-14, over 944034.00 frames. 
2024-09-17 14:16:33,500 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 14:17:09,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=614598.1666666666, ans=0.0 2024-09-17 14:17:11,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=614598.1666666666, ans=0.125 2024-09-17 14:17:11,835 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0 2024-09-17 14:17:14,111 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.004e+02 2.225e+02 2.353e+02 2.545e+02 3.255e+02, threshold=4.707e+02, percent-clipped=0.0 2024-09-17 14:17:26,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=614626.5, ans=0.0 2024-09-17 14:17:36,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=614654.8333333334, ans=0.5 2024-09-17 14:17:48,388 INFO [train.py:1198] (0/2) Epoch 34, batch 6050, loss[loss=0.2281, ctc_loss=0.1504, cr_loss=0.3883, over 21051.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1494, cr_loss=0.3741, over 4078302.46 frames. ], batch size: 56, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:18:20,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=12.0 2024-09-17 14:18:28,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=614739.8333333334, ans=0.125 2024-09-17 14:18:41,132 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.13 vs. 
limit=15.0 2024-09-17 14:19:04,256 INFO [train.py:1198] (0/2) Epoch 34, batch 6100, loss[loss=0.2338, ctc_loss=0.154, cr_loss=0.3988, over 20704.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1489, cr_loss=0.373, over 4079519.10 frames. ], batch size: 71, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:19:23,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=614853.1666666666, ans=0.0 2024-09-17 14:19:44,504 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.749e+02 2.172e+02 2.312e+02 2.473e+02 3.608e+02, threshold=4.623e+02, percent-clipped=0.0 2024-09-17 14:19:56,837 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.40 vs. limit=15.0 2024-09-17 14:20:08,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=614938.1666666666, ans=0.025 2024-09-17 14:20:08,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=614938.1666666666, ans=0.125 2024-09-17 14:20:18,549 INFO [train.py:1198] (0/2) Epoch 34, batch 6150, loss[loss=0.1969, ctc_loss=0.1302, cr_loss=0.3337, over 21005.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1491, cr_loss=0.373, over 4083279.29 frames. ], batch size: 52, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:20:24,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=614966.5, ans=0.2 2024-09-17 14:20:28,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.59 vs. 
limit=15.0 2024-09-17 14:20:33,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=614994.8333333334, ans=0.0 2024-09-17 14:20:43,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=614994.8333333334, ans=0.125 2024-09-17 14:20:54,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=615023.1666666666, ans=0.1 2024-09-17 14:21:05,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=615051.5, ans=0.125 2024-09-17 14:21:24,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=615079.8333333334, ans=0.2 2024-09-17 14:21:33,264 INFO [train.py:1198] (0/2) Epoch 34, batch 6200, loss[loss=0.2373, ctc_loss=0.1607, cr_loss=0.3834, over 21035.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.15, cr_loss=0.3742, over 4061046.38 frames. ], batch size: 63, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:21:37,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=615108.1666666666, ans=0.125 2024-09-17 14:21:41,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.49 vs. limit=6.0 2024-09-17 14:22:13,211 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.184e+02 2.290e+02 2.420e+02 3.320e+02, threshold=4.580e+02, percent-clipped=0.0 2024-09-17 14:22:35,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=615221.5, ans=0.0 2024-09-17 14:22:47,219 INFO [train.py:1198] (0/2) Epoch 34, batch 6250, loss[loss=0.1991, ctc_loss=0.1295, cr_loss=0.3481, over 19885.00 frames. 
], tot_loss[loss=0.2259, ctc_loss=0.1507, cr_loss=0.3756, over 4045117.00 frames. ], batch size: 44, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:22:54,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=615249.8333333334, ans=0.125 2024-09-17 14:23:08,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=615278.1666666666, ans=0.1 2024-09-17 14:23:34,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=615334.8333333334, ans=10.0 2024-09-17 14:24:01,151 INFO [train.py:1198] (0/2) Epoch 34, batch 6300, loss[loss=0.2395, ctc_loss=0.1614, cr_loss=0.3905, over 20746.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1519, cr_loss=0.3765, over 4004949.54 frames. ], batch size: 71, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:24:02,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=615391.5, ans=0.125 2024-09-17 14:24:02,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=615391.5, ans=0.125 2024-09-17 14:24:20,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=615419.8333333334, ans=0.1 2024-09-17 14:24:29,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=615448.1666666666, ans=0.1 2024-09-17 14:24:39,829 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.278e+02 2.453e+02 2.629e+02 4.068e+02, threshold=4.906e+02, percent-clipped=0.0 2024-09-17 14:24:44,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, 
batch_count=615476.5, ans=6.0 2024-09-17 14:24:47,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=615476.5, ans=0.025 2024-09-17 14:24:51,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=615476.5, ans=0.1 2024-09-17 14:24:53,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=615476.5, ans=0.125 2024-09-17 14:25:06,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=615504.8333333334, ans=0.125 2024-09-17 14:25:11,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=615533.1666666666, ans=0.125 2024-09-17 14:25:12,461 INFO [train.py:1198] (0/2) Epoch 34, batch 6350, loss[loss=0.2792, ctc_loss=0.2021, cr_loss=0.3854, over 13805.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1573, cr_loss=0.3813, over 3840071.89 frames. ], batch size: 149, lr: 2.41e-03, grad_scale: 32.0 2024-09-17 14:25:32,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=615561.5, ans=0.1 2024-09-17 14:25:43,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=615589.8333333334, ans=0.2 2024-09-17 14:26:02,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=615618.1666666666, ans=0.1 2024-09-17 14:26:10,883 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-34.pt 2024-09-17 14:27:02,488 INFO [train.py:1198] (0/2) Epoch 35, batch 0, loss[loss=0.2264, ctc_loss=0.1484, cr_loss=0.3897, over 20977.00 frames. 
], tot_loss[loss=0.2264, ctc_loss=0.1484, cr_loss=0.3897, over 20977.00 frames. ], batch size: 58, lr: 2.38e-03, grad_scale: 32.0 2024-09-17 14:27:02,489 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 14:27:20,886 INFO [train.py:1230] (0/2) Epoch 35, validation: loss=0.04033, ctc_loss=0.04033, cr_loss=1.343e-14, over 944034.00 frames. 2024-09-17 14:27:20,886 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 14:27:24,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.89 vs. limit=15.0 2024-09-17 14:28:11,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=615734.3333333334, ans=0.125 2024-09-17 14:28:15,414 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.346e+02 2.639e+02 2.882e+02 4.248e+02, threshold=5.277e+02, percent-clipped=0.0 2024-09-17 14:28:26,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.59 vs. limit=15.0 2024-09-17 14:28:36,651 INFO [train.py:1198] (0/2) Epoch 35, batch 50, loss[loss=0.2283, ctc_loss=0.1515, cr_loss=0.3836, over 20291.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1474, cr_loss=0.3708, over 927615.65 frames. 
], batch size: 74, lr: 2.38e-03, grad_scale: 32.0 2024-09-17 14:28:36,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=615791.0, ans=0.025 2024-09-17 14:28:56,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=615819.3333333334, ans=0.125 2024-09-17 14:29:05,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=615847.6666666666, ans=0.0 2024-09-17 14:29:09,820 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-09-17 14:29:52,581 INFO [train.py:1198] (0/2) Epoch 35, batch 100, loss[loss=0.2263, ctc_loss=0.1489, cr_loss=0.3869, over 20778.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1477, cr_loss=0.3724, over 1639685.66 frames. ], batch size: 53, lr: 2.38e-03, grad_scale: 32.0 2024-09-17 14:29:54,611 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:30:08,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=615961.0, ans=0.0 2024-09-17 14:30:08,723 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. 
limit=15.0
2024-09-17 14:30:09,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=615961.0, ans=0.125
2024-09-17 14:30:42,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=616017.6666666666, ans=0.0
2024-09-17 14:30:46,825 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.125e+02 2.216e+02 2.377e+02 3.062e+02, threshold=4.432e+02, percent-clipped=0.0
2024-09-17 14:31:07,905 INFO [train.py:1198] (0/2) Epoch 35, batch 150, loss[loss=0.2048, ctc_loss=0.1354, cr_loss=0.3472, over 21054.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1468, cr_loss=0.3708, over 2189138.00 frames. ], batch size: 56, lr: 2.38e-03, grad_scale: 32.0
2024-09-17 14:31:18,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=616074.3333333334, ans=0.025
2024-09-17 14:31:31,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616102.6666666666, ans=0.1
2024-09-17 14:31:41,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=616131.0, ans=0.125
2024-09-17 14:32:26,983 INFO [train.py:1198] (0/2) Epoch 35, batch 200, loss[loss=0.2146, ctc_loss=0.143, cr_loss=0.3581, over 20876.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1477, cr_loss=0.3726, over 2617086.70 frames. ], batch size: 57, lr: 2.38e-03, grad_scale: 32.0
2024-09-17 14:32:31,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616216.0, ans=0.1
2024-09-17 14:32:39,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=616216.0, ans=0.125
2024-09-17 14:33:17,192 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.25 vs. limit=15.0
2024-09-17 14:33:26,687 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.197e+02 2.340e+02 2.528e+02 4.891e+02, threshold=4.681e+02, percent-clipped=1.0
2024-09-17 14:33:41,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=616329.3333333334, ans=0.125
2024-09-17 14:33:46,222 INFO [train.py:1198] (0/2) Epoch 35, batch 250, loss[loss=0.2115, ctc_loss=0.1388, cr_loss=0.3633, over 20863.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3725, over 2945497.59 frames. ], batch size: 57, lr: 2.38e-03, grad_scale: 16.0
2024-09-17 14:33:49,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=616357.6666666666, ans=0.1
2024-09-17 14:33:56,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=616357.6666666666, ans=0.125
2024-09-17 14:33:59,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=616386.0, ans=0.1
2024-09-17 14:34:19,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616414.3333333334, ans=0.1
2024-09-17 14:34:32,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=616442.6666666666, ans=0.125
2024-09-17 14:35:01,897 INFO [train.py:1198] (0/2) Epoch 35, batch 300, loss[loss=0.245, ctc_loss=0.1676, cr_loss=0.3868, over 21009.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.148, cr_loss=0.3729, over 3191555.26 frames. ], batch size: 63, lr: 2.38e-03, grad_scale: 16.0
2024-09-17 14:35:57,693 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.179e+02 2.311e+02 2.475e+02 3.658e+02, threshold=4.621e+02, percent-clipped=0.0
2024-09-17 14:36:05,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=616612.6666666666, ans=0.1
2024-09-17 14:36:17,700 INFO [train.py:1198] (0/2) Epoch 35, batch 350, loss[loss=0.1925, ctc_loss=0.1266, cr_loss=0.3294, over 20937.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.372, over 3398936.88 frames. ], batch size: 49, lr: 2.38e-03, grad_scale: 16.0
2024-09-17 14:36:21,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616641.0, ans=0.1
2024-09-17 14:36:27,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=616641.0, ans=0.125
2024-09-17 14:36:28,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=616641.0, ans=0.2
2024-09-17 14:36:38,093 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 14:37:33,484 INFO [train.py:1198] (0/2) Epoch 35, batch 400, loss[loss=0.2293, ctc_loss=0.1554, cr_loss=0.3694, over 21014.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1474, cr_loss=0.3716, over 3550553.87 frames. ], batch size: 61, lr: 2.38e-03, grad_scale: 32.0
2024-09-17 14:37:46,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=616782.6666666666, ans=0.125
2024-09-17 14:37:57,178 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0
2024-09-17 14:37:59,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=616811.0, ans=0.125
2024-09-17 14:38:04,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=616811.0, ans=0.2
2024-09-17 14:38:24,382 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0
2024-09-17 14:38:32,498 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.223e+02 2.348e+02 2.513e+02 3.156e+02, threshold=4.697e+02, percent-clipped=0.0
2024-09-17 14:38:42,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=616896.0, ans=0.1
2024-09-17 14:38:55,739 INFO [train.py:1198] (0/2) Epoch 35, batch 450, loss[loss=0.2654, ctc_loss=0.1814, cr_loss=0.4197, over 14755.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1478, cr_loss=0.3722, over 3660108.01 frames. ], batch size: 149, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:38:57,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=616924.3333333334, ans=0.0
2024-09-17 14:39:37,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=616981.0, ans=0.125
2024-09-17 14:40:11,644 INFO [train.py:1198] (0/2) Epoch 35, batch 500, loss[loss=0.238, ctc_loss=0.1594, cr_loss=0.3926, over 19587.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.148, cr_loss=0.3726, over 3759296.33 frames. ], batch size: 90, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:40:18,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=617066.0, ans=0.125
2024-09-17 14:41:07,709 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.163e+02 2.284e+02 2.476e+02 3.109e+02, threshold=4.569e+02, percent-clipped=0.0
2024-09-17 14:41:27,197 INFO [train.py:1198] (0/2) Epoch 35, batch 550, loss[loss=0.2197, ctc_loss=0.1419, cr_loss=0.3891, over 20891.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.3712, over 3842733.00 frames. ], batch size: 54, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:41:34,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=617207.6666666666, ans=0.04949747468305833
2024-09-17 14:41:39,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=617207.6666666666, ans=0.1
2024-09-17 14:41:50,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=617236.0, ans=0.1
2024-09-17 14:42:19,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.21 vs. limit=10.0
2024-09-17 14:42:32,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=617321.0, ans=0.2
2024-09-17 14:42:35,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=617321.0, ans=0.125
2024-09-17 14:42:41,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=617349.3333333334, ans=0.1
2024-09-17 14:42:42,707 INFO [train.py:1198] (0/2) Epoch 35, batch 600, loss[loss=0.1787, ctc_loss=0.1158, cr_loss=0.3145, over 20973.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1471, cr_loss=0.3713, over 3907654.56 frames. ], batch size: 52, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:42:44,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=617349.3333333334, ans=0.125
2024-09-17 14:43:08,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=617377.6666666666, ans=0.025
2024-09-17 14:43:41,984 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.137e+02 2.286e+02 2.474e+02 5.128e+02, threshold=4.572e+02, percent-clipped=1.0
2024-09-17 14:43:47,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0
2024-09-17 14:43:58,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=617462.6666666666, ans=0.2
2024-09-17 14:43:58,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=617462.6666666666, ans=0.125
2024-09-17 14:44:01,865 INFO [train.py:1198] (0/2) Epoch 35, batch 650, loss[loss=0.2439, ctc_loss=0.1656, cr_loss=0.3917, over 19992.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1481, cr_loss=0.3728, over 3947766.28 frames. ], batch size: 80, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:44:09,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=617491.0, ans=0.1
2024-09-17 14:44:10,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=617491.0, ans=0.1
2024-09-17 14:44:17,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=617519.3333333334, ans=10.0
2024-09-17 14:45:07,569 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0
2024-09-17 14:45:08,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=617604.3333333334, ans=0.2
2024-09-17 14:45:20,652 INFO [train.py:1198] (0/2) Epoch 35, batch 700, loss[loss=0.2338, ctc_loss=0.1542, cr_loss=0.398, over 20864.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1484, cr_loss=0.3736, over 3978094.09 frames. ], batch size: 57, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:45:20,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=617632.6666666666, ans=0.09899494936611666
2024-09-17 14:45:23,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=617632.6666666666, ans=0.125
2024-09-17 14:45:25,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=617632.6666666666, ans=0.0
2024-09-17 14:45:28,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=617632.6666666666, ans=0.125
2024-09-17 14:46:00,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=617689.3333333334, ans=0.05
2024-09-17 14:46:16,845 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.142e+02 2.223e+02 2.439e+02 3.650e+02, threshold=4.447e+02, percent-clipped=0.0
2024-09-17 14:46:24,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=617746.0, ans=0.0
2024-09-17 14:46:36,678 INFO [train.py:1198] (0/2) Epoch 35, batch 750, loss[loss=0.1895, ctc_loss=0.123, cr_loss=0.333, over 20990.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1481, cr_loss=0.3732, over 4007821.96 frames. ], batch size: 50, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:46:39,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=617774.3333333334, ans=0.125
2024-09-17 14:47:15,242 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 14:47:17,065 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.89 vs. limit=10.0
2024-09-17 14:47:27,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=617859.3333333334, ans=0.09899494936611666
2024-09-17 14:47:33,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=617859.3333333334, ans=0.2
2024-09-17 14:47:34,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=617859.3333333334, ans=10.0
2024-09-17 14:47:34,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=617859.3333333334, ans=0.0
2024-09-17 14:47:52,445 INFO [train.py:1198] (0/2) Epoch 35, batch 800, loss[loss=0.2375, ctc_loss=0.1585, cr_loss=0.3949, over 20133.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1477, cr_loss=0.3726, over 4025582.69 frames. ], batch size: 80, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:48:10,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=617944.3333333334, ans=0.0
2024-09-17 14:48:22,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=617972.6666666666, ans=0.125
2024-09-17 14:48:46,140 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0
2024-09-17 14:48:48,646 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.187e+02 2.316e+02 2.474e+02 3.498e+02, threshold=4.631e+02, percent-clipped=0.0
2024-09-17 14:49:08,730 INFO [train.py:1198] (0/2) Epoch 35, batch 850, loss[loss=0.2471, ctc_loss=0.1731, cr_loss=0.3698, over 14819.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1476, cr_loss=0.3721, over 4040832.76 frames. ], batch size: 149, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:49:13,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618057.6666666666, ans=0.1
2024-09-17 14:50:27,991 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0
2024-09-17 14:50:30,284 INFO [train.py:1198] (0/2) Epoch 35, batch 900, loss[loss=0.2593, ctc_loss=0.1693, cr_loss=0.4501, over 20679.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1473, cr_loss=0.372, over 4058424.25 frames. ], batch size: 71, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:50:38,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=618199.3333333334, ans=0.125
2024-09-17 14:50:47,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=618227.6666666666, ans=0.125
2024-09-17 14:50:58,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=618227.6666666666, ans=0.125
2024-09-17 14:51:03,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=618256.0, ans=0.0
2024-09-17 14:51:08,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=618256.0, ans=0.125
2024-09-17 14:51:16,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=618284.3333333334, ans=0.125
2024-09-17 14:51:18,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.88 vs. limit=10.0
2024-09-17 14:51:24,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=12.0
2024-09-17 14:51:24,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.72 vs. limit=10.0
2024-09-17 14:51:26,659 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.155e+02 2.255e+02 2.427e+02 3.618e+02, threshold=4.510e+02, percent-clipped=0.0
2024-09-17 14:51:46,176 INFO [train.py:1198] (0/2) Epoch 35, batch 950, loss[loss=0.2253, ctc_loss=0.1494, cr_loss=0.3794, over 20841.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1468, cr_loss=0.3708, over 4055421.26 frames. ], batch size: 65, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:51:53,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=618341.0, ans=0.0
2024-09-17 14:52:16,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=618397.6666666666, ans=0.1
2024-09-17 14:52:18,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618397.6666666666, ans=0.1
2024-09-17 14:52:44,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=618426.0, ans=0.2
2024-09-17 14:53:02,153 INFO [train.py:1198] (0/2) Epoch 35, batch 1000, loss[loss=0.2661, ctc_loss=0.1782, cr_loss=0.4393, over 17947.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1471, cr_loss=0.3705, over 4055436.02 frames. ], batch size: 108, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:53:13,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=618482.6666666666, ans=0.125
2024-09-17 14:53:54,344 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.57 vs. limit=10.0
2024-09-17 14:53:58,287 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.192e+02 2.316e+02 2.461e+02 4.578e+02, threshold=4.632e+02, percent-clipped=1.0
2024-09-17 14:54:17,842 INFO [train.py:1198] (0/2) Epoch 35, batch 1050, loss[loss=0.2391, ctc_loss=0.1602, cr_loss=0.3946, over 20825.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1477, cr_loss=0.3717, over 4071462.60 frames. ], batch size: 59, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:55:20,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=618737.6666666666, ans=15.0
2024-09-17 14:55:36,569 INFO [train.py:1198] (0/2) Epoch 35, batch 1100, loss[loss=0.2268, ctc_loss=0.1494, cr_loss=0.387, over 20786.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1482, cr_loss=0.3721, over 4070453.41 frames. ], batch size: 56, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:55:38,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0
2024-09-17 14:56:05,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=618794.3333333334, ans=0.125
2024-09-17 14:56:08,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=618822.6666666666, ans=0.025
2024-09-17 14:56:14,322 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 14:56:14,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=618822.6666666666, ans=0.2
2024-09-17 14:56:34,773 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.196e+02 2.311e+02 2.530e+02 3.718e+02, threshold=4.622e+02, percent-clipped=0.0
2024-09-17 14:56:50,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=618879.3333333334, ans=0.1
2024-09-17 14:56:54,252 INFO [train.py:1198] (0/2) Epoch 35, batch 1150, loss[loss=0.2296, ctc_loss=0.1522, cr_loss=0.387, over 20610.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1484, cr_loss=0.3719, over 4066045.53 frames. ], batch size: 75, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:57:02,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=618907.6666666666, ans=0.0
2024-09-17 14:57:47,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=618992.6666666666, ans=0.125
2024-09-17 14:57:53,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=619021.0, ans=0.0
2024-09-17 14:58:09,215 INFO [train.py:1198] (0/2) Epoch 35, batch 1200, loss[loss=0.1929, ctc_loss=0.1264, cr_loss=0.3326, over 20943.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1484, cr_loss=0.3719, over 4075714.50 frames. ], batch size: 48, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:59:05,038 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.162e+02 2.302e+02 2.491e+02 3.463e+02, threshold=4.605e+02, percent-clipped=0.0
2024-09-17 14:59:25,057 INFO [train.py:1198] (0/2) Epoch 35, batch 1250, loss[loss=0.2599, ctc_loss=0.1754, cr_loss=0.4226, over 18345.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1478, cr_loss=0.3714, over 4082918.96 frames. ], batch size: 108, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 14:59:46,796 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0
2024-09-17 15:00:03,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0
2024-09-17 15:00:43,953 INFO [train.py:1198] (0/2) Epoch 35, batch 1300, loss[loss=0.2304, ctc_loss=0.1537, cr_loss=0.3836, over 20969.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1476, cr_loss=0.3712, over 4081917.89 frames. ], batch size: 64, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 15:00:50,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=619332.6666666666, ans=0.0
2024-09-17 15:00:53,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=619332.6666666666, ans=0.0
2024-09-17 15:01:28,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=619417.6666666666, ans=0.2
2024-09-17 15:01:40,142 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.131e+02 2.259e+02 2.433e+02 4.361e+02, threshold=4.517e+02, percent-clipped=0.0
2024-09-17 15:02:02,510 INFO [train.py:1198] (0/2) Epoch 35, batch 1350, loss[loss=0.2062, ctc_loss=0.138, cr_loss=0.3411, over 21049.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1474, cr_loss=0.371, over 4078615.79 frames. ], batch size: 62, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 15:02:16,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=619502.6666666666, ans=0.125
2024-09-17 15:02:37,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=619531.0, ans=0.0
2024-09-17 15:02:46,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=619559.3333333334, ans=0.125
2024-09-17 15:03:18,023 INFO [train.py:1198] (0/2) Epoch 35, batch 1400, loss[loss=0.2332, ctc_loss=0.1539, cr_loss=0.3963, over 20879.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1471, cr_loss=0.3712, over 4088821.39 frames. ], batch size: 57, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 15:03:24,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=619616.0, ans=0.125
2024-09-17 15:03:26,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=619616.0, ans=0.125
2024-09-17 15:03:47,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619672.6666666666, ans=0.1
2024-09-17 15:04:11,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0
2024-09-17 15:04:14,297 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.162e+02 2.295e+02 2.482e+02 5.420e+02, threshold=4.591e+02, percent-clipped=3.0
2024-09-17 15:04:17,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=619729.3333333334, ans=0.2
2024-09-17 15:04:33,862 INFO [train.py:1198] (0/2) Epoch 35, batch 1450, loss[loss=0.2732, ctc_loss=0.1902, cr_loss=0.4147, over 18315.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1469, cr_loss=0.3709, over 4095576.74 frames. ], batch size: 108, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 15:04:43,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=619757.6666666666, ans=0.0
2024-09-17 15:05:10,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=619814.3333333334, ans=0.125
2024-09-17 15:05:12,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619814.3333333334, ans=0.1
2024-09-17 15:05:14,012 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.78 vs. limit=6.0
2024-09-17 15:05:39,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=619871.0, ans=0.125
2024-09-17 15:05:49,864 INFO [train.py:1198] (0/2) Epoch 35, batch 1500, loss[loss=0.2334, ctc_loss=0.1564, cr_loss=0.3846, over 20845.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.3716, over 4099744.56 frames. ], batch size: 65, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 15:05:51,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=619899.3333333334, ans=0.125
2024-09-17 15:06:20,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=619956.0, ans=0.0
2024-09-17 15:06:29,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=12.0
2024-09-17 15:06:36,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=619956.0, ans=0.125
2024-09-17 15:06:42,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=619984.3333333334, ans=0.0
2024-09-17 15:06:45,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0
2024-09-17 15:06:49,211 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.135e+02 2.284e+02 2.403e+02 4.344e+02, threshold=4.569e+02, percent-clipped=0.0
2024-09-17 15:06:58,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=620012.6666666666, ans=0.125
2024-09-17 15:07:00,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=620012.6666666666, ans=0.0
2024-09-17 15:07:09,065 INFO [train.py:1198] (0/2) Epoch 35, batch 1550, loss[loss=0.2361, ctc_loss=0.1551, cr_loss=0.4047, over 20851.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1481, cr_loss=0.3725, over 4098088.57 frames. ], batch size: 65, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 15:07:09,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=620041.0, ans=0.07
2024-09-17 15:07:10,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620041.0, ans=0.1
2024-09-17 15:07:24,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620041.0, ans=0.1
2024-09-17 15:07:31,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=620069.3333333334, ans=0.1
2024-09-17 15:07:49,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=620097.6666666666, ans=0.0
2024-09-17 15:08:01,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=620126.0, ans=0.0
2024-09-17 15:08:07,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=620126.0, ans=0.0
2024-09-17 15:08:26,480 INFO [train.py:1198] (0/2) Epoch 35, batch 1600, loss[loss=0.21, ctc_loss=0.1401, cr_loss=0.3495, over 20954.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.372, over 4109746.16 frames. ], batch size: 64, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 15:09:22,225 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.194e+02 2.316e+02 2.550e+02 5.334e+02, threshold=4.632e+02, percent-clipped=1.0
2024-09-17 15:09:30,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=620296.0, ans=0.025
2024-09-17 15:09:42,077 INFO [train.py:1198] (0/2) Epoch 35, batch 1650, loss[loss=0.2281, ctc_loss=0.1494, cr_loss=0.3935, over 20778.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1473, cr_loss=0.3719, over 4109410.17 frames. ], batch size: 56, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 15:09:45,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=620324.3333333334, ans=0.125
2024-09-17 15:10:04,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0
2024-09-17 15:10:25,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=620381.0, ans=0.0
2024-09-17 15:10:55,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=620437.6666666666, ans=0.125
2024-09-17 15:10:58,414 INFO [train.py:1198] (0/2) Epoch 35, batch 1700, loss[loss=0.1837, ctc_loss=0.1187, cr_loss=0.3252, over 20976.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1468, cr_loss=0.3707, over 4100114.31 frames. ], batch size: 52, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 15:11:09,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620466.0, ans=0.1
2024-09-17 15:11:39,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=620522.6666666666, ans=0.125
2024-09-17 15:11:42,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=620551.0, ans=0.025
2024-09-17 15:11:42,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=620551.0, ans=0.0
2024-09-17 15:11:54,422 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.152e+02 2.290e+02 2.430e+02 3.451e+02, threshold=4.580e+02, percent-clipped=0.0
2024-09-17 15:12:05,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=620579.3333333334, ans=0.2
2024-09-17 15:12:13,974 INFO [train.py:1198] (0/2) Epoch 35, batch 1750, loss[loss=0.2224, ctc_loss=0.1476, cr_loss=0.3741, over 20997.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1464, cr_loss=0.3696, over 4107874.64 frames. ], batch size: 63, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 15:12:17,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=620607.6666666666, ans=0.0
2024-09-17 15:12:34,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=620636.0, ans=0.125
2024-09-17 15:13:17,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=620692.6666666666, ans=0.125
2024-09-17 15:13:35,307 INFO [train.py:1198] (0/2) Epoch 35, batch 1800, loss[loss=0.2557, ctc_loss=0.1722, cr_loss=0.4172, over 20973.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1473, cr_loss=0.3709, over 4101573.36 frames. ], batch size: 64, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 15:14:07,643 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 15:14:21,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=620834.3333333334, ans=0.025
2024-09-17 15:14:24,686 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=15.0
2024-09-17 15:14:31,590 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.180e+02 2.316e+02 2.502e+02 4.613e+02, threshold=4.632e+02, percent-clipped=1.0
2024-09-17 15:14:48,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=620862.6666666666, ans=0.2
2024-09-17 15:14:51,179 INFO [train.py:1198] (0/2) Epoch 35, batch 1850, loss[loss=0.2719, ctc_loss=0.1885, cr_loss=0.4169, over 18301.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1476, cr_loss=0.371, over 4094522.43 frames. ], batch size: 108, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 15:14:51,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0
2024-09-17 15:15:02,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=620891.0, ans=0.2
2024-09-17 15:15:14,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=620919.3333333334, ans=0.1
2024-09-17 15:15:17,795 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0
2024-09-17 15:15:20,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=620947.6666666666, ans=0.125
2024-09-17 15:15:50,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=621004.3333333334, ans=0.125
2024-09-17 15:16:06,330 INFO [train.py:1198] (0/2) Epoch 35, batch 1900, loss[loss=0.2589, ctc_loss=0.1809, cr_loss=0.39, over 14169.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1478, cr_loss=0.3709, over 4089655.63 frames. ], batch size: 150, lr: 2.37e-03, grad_scale: 32.0
2024-09-17 15:16:23,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621061.0, ans=0.1
2024-09-17 15:17:02,750 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.203e+02 2.324e+02 2.466e+02 3.414e+02, threshold=4.648e+02, percent-clipped=0.0
2024-09-17 15:17:03,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621117.6666666666, ans=0.1
2024-09-17 15:17:17,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.10 vs.
limit=10.0 2024-09-17 15:17:18,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=621146.0, ans=0.04949747468305833 2024-09-17 15:17:18,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=621146.0, ans=0.125 2024-09-17 15:17:21,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=621174.3333333334, ans=0.125 2024-09-17 15:17:22,330 INFO [train.py:1198] (0/2) Epoch 35, batch 1950, loss[loss=0.191, ctc_loss=0.1239, cr_loss=0.3354, over 21022.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1478, cr_loss=0.3713, over 4080009.12 frames. ], batch size: 52, lr: 2.37e-03, grad_scale: 16.0 2024-09-17 15:17:50,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=621202.6666666666, ans=0.2 2024-09-17 15:18:02,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=621231.0, ans=0.0 2024-09-17 15:18:34,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=621287.6666666666, ans=0.2 2024-09-17 15:18:41,248 INFO [train.py:1198] (0/2) Epoch 35, batch 2000, loss[loss=0.2627, ctc_loss=0.176, cr_loss=0.4334, over 20959.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1486, cr_loss=0.3727, over 4068930.46 frames. ], batch size: 58, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:19:00,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.29 vs. 
limit=22.5 2024-09-17 15:19:10,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=621344.3333333334, ans=0.2 2024-09-17 15:19:41,587 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.937e+02 2.171e+02 2.282e+02 2.466e+02 4.272e+02, threshold=4.565e+02, percent-clipped=0.0 2024-09-17 15:19:54,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=621429.3333333334, ans=0.125 2024-09-17 15:19:59,806 INFO [train.py:1198] (0/2) Epoch 35, batch 2050, loss[loss=0.2158, ctc_loss=0.1415, cr_loss=0.3714, over 21015.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.148, cr_loss=0.3721, over 4081797.60 frames. ], batch size: 61, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:20:29,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=621514.3333333334, ans=0.125 2024-09-17 15:21:14,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=621599.3333333334, ans=0.0 2024-09-17 15:21:15,654 INFO [train.py:1198] (0/2) Epoch 35, batch 2100, loss[loss=0.1784, ctc_loss=0.1159, cr_loss=0.3122, over 20953.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1475, cr_loss=0.3712, over 4087387.75 frames. 
], batch size: 49, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:21:51,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=621656.0, ans=0.125 2024-09-17 15:22:09,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621684.3333333334, ans=0.1 2024-09-17 15:22:09,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=621684.3333333334, ans=0.0 2024-09-17 15:22:12,555 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.173e+02 2.293e+02 2.439e+02 5.713e+02, threshold=4.587e+02, percent-clipped=1.0 2024-09-17 15:22:30,926 INFO [train.py:1198] (0/2) Epoch 35, batch 2150, loss[loss=0.222, ctc_loss=0.1439, cr_loss=0.3907, over 21007.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.147, cr_loss=0.371, over 4097583.61 frames. ], batch size: 48, lr: 2.37e-03, grad_scale: 32.0 2024-09-17 15:22:51,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=621769.3333333334, ans=0.5 2024-09-17 15:23:05,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2024-09-17 15:23:49,241 INFO [train.py:1198] (0/2) Epoch 35, batch 2200, loss[loss=0.2177, ctc_loss=0.1442, cr_loss=0.3676, over 20806.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.372, over 4101333.80 frames. ], batch size: 56, lr: 2.37e-03, grad_scale: 16.0 2024-09-17 15:24:06,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.09 vs. 
limit=15.0 2024-09-17 15:24:12,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=621911.0, ans=0.05 2024-09-17 15:24:32,164 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2024-09-17 15:24:34,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=621939.3333333334, ans=0.2 2024-09-17 15:24:50,719 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.142e+02 2.263e+02 2.441e+02 4.069e+02, threshold=4.526e+02, percent-clipped=0.0 2024-09-17 15:24:55,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=621996.0, ans=0.125 2024-09-17 15:24:55,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=621996.0, ans=0.125 2024-09-17 15:25:07,429 INFO [train.py:1198] (0/2) Epoch 35, batch 2250, loss[loss=0.2171, ctc_loss=0.1443, cr_loss=0.364, over 20978.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1471, cr_loss=0.3704, over 4090170.33 frames. ], batch size: 64, lr: 2.37e-03, grad_scale: 16.0 2024-09-17 15:25:13,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=622024.3333333334, ans=0.125 2024-09-17 15:25:26,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.02 vs. 
limit=15.0 2024-09-17 15:25:57,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=622109.3333333334, ans=0.2 2024-09-17 15:26:05,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=622109.3333333334, ans=0.125 2024-09-17 15:26:14,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=622137.6666666666, ans=0.125 2024-09-17 15:26:22,374 INFO [train.py:1198] (0/2) Epoch 35, batch 2300, loss[loss=0.2387, ctc_loss=0.1585, cr_loss=0.401, over 20943.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.147, cr_loss=0.3705, over 4088086.45 frames. ], batch size: 60, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 15:26:30,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=622166.0, ans=0.0 2024-09-17 15:26:32,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=622166.0, ans=0.125 2024-09-17 15:26:48,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=622194.3333333334, ans=0.05 2024-09-17 15:26:57,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=622222.6666666666, ans=0.0 2024-09-17 15:27:06,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=622251.0, ans=0.2 2024-09-17 15:27:21,223 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.148e+02 2.271e+02 2.470e+02 4.144e+02, threshold=4.542e+02, percent-clipped=0.0 2024-09-17 15:27:37,713 INFO [train.py:1198] (0/2) Epoch 35, batch 2350, loss[loss=0.2256, ctc_loss=0.1451, cr_loss=0.402, over 20755.00 frames. 
], tot_loss[loss=0.2206, ctc_loss=0.1466, cr_loss=0.3698, over 4082253.19 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 15:28:03,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=622336.0, ans=0.2 2024-09-17 15:28:32,155 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:28:44,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=622421.0, ans=0.125 2024-09-17 15:28:50,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=622421.0, ans=0.0 2024-09-17 15:28:52,932 INFO [train.py:1198] (0/2) Epoch 35, batch 2400, loss[loss=0.1925, ctc_loss=0.1262, cr_loss=0.3314, over 20967.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1468, cr_loss=0.3701, over 4083534.88 frames. ], batch size: 50, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:29:02,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=622449.3333333334, ans=0.0 2024-09-17 15:29:05,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=622449.3333333334, ans=0.2 2024-09-17 15:29:19,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=622477.6666666666, ans=0.2 2024-09-17 15:29:26,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=622506.0, ans=0.02 2024-09-17 15:29:55,159 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.145e+02 2.299e+02 2.502e+02 3.196e+02, threshold=4.597e+02, percent-clipped=0.0 2024-09-17 15:30:14,791 INFO [train.py:1198] (0/2) Epoch 35, batch 2450, loss[loss=0.1807, ctc_loss=0.1168, cr_loss=0.3197, over 
20964.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1469, cr_loss=0.3703, over 4086546.29 frames. ], batch size: 51, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:30:21,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-09-17 15:31:19,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=622704.3333333334, ans=0.0 2024-09-17 15:31:29,808 INFO [train.py:1198] (0/2) Epoch 35, batch 2500, loss[loss=0.1864, ctc_loss=0.1221, cr_loss=0.3216, over 20952.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1467, cr_loss=0.3707, over 4093719.52 frames. ], batch size: 49, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:31:48,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=622761.0, ans=0.125 2024-09-17 15:32:09,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0 2024-09-17 15:32:28,901 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.182e+02 2.280e+02 2.396e+02 3.155e+02, threshold=4.560e+02, percent-clipped=0.0 2024-09-17 15:32:45,762 INFO [train.py:1198] (0/2) Epoch 35, batch 2550, loss[loss=0.213, ctc_loss=0.1373, cr_loss=0.3787, over 20884.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1478, cr_loss=0.3723, over 4091545.62 frames. ], batch size: 57, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:33:36,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=622959.3333333334, ans=0.2 2024-09-17 15:34:02,074 INFO [train.py:1198] (0/2) Epoch 35, batch 2600, loss[loss=0.2509, ctc_loss=0.1737, cr_loss=0.3859, over 18305.00 frames. 
], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.3722, over 4097382.55 frames. ], batch size: 108, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:34:16,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=623044.3333333334, ans=0.125 2024-09-17 15:34:28,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=623044.3333333334, ans=0.125 2024-09-17 15:34:29,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=623044.3333333334, ans=0.2 2024-09-17 15:34:40,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=623072.6666666666, ans=0.1 2024-09-17 15:34:52,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=623101.0, ans=0.125 2024-09-17 15:35:05,143 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.182e+02 2.307e+02 2.495e+02 3.528e+02, threshold=4.613e+02, percent-clipped=0.0 2024-09-17 15:35:20,435 INFO [train.py:1198] (0/2) Epoch 35, batch 2650, loss[loss=0.1761, ctc_loss=0.1166, cr_loss=0.2976, over 20956.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1479, cr_loss=0.3723, over 4092434.00 frames. ], batch size: 50, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 15:36:15,985 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:36:39,811 INFO [train.py:1198] (0/2) Epoch 35, batch 2700, loss[loss=0.2012, ctc_loss=0.1316, cr_loss=0.3479, over 20923.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.3718, over 4099246.05 frames. 
], batch size: 60, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 15:36:44,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=623299.3333333334, ans=0.0 2024-09-17 15:36:55,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=623327.6666666666, ans=0.125 2024-09-17 15:36:56,700 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-220000.pt 2024-09-17 15:37:10,183 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:37:21,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=623356.0, ans=0.125 2024-09-17 15:37:26,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=623384.3333333334, ans=0.1 2024-09-17 15:37:26,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.82 vs. limit=10.0 2024-09-17 15:37:41,294 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.194e+02 2.296e+02 2.509e+02 4.831e+02, threshold=4.592e+02, percent-clipped=1.0 2024-09-17 15:37:56,535 INFO [train.py:1198] (0/2) Epoch 35, batch 2750, loss[loss=0.2342, ctc_loss=0.1567, cr_loss=0.3875, over 21022.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1469, cr_loss=0.3706, over 4091464.42 frames. ], batch size: 63, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 15:38:09,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=623441.0, ans=0.1 2024-09-17 15:39:12,062 INFO [train.py:1198] (0/2) Epoch 35, batch 2800, loss[loss=0.2231, ctc_loss=0.1491, cr_loss=0.3699, over 20777.00 frames. 
], tot_loss[loss=0.2217, ctc_loss=0.1474, cr_loss=0.3715, over 4097614.69 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:39:22,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=623582.6666666666, ans=0.025 2024-09-17 15:39:24,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=623582.6666666666, ans=0.025 2024-09-17 15:39:48,099 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.38 vs. limit=15.0 2024-09-17 15:39:55,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=623639.3333333334, ans=0.1 2024-09-17 15:40:12,763 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.171e+02 2.308e+02 2.428e+02 3.905e+02, threshold=4.616e+02, percent-clipped=0.0 2024-09-17 15:40:25,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=623696.0, ans=0.1 2024-09-17 15:40:27,628 INFO [train.py:1198] (0/2) Epoch 35, batch 2850, loss[loss=0.2528, ctc_loss=0.1729, cr_loss=0.3995, over 19371.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1473, cr_loss=0.372, over 4111438.00 frames. 
], batch size: 90, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:40:34,159 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.878e-03 2024-09-17 15:40:44,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=623752.6666666666, ans=0.125 2024-09-17 15:40:45,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=623752.6666666666, ans=0.015 2024-09-17 15:40:59,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=623781.0, ans=0.0 2024-09-17 15:41:09,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=623781.0, ans=0.0 2024-09-17 15:41:19,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=623809.3333333334, ans=0.125 2024-09-17 15:41:30,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.28 vs. limit=15.0 2024-09-17 15:41:39,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2024-09-17 15:41:41,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=623837.6666666666, ans=0.2 2024-09-17 15:41:49,030 INFO [train.py:1198] (0/2) Epoch 35, batch 2900, loss[loss=0.2, ctc_loss=0.1317, cr_loss=0.3416, over 21080.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3713, over 4122554.71 frames. 
], batch size: 53, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:42:07,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=623894.3333333334, ans=0.125 2024-09-17 15:42:19,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=623922.6666666666, ans=0.025 2024-09-17 15:42:49,625 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.198e+02 2.335e+02 2.468e+02 3.737e+02, threshold=4.670e+02, percent-clipped=0.0 2024-09-17 15:42:54,733 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0 2024-09-17 15:43:04,597 INFO [train.py:1198] (0/2) Epoch 35, batch 2950, loss[loss=0.2052, ctc_loss=0.1317, cr_loss=0.3676, over 20981.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1477, cr_loss=0.3733, over 4119840.49 frames. ], batch size: 55, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:43:30,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=624036.0, ans=0.025 2024-09-17 15:43:55,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=624092.6666666666, ans=0.025 2024-09-17 15:44:09,550 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.36 vs. limit=10.0 2024-09-17 15:44:13,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=624121.0, ans=0.125 2024-09-17 15:44:20,636 INFO [train.py:1198] (0/2) Epoch 35, batch 3000, loss[loss=0.2391, ctc_loss=0.1596, cr_loss=0.3975, over 19446.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1471, cr_loss=0.372, over 4123822.79 frames. 
], batch size: 90, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:44:20,636 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 15:44:40,494 INFO [train.py:1230] (0/2) Epoch 35, validation: loss=0.04061, ctc_loss=0.04061, cr_loss=1.315e-14, over 944034.00 frames. 2024-09-17 15:44:40,495 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 15:44:52,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=624149.3333333334, ans=0.0 2024-09-17 15:44:52,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=624149.3333333334, ans=0.025 2024-09-17 15:45:00,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=624177.6666666666, ans=0.015 2024-09-17 15:45:05,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=624177.6666666666, ans=0.025 2024-09-17 15:45:40,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.174e+02 2.285e+02 2.453e+02 8.705e+02, threshold=4.570e+02, percent-clipped=1.0 2024-09-17 15:45:52,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=624262.6666666666, ans=0.0 2024-09-17 15:45:55,634 INFO [train.py:1198] (0/2) Epoch 35, batch 3050, loss[loss=0.2351, ctc_loss=0.1571, cr_loss=0.39, over 20927.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.3722, over 4105336.29 frames. 
], batch size: 60, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:45:57,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=624291.0, ans=0.1 2024-09-17 15:46:53,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=624376.0, ans=0.125 2024-09-17 15:46:58,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=624404.3333333334, ans=0.125 2024-09-17 15:47:08,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=624404.3333333334, ans=0.2 2024-09-17 15:47:11,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=624404.3333333334, ans=0.025 2024-09-17 15:47:17,360 INFO [train.py:1198] (0/2) Epoch 35, batch 3100, loss[loss=0.1926, ctc_loss=0.1236, cr_loss=0.3449, over 20986.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1472, cr_loss=0.3712, over 4109147.63 frames. ], batch size: 49, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:47:28,314 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.02 vs. 
limit=15.0 2024-09-17 15:47:35,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=624461.0, ans=0.0 2024-09-17 15:47:37,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=624461.0, ans=0.125 2024-09-17 15:48:03,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=624517.6666666666, ans=0.025 2024-09-17 15:48:13,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=624517.6666666666, ans=0.125 2024-09-17 15:48:17,480 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.186e+02 2.317e+02 2.450e+02 4.076e+02, threshold=4.634e+02, percent-clipped=0.0 2024-09-17 15:48:32,234 INFO [train.py:1198] (0/2) Epoch 35, batch 3150, loss[loss=0.2567, ctc_loss=0.1747, cr_loss=0.4101, over 20660.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.3721, over 4106079.73 frames. ], batch size: 66, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:48:42,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0 2024-09-17 15:48:43,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=624574.3333333334, ans=0.1 2024-09-17 15:49:47,709 INFO [train.py:1198] (0/2) Epoch 35, batch 3200, loss[loss=0.2178, ctc_loss=0.1443, cr_loss=0.3674, over 20825.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3706, over 4117600.83 frames. 
], batch size: 59, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:49:56,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=624716.0, ans=0.0 2024-09-17 15:50:04,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=624744.3333333334, ans=0.2 2024-09-17 15:50:15,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=624744.3333333334, ans=0.05 2024-09-17 15:50:43,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=624801.0, ans=0.1 2024-09-17 15:50:49,135 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.202e+02 2.301e+02 2.445e+02 3.308e+02, threshold=4.601e+02, percent-clipped=0.0 2024-09-17 15:51:04,470 INFO [train.py:1198] (0/2) Epoch 35, batch 3250, loss[loss=0.191, ctc_loss=0.1235, cr_loss=0.3377, over 20982.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1463, cr_loss=0.3703, over 4102642.07 frames. ], batch size: 51, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:51:09,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=624857.6666666666, ans=0.04949747468305833 2024-09-17 15:51:23,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=624886.0, ans=0.125 2024-09-17 15:51:35,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=624914.3333333334, ans=0.025 2024-09-17 15:51:49,785 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.51 vs. 
limit=15.0 2024-09-17 15:52:23,528 INFO [train.py:1198] (0/2) Epoch 35, batch 3300, loss[loss=0.2384, ctc_loss=0.1601, cr_loss=0.3913, over 20424.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1474, cr_loss=0.3712, over 4078570.39 frames. ], batch size: 74, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:52:39,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=625027.6666666666, ans=0.125 2024-09-17 15:53:07,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=625056.0, ans=0.0 2024-09-17 15:53:09,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=625056.0, ans=0.125 2024-09-17 15:53:27,326 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.205e+02 2.322e+02 2.508e+02 2.970e+02, threshold=4.645e+02, percent-clipped=0.0 2024-09-17 15:53:27,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=625112.6666666666, ans=0.0 2024-09-17 15:53:35,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=625112.6666666666, ans=0.125 2024-09-17 15:53:35,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=625112.6666666666, ans=0.125 2024-09-17 15:53:42,444 INFO [train.py:1198] (0/2) Epoch 35, batch 3350, loss[loss=0.1865, ctc_loss=0.1227, cr_loss=0.3191, over 21055.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.3711, over 4086784.08 frames. 
], batch size: 56, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:54:55,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=625254.3333333334, ans=0.125 2024-09-17 15:54:58,020 INFO [train.py:1198] (0/2) Epoch 35, batch 3400, loss[loss=0.1891, ctc_loss=0.1213, cr_loss=0.3387, over 21010.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1481, cr_loss=0.3725, over 4085612.78 frames. ], batch size: 52, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:55:26,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=625339.3333333334, ans=0.125 2024-09-17 15:55:30,468 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=15.0 2024-09-17 15:55:37,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625339.3333333334, ans=0.1 2024-09-17 15:55:48,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=625367.6666666666, ans=0.125 2024-09-17 15:55:58,494 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.170e+02 2.299e+02 2.445e+02 5.340e+02, threshold=4.598e+02, percent-clipped=1.0 2024-09-17 15:56:13,232 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.38 vs. limit=8.0 2024-09-17 15:56:13,506 INFO [train.py:1198] (0/2) Epoch 35, batch 3450, loss[loss=0.1966, ctc_loss=0.1286, cr_loss=0.3403, over 20869.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1474, cr_loss=0.3712, over 4088814.30 frames. 
], batch size: 57, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:56:22,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=625424.3333333334, ans=0.0 2024-09-17 15:56:36,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=625452.6666666666, ans=0.125 2024-09-17 15:56:38,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.58 vs. limit=22.5 2024-09-17 15:56:46,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=625481.0, ans=0.125 2024-09-17 15:56:48,691 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:56:48,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=625481.0, ans=0.2 2024-09-17 15:56:58,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.12 vs. limit=15.0 2024-09-17 15:57:24,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=625537.6666666666, ans=0.125 2024-09-17 15:57:29,028 INFO [train.py:1198] (0/2) Epoch 35, batch 3500, loss[loss=0.1912, ctc_loss=0.1242, cr_loss=0.3351, over 20001.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1469, cr_loss=0.3702, over 4094430.30 frames. 
], batch size: 44, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:57:29,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=625566.0, ans=0.125 2024-09-17 15:58:02,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=625622.6666666666, ans=0.0 2024-09-17 15:58:23,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=625651.0, ans=0.125 2024-09-17 15:58:25,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=625651.0, ans=0.125 2024-09-17 15:58:32,399 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.172e+02 2.338e+02 2.458e+02 5.405e+02, threshold=4.675e+02, percent-clipped=1.0 2024-09-17 15:58:39,158 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0 2024-09-17 15:58:51,009 INFO [train.py:1198] (0/2) Epoch 35, batch 3550, loss[loss=0.2504, ctc_loss=0.1686, cr_loss=0.4088, over 20998.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.3708, over 4103247.75 frames. ], batch size: 64, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 15:59:52,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=625821.0, ans=0.0 2024-09-17 16:00:06,457 INFO [train.py:1198] (0/2) Epoch 35, batch 3600, loss[loss=0.197, ctc_loss=0.1275, cr_loss=0.3472, over 20972.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1473, cr_loss=0.371, over 4106025.28 frames. 
], batch size: 49, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 16:00:36,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625906.0, ans=0.1 2024-09-17 16:00:41,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2024-09-17 16:00:46,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=22.5 2024-09-17 16:01:05,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.193e+02 2.318e+02 2.457e+02 2.955e+02, threshold=4.635e+02, percent-clipped=0.0 2024-09-17 16:01:18,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=15.0 2024-09-17 16:01:20,755 INFO [train.py:1198] (0/2) Epoch 35, batch 3650, loss[loss=0.2122, ctc_loss=0.1401, cr_loss=0.3602, over 20998.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.372, over 4105012.34 frames. ], batch size: 55, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 16:01:27,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=625991.0, ans=12.0 2024-09-17 16:01:53,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.00 vs. 
limit=15.0 2024-09-17 16:01:54,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=626047.6666666666, ans=0.125 2024-09-17 16:02:02,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626047.6666666666, ans=0.1 2024-09-17 16:02:21,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=626104.3333333334, ans=0.1 2024-09-17 16:02:36,805 INFO [train.py:1198] (0/2) Epoch 35, batch 3700, loss[loss=0.2134, ctc_loss=0.143, cr_loss=0.3519, over 20871.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1478, cr_loss=0.373, over 4105485.64 frames. ], batch size: 57, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 16:03:12,280 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.28 vs. limit=15.0 2024-09-17 16:03:40,400 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.148e+02 2.309e+02 2.452e+02 3.277e+02, threshold=4.617e+02, percent-clipped=0.0 2024-09-17 16:03:55,551 INFO [train.py:1198] (0/2) Epoch 35, batch 3750, loss[loss=0.2162, ctc_loss=0.1443, cr_loss=0.3598, over 20986.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1477, cr_loss=0.3728, over 4103797.06 frames. ], batch size: 48, lr: 2.36e-03, grad_scale: 32.0 2024-09-17 16:04:29,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=626331.0, ans=0.125 2024-09-17 16:04:29,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=626331.0, ans=0.125 2024-09-17 16:05:14,654 INFO [train.py:1198] (0/2) Epoch 35, batch 3800, loss[loss=0.2241, ctc_loss=0.1491, cr_loss=0.375, over 20337.00 frames. 
], tot_loss[loss=0.2215, ctc_loss=0.1471, cr_loss=0.3719, over 4104853.59 frames. ], batch size: 74, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 16:06:13,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=626501.0, ans=15.0 2024-09-17 16:06:18,322 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.197e+02 2.330e+02 2.463e+02 5.185e+02, threshold=4.661e+02, percent-clipped=1.0 2024-09-17 16:06:30,866 INFO [train.py:1198] (0/2) Epoch 35, batch 3850, loss[loss=0.2588, ctc_loss=0.1753, cr_loss=0.4173, over 20024.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.148, cr_loss=0.3734, over 4111474.77 frames. ], batch size: 80, lr: 2.36e-03, grad_scale: 8.0 2024-09-17 16:06:35,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=626557.6666666666, ans=0.0 2024-09-17 16:06:44,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=626586.0, ans=0.125 2024-09-17 16:06:51,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=626586.0, ans=0.5 2024-09-17 16:06:55,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=626586.0, ans=0.0 2024-09-17 16:07:06,820 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0 2024-09-17 16:07:14,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0 2024-09-17 16:07:46,869 INFO [train.py:1198] (0/2) Epoch 35, batch 3900, loss[loss=0.2374, ctc_loss=0.1575, cr_loss=0.3995, over 20963.00 frames. 
], tot_loss[loss=0.2217, ctc_loss=0.1471, cr_loss=0.3728, over 4124693.31 frames. ], batch size: 64, lr: 2.36e-03, grad_scale: 8.0 2024-09-17 16:08:03,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=626727.6666666666, ans=0.0 2024-09-17 16:08:09,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=626727.6666666666, ans=0.0 2024-09-17 16:08:14,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=626727.6666666666, ans=0.0 2024-09-17 16:08:17,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=626756.0, ans=0.125 2024-09-17 16:08:25,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.68 vs. limit=5.0 2024-09-17 16:08:27,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=626756.0, ans=0.0 2024-09-17 16:08:42,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0 2024-09-17 16:08:50,481 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.188e+02 2.328e+02 2.499e+02 4.813e+02, threshold=4.655e+02, percent-clipped=1.0 2024-09-17 16:09:02,355 INFO [train.py:1198] (0/2) Epoch 35, batch 3950, loss[loss=0.208, ctc_loss=0.1374, cr_loss=0.3531, over 20317.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1483, cr_loss=0.3735, over 4113885.86 frames. 
], batch size: 45, lr: 2.36e-03, grad_scale: 8.0 2024-09-17 16:09:07,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=626841.0, ans=0.0 2024-09-17 16:09:08,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=626841.0, ans=0.125 2024-09-17 16:09:11,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=626841.0, ans=0.2 2024-09-17 16:09:13,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=626841.0, ans=0.125 2024-09-17 16:09:39,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=12.0 2024-09-17 16:10:01,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=626926.0, ans=0.025 2024-09-17 16:10:04,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=626954.3333333334, ans=0.0 2024-09-17 16:10:14,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2024-09-17 16:10:23,788 INFO [train.py:1198] (0/2) Epoch 35, batch 4000, loss[loss=0.232, ctc_loss=0.159, cr_loss=0.3652, over 20979.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1481, cr_loss=0.3732, over 4122197.32 frames. 
], batch size: 58, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 16:10:57,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=627039.3333333334, ans=0.125 2024-09-17 16:11:27,065 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.189e+02 2.287e+02 2.402e+02 3.769e+02, threshold=4.574e+02, percent-clipped=0.0 2024-09-17 16:11:29,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=627096.0, ans=0.0 2024-09-17 16:11:39,269 INFO [train.py:1198] (0/2) Epoch 35, batch 4050, loss[loss=0.2107, ctc_loss=0.1382, cr_loss=0.3622, over 20968.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1483, cr_loss=0.3738, over 4123549.36 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 16:11:39,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=627124.3333333334, ans=0.125 2024-09-17 16:12:01,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=627152.6666666666, ans=0.125 2024-09-17 16:12:49,835 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:12:55,443 INFO [train.py:1198] (0/2) Epoch 35, batch 4100, loss[loss=0.2555, ctc_loss=0.1727, cr_loss=0.4139, over 20104.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1481, cr_loss=0.373, over 4107319.60 frames. ], batch size: 80, lr: 2.36e-03, grad_scale: 16.0 2024-09-17 16:13:59,110 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.199e+02 2.308e+02 2.462e+02 3.670e+02, threshold=4.616e+02, percent-clipped=0.0 2024-09-17 16:14:11,465 INFO [train.py:1198] (0/2) Epoch 35, batch 4150, loss[loss=0.2286, ctc_loss=0.1534, cr_loss=0.3759, over 20837.00 frames. 
], tot_loss[loss=0.2221, ctc_loss=0.1477, cr_loss=0.3721, over 4112158.29 frames. ], batch size: 59, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:14:31,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.21 vs. limit=15.0 2024-09-17 16:15:30,581 INFO [train.py:1198] (0/2) Epoch 35, batch 4200, loss[loss=0.2016, ctc_loss=0.1325, cr_loss=0.3455, over 20964.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1462, cr_loss=0.3695, over 4102446.50 frames. ], batch size: 52, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:15:35,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=627549.3333333334, ans=0.1 2024-09-17 16:15:43,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2024-09-17 16:16:17,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=627634.3333333334, ans=0.05 2024-09-17 16:16:37,620 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.171e+02 2.331e+02 2.481e+02 3.624e+02, threshold=4.662e+02, percent-clipped=0.0 2024-09-17 16:16:42,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=627662.6666666666, ans=0.2 2024-09-17 16:16:49,842 INFO [train.py:1198] (0/2) Epoch 35, batch 4250, loss[loss=0.2221, ctc_loss=0.1468, cr_loss=0.3767, over 20896.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1465, cr_loss=0.3699, over 4104630.32 frames. 
], batch size: 54, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:17:06,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=627719.3333333334, ans=0.2 2024-09-17 16:17:14,766 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=12.0 2024-09-17 16:18:05,729 INFO [train.py:1198] (0/2) Epoch 35, batch 4300, loss[loss=0.2582, ctc_loss=0.1727, cr_loss=0.4278, over 20735.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1465, cr_loss=0.3695, over 4097295.35 frames. ], batch size: 71, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:18:46,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=627889.3333333334, ans=0.0 2024-09-17 16:19:09,820 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.188e+02 2.309e+02 2.518e+02 1.008e+03, threshold=4.618e+02, percent-clipped=1.0 2024-09-17 16:19:21,947 INFO [train.py:1198] (0/2) Epoch 35, batch 4350, loss[loss=0.2066, ctc_loss=0.1367, cr_loss=0.3494, over 20971.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1468, cr_loss=0.3701, over 4094613.36 frames. ], batch size: 50, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:19:36,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628002.6666666666, ans=0.1 2024-09-17 16:20:25,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=628087.6666666666, ans=0.2 2024-09-17 16:20:37,616 INFO [train.py:1198] (0/2) Epoch 35, batch 4400, loss[loss=0.2185, ctc_loss=0.1424, cr_loss=0.3804, over 20848.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1475, cr_loss=0.3713, over 4087467.65 frames. 
], batch size: 59, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:20:57,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=628144.3333333334, ans=0.0 2024-09-17 16:21:03,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628144.3333333334, ans=0.1 2024-09-17 16:21:39,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=628201.0, ans=0.0 2024-09-17 16:21:44,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=628229.3333333334, ans=0.125 2024-09-17 16:21:47,099 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.193e+02 2.302e+02 2.500e+02 3.112e+02, threshold=4.603e+02, percent-clipped=0.0 2024-09-17 16:21:59,277 INFO [train.py:1198] (0/2) Epoch 35, batch 4450, loss[loss=0.1883, ctc_loss=0.1254, cr_loss=0.3145, over 21062.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1466, cr_loss=0.3705, over 4088953.57 frames. ], batch size: 53, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:22:04,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.25 vs. limit=6.0 2024-09-17 16:22:39,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=628314.3333333334, ans=0.1 2024-09-17 16:22:48,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.97 vs. limit=15.0 2024-09-17 16:23:14,778 INFO [train.py:1198] (0/2) Epoch 35, batch 4500, loss[loss=0.2286, ctc_loss=0.1507, cr_loss=0.3892, over 20852.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1468, cr_loss=0.3716, over 4099457.34 frames. 
], batch size: 65, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:23:38,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.23 vs. limit=15.0 2024-09-17 16:23:49,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=628456.0, ans=0.125 2024-09-17 16:23:50,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628456.0, ans=0.1 2024-09-17 16:24:11,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=628484.3333333334, ans=0.125 2024-09-17 16:24:18,442 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.150e+02 2.320e+02 2.465e+02 3.181e+02, threshold=4.639e+02, percent-clipped=0.0 2024-09-17 16:24:30,732 INFO [train.py:1198] (0/2) Epoch 35, batch 4550, loss[loss=0.2173, ctc_loss=0.1417, cr_loss=0.3777, over 20766.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.3721, over 4095350.44 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:24:54,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=628569.3333333334, ans=0.0 2024-09-17 16:25:05,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.82 vs. 
limit=22.5 2024-09-17 16:25:06,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=628597.6666666666, ans=0.2 2024-09-17 16:25:24,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=628626.0, ans=0.125 2024-09-17 16:25:39,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=628654.3333333334, ans=0.125 2024-09-17 16:25:44,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=628654.3333333334, ans=0.0 2024-09-17 16:25:46,960 INFO [train.py:1198] (0/2) Epoch 35, batch 4600, loss[loss=0.2224, ctc_loss=0.1484, cr_loss=0.3699, over 20816.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1466, cr_loss=0.3705, over 4094968.49 frames. ], batch size: 59, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:26:41,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=628767.6666666666, ans=0.015 2024-09-17 16:26:42,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=628767.6666666666, ans=0.1 2024-09-17 16:26:48,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=628796.0, ans=0.0 2024-09-17 16:26:53,391 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.147e+02 2.321e+02 2.529e+02 3.159e+02, threshold=4.641e+02, percent-clipped=0.0 2024-09-17 16:27:05,501 INFO [train.py:1198] (0/2) Epoch 35, batch 4650, loss[loss=0.2232, ctc_loss=0.147, cr_loss=0.3808, over 20917.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1464, cr_loss=0.3703, over 4095773.94 frames. 
], batch size: 57, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:27:34,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=628852.6666666666, ans=0.125 2024-09-17 16:27:55,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=628909.3333333334, ans=0.0 2024-09-17 16:27:55,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.57 vs. limit=10.0 2024-09-17 16:28:01,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=628909.3333333334, ans=0.07 2024-09-17 16:28:23,751 INFO [train.py:1198] (0/2) Epoch 35, batch 4700, loss[loss=0.2763, ctc_loss=0.1918, cr_loss=0.4228, over 18005.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1464, cr_loss=0.3699, over 4096605.46 frames. ], batch size: 108, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:28:24,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=628966.0, ans=0.125 2024-09-17 16:28:51,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=628994.3333333334, ans=0.125 2024-09-17 16:28:54,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2024-09-17 16:29:10,390 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.93 vs. 
limit=10.0 2024-09-17 16:29:27,082 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.147e+02 2.302e+02 2.455e+02 3.164e+02, threshold=4.605e+02, percent-clipped=0.0 2024-09-17 16:29:30,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=629079.3333333334, ans=0.125 2024-09-17 16:29:32,555 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.69 vs. limit=10.0 2024-09-17 16:29:39,486 INFO [train.py:1198] (0/2) Epoch 35, batch 4750, loss[loss=0.2658, ctc_loss=0.1805, cr_loss=0.4266, over 18331.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1468, cr_loss=0.371, over 4093668.86 frames. ], batch size: 108, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:29:39,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=629107.6666666666, ans=0.125 2024-09-17 16:29:44,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=629107.6666666666, ans=0.1 2024-09-17 16:30:09,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=629164.3333333334, ans=0.5 2024-09-17 16:30:19,176 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:30:38,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=629221.0, ans=0.0 2024-09-17 16:30:53,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=629249.3333333334, ans=0.125 2024-09-17 16:30:54,656 INFO [train.py:1198] (0/2) Epoch 35, batch 4800, loss[loss=0.2167, ctc_loss=0.1455, cr_loss=0.3557, over 20962.00 frames. 
], tot_loss[loss=0.2206, ctc_loss=0.1466, cr_loss=0.3699, over 4092878.54 frames. ], batch size: 55, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:30:58,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=629249.3333333334, ans=0.07 2024-09-17 16:31:45,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0 2024-09-17 16:31:58,131 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.176e+02 2.327e+02 2.496e+02 3.136e+02, threshold=4.655e+02, percent-clipped=0.0 2024-09-17 16:32:10,415 INFO [train.py:1198] (0/2) Epoch 35, batch 4850, loss[loss=0.2027, ctc_loss=0.1339, cr_loss=0.3442, over 21071.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1478, cr_loss=0.3715, over 4081539.84 frames. ], batch size: 53, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:32:24,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=629391.0, ans=0.125 2024-09-17 16:32:33,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=629419.3333333334, ans=0.1 2024-09-17 16:32:53,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=629447.6666666666, ans=0.0 2024-09-17 16:33:01,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=22.5 2024-09-17 16:33:28,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.56 vs. limit=15.0 2024-09-17 16:33:32,395 INFO [train.py:1198] (0/2) Epoch 35, batch 4900, loss[loss=0.2375, ctc_loss=0.1588, cr_loss=0.3935, over 21044.00 frames. 
], tot_loss[loss=0.2214, ctc_loss=0.1473, cr_loss=0.3705, over 4076265.80 frames. ], batch size: 63, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:33:40,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=629532.6666666666, ans=0.0 2024-09-17 16:34:14,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=629589.3333333334, ans=0.0 2024-09-17 16:34:17,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=629617.6666666666, ans=0.1 2024-09-17 16:34:23,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=629617.6666666666, ans=0.025 2024-09-17 16:34:35,242 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.199e+02 2.313e+02 2.504e+02 3.110e+02, threshold=4.625e+02, percent-clipped=0.0 2024-09-17 16:34:43,132 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:34:47,115 INFO [train.py:1198] (0/2) Epoch 35, batch 4950, loss[loss=0.1951, ctc_loss=0.1287, cr_loss=0.332, over 19974.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1474, cr_loss=0.3708, over 4077104.17 frames. ], batch size: 44, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:35:25,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=629731.0, ans=0.0 2024-09-17 16:35:43,125 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.61 vs. 
limit=12.0 2024-09-17 16:35:47,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=629787.6666666666, ans=0.125 2024-09-17 16:35:54,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=629787.6666666666, ans=0.1 2024-09-17 16:36:01,759 INFO [train.py:1198] (0/2) Epoch 35, batch 5000, loss[loss=0.2318, ctc_loss=0.1547, cr_loss=0.3855, over 20654.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1473, cr_loss=0.3705, over 4070950.14 frames. ], batch size: 68, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:36:15,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=629844.3333333334, ans=0.2 2024-09-17 16:37:04,782 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.157e+02 2.310e+02 2.447e+02 8.280e+02, threshold=4.620e+02, percent-clipped=1.0 2024-09-17 16:37:16,764 INFO [train.py:1198] (0/2) Epoch 35, batch 5050, loss[loss=0.2001, ctc_loss=0.1319, cr_loss=0.3412, over 21056.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.147, cr_loss=0.3701, over 4066703.38 frames. ], batch size: 53, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:37:38,798 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.16 vs. limit=15.0 2024-09-17 16:38:07,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=630042.6666666666, ans=0.035 2024-09-17 16:38:29,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=630099.3333333334, ans=0.125 2024-09-17 16:38:31,024 INFO [train.py:1198] (0/2) Epoch 35, batch 5100, loss[loss=0.2324, ctc_loss=0.1547, cr_loss=0.3882, over 20812.00 frames. 
], tot_loss[loss=0.2216, ctc_loss=0.1475, cr_loss=0.3709, over 4072268.83 frames. ], batch size: 59, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:38:32,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=630099.3333333334, ans=0.0 2024-09-17 16:38:54,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=630127.6666666666, ans=15.0 2024-09-17 16:39:07,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=630156.0, ans=0.125 2024-09-17 16:39:23,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=630184.3333333334, ans=0.1 2024-09-17 16:39:33,533 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.233e+02 2.373e+02 2.558e+02 1.219e+03, threshold=4.746e+02, percent-clipped=1.0 2024-09-17 16:39:45,568 INFO [train.py:1198] (0/2) Epoch 35, batch 5150, loss[loss=0.2227, ctc_loss=0.1488, cr_loss=0.3695, over 20768.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1473, cr_loss=0.3713, over 4067418.40 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:40:35,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=630326.0, ans=0.125 2024-09-17 16:40:42,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=630326.0, ans=0.125 2024-09-17 16:40:54,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=630354.3333333334, ans=0.2 2024-09-17 16:41:00,148 INFO [train.py:1198] (0/2) Epoch 35, batch 5200, loss[loss=0.1757, ctc_loss=0.1126, cr_loss=0.3154, over 20936.00 frames. 
], tot_loss[loss=0.2214, ctc_loss=0.1471, cr_loss=0.3713, over 4082491.32 frames. ], batch size: 50, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:41:27,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=630411.0, ans=0.125 2024-09-17 16:41:31,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=630439.3333333334, ans=0.125 2024-09-17 16:41:34,951 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:41:51,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=630467.6666666666, ans=0.125 2024-09-17 16:42:00,505 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:42:04,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=630496.0, ans=0.1 2024-09-17 16:42:08,638 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.162e+02 2.301e+02 2.494e+02 3.436e+02, threshold=4.602e+02, percent-clipped=0.0 2024-09-17 16:42:20,574 INFO [train.py:1198] (0/2) Epoch 35, batch 5250, loss[loss=0.2016, ctc_loss=0.1345, cr_loss=0.3356, over 20753.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3704, over 4086190.94 frames. ], batch size: 53, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:43:34,993 INFO [train.py:1198] (0/2) Epoch 35, batch 5300, loss[loss=0.2019, ctc_loss=0.1307, cr_loss=0.356, over 20805.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3704, over 4090976.21 frames. 
], batch size: 53, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:43:38,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=630666.0, ans=0.1 2024-09-17 16:43:48,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=630694.3333333334, ans=0.125 2024-09-17 16:43:55,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=630694.3333333334, ans=0.2 2024-09-17 16:44:39,668 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.153e+02 2.272e+02 2.432e+02 4.479e+02, threshold=4.543e+02, percent-clipped=0.0 2024-09-17 16:44:42,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=630779.3333333334, ans=0.125 2024-09-17 16:44:50,154 INFO [train.py:1198] (0/2) Epoch 35, batch 5350, loss[loss=0.2106, ctc_loss=0.1384, cr_loss=0.361, over 20949.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3705, over 4092622.04 frames. ], batch size: 49, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:44:52,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=630807.6666666666, ans=0.125 2024-09-17 16:44:52,776 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.14 vs. 
limit=10.0 2024-09-17 16:45:07,123 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:45:47,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=630892.6666666666, ans=0.125 2024-09-17 16:46:02,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0 2024-09-17 16:46:04,704 INFO [train.py:1198] (0/2) Epoch 35, batch 5400, loss[loss=0.2216, ctc_loss=0.1488, cr_loss=0.3637, over 20354.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1471, cr_loss=0.3726, over 4103645.75 frames. ], batch size: 74, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:46:06,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=630949.3333333334, ans=0.07 2024-09-17 16:46:23,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=630977.6666666666, ans=0.2 2024-09-17 16:47:08,266 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.185e+02 2.325e+02 2.494e+02 2.975e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-17 16:47:18,625 INFO [train.py:1198] (0/2) Epoch 35, batch 5450, loss[loss=0.2295, ctc_loss=0.1519, cr_loss=0.3881, over 20870.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1475, cr_loss=0.3737, over 4107325.18 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:47:28,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0 2024-09-17 16:48:32,705 INFO [train.py:1198] (0/2) Epoch 35, batch 5500, loss[loss=0.2083, ctc_loss=0.1372, cr_loss=0.3553, over 20874.00 frames. 
], tot_loss[loss=0.2232, ctc_loss=0.1483, cr_loss=0.3745, over 4100173.43 frames. ], batch size: 54, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:48:36,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=631232.6666666666, ans=0.0 2024-09-17 16:49:16,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=631317.6666666666, ans=0.0 2024-09-17 16:49:30,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=631346.0, ans=0.125 2024-09-17 16:49:36,496 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.176e+02 2.294e+02 2.454e+02 3.147e+02, threshold=4.588e+02, percent-clipped=0.0 2024-09-17 16:49:41,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=631346.0, ans=0.125 2024-09-17 16:49:47,151 INFO [train.py:1198] (0/2) Epoch 35, batch 5550, loss[loss=0.2196, ctc_loss=0.1482, cr_loss=0.3574, over 20648.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1466, cr_loss=0.3712, over 4106323.99 frames. 
], batch size: 66, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:49:50,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=631374.3333333334, ans=0.125 2024-09-17 16:49:53,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=631374.3333333334, ans=0.0 2024-09-17 16:50:31,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=631431.0, ans=0.125 2024-09-17 16:50:54,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=631487.6666666666, ans=0.0 2024-09-17 16:51:00,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=631487.6666666666, ans=15.0 2024-09-17 16:51:05,955 INFO [train.py:1198] (0/2) Epoch 35, batch 5600, loss[loss=0.2428, ctc_loss=0.1634, cr_loss=0.3973, over 20174.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1463, cr_loss=0.3707, over 4099860.57 frames. ], batch size: 80, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:51:19,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=631544.3333333334, ans=0.125 2024-09-17 16:51:32,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=631544.3333333334, ans=0.0 2024-09-17 16:52:09,018 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.232e+02 2.365e+02 2.537e+02 6.394e+02, threshold=4.729e+02, percent-clipped=2.0 2024-09-17 16:52:19,472 INFO [train.py:1198] (0/2) Epoch 35, batch 5650, loss[loss=0.1858, ctc_loss=0.1202, cr_loss=0.3278, over 21054.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1476, cr_loss=0.3726, over 4088153.47 frames. 
], batch size: 53, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:52:53,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=631714.3333333334, ans=0.0 2024-09-17 16:53:00,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=631714.3333333334, ans=0.125 2024-09-17 16:53:02,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=631742.6666666666, ans=0.0 2024-09-17 16:53:03,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=631742.6666666666, ans=0.125 2024-09-17 16:53:05,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=631742.6666666666, ans=0.025 2024-09-17 16:53:08,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.82 vs. limit=22.5 2024-09-17 16:53:30,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=631771.0, ans=10.0 2024-09-17 16:53:32,858 INFO [train.py:1198] (0/2) Epoch 35, batch 5700, loss[loss=0.2201, ctc_loss=0.1466, cr_loss=0.3672, over 20974.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3721, over 4097061.13 frames. ], batch size: 67, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 16:53:54,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.66 vs. 
limit=15.0 2024-09-17 16:53:59,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=631827.6666666666, ans=0.025 2024-09-17 16:54:04,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=631856.0, ans=0.1 2024-09-17 16:54:07,648 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:54:19,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=631884.3333333334, ans=0.0 2024-09-17 16:54:38,545 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.919e+02 2.180e+02 2.294e+02 2.417e+02 3.007e+02, threshold=4.588e+02, percent-clipped=0.0 2024-09-17 16:54:40,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=631912.6666666666, ans=0.1 2024-09-17 16:54:47,645 INFO [train.py:1198] (0/2) Epoch 35, batch 5750, loss[loss=0.2103, ctc_loss=0.1403, cr_loss=0.3501, over 20957.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1473, cr_loss=0.3724, over 4106288.02 frames. ], batch size: 51, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:55:31,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=632026.0, ans=0.04949747468305833 2024-09-17 16:55:46,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=632054.3333333334, ans=0.125 2024-09-17 16:56:01,923 INFO [train.py:1198] (0/2) Epoch 35, batch 5800, loss[loss=0.2231, ctc_loss=0.1502, cr_loss=0.3646, over 19526.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1472, cr_loss=0.3714, over 4101944.97 frames. 
], batch size: 90, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:56:36,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=632139.3333333334, ans=0.0 2024-09-17 16:56:39,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=632139.3333333334, ans=0.2 2024-09-17 16:57:03,265 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-09-17 16:57:07,301 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.185e+02 2.320e+02 2.430e+02 3.087e+02, threshold=4.640e+02, percent-clipped=0.0 2024-09-17 16:57:16,070 INFO [train.py:1198] (0/2) Epoch 35, batch 5850, loss[loss=0.2315, ctc_loss=0.1518, cr_loss=0.3983, over 21013.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.372, over 4106379.09 frames. ], batch size: 52, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:57:20,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=632224.3333333334, ans=0.125 2024-09-17 16:57:25,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=632224.3333333334, ans=0.04949747468305833 2024-09-17 16:58:15,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=632337.6666666666, ans=0.0 2024-09-17 16:58:30,135 INFO [train.py:1198] (0/2) Epoch 35, batch 5900, loss[loss=0.227, ctc_loss=0.1521, cr_loss=0.3743, over 21003.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1481, cr_loss=0.3731, over 4095823.55 frames. 
], batch size: 67, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 16:58:42,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=632366.0, ans=0.95 2024-09-17 16:58:45,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=632394.3333333334, ans=0.125 2024-09-17 16:59:15,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=632451.0, ans=0.125 2024-09-17 16:59:39,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=632479.3333333334, ans=0.05 2024-09-17 16:59:40,374 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.193e+02 2.324e+02 2.486e+02 3.833e+02, threshold=4.649e+02, percent-clipped=0.0 2024-09-17 16:59:49,388 INFO [train.py:1198] (0/2) Epoch 35, batch 5950, loss[loss=0.2124, ctc_loss=0.1401, cr_loss=0.3611, over 21017.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1472, cr_loss=0.3716, over 4108257.23 frames. ], batch size: 63, lr: 2.35e-03, grad_scale: 16.0 2024-09-17 17:00:28,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=632564.3333333334, ans=0.125 2024-09-17 17:01:03,875 INFO [train.py:1198] (0/2) Epoch 35, batch 6000, loss[loss=0.2249, ctc_loss=0.1497, cr_loss=0.3758, over 20783.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1468, cr_loss=0.3708, over 4096648.27 frames. 
], batch size: 56, lr: 2.35e-03, grad_scale: 32.0 2024-09-17 17:01:03,875 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 17:01:22,662 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.2417, 4.3449, 4.0888, 4.4835], device='cuda:0') 2024-09-17 17:01:24,935 INFO [train.py:1230] (0/2) Epoch 35, validation: loss=0.04069, ctc_loss=0.04069, cr_loss=1.323e-14, over 944034.00 frames. 2024-09-17 17:01:24,935 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 17:01:47,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=632677.6666666666, ans=0.2 2024-09-17 17:02:23,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=632762.6666666666, ans=0.125 2024-09-17 17:02:31,055 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.221e+02 2.307e+02 2.507e+02 5.871e+02, threshold=4.614e+02, percent-clipped=1.0 2024-09-17 17:02:39,827 INFO [train.py:1198] (0/2) Epoch 35, batch 6050, loss[loss=0.2238, ctc_loss=0.15, cr_loss=0.3692, over 21056.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.3719, over 4086479.44 frames. ], batch size: 56, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:02:43,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=632791.0, ans=0.125 2024-09-17 17:02:58,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=632819.3333333334, ans=0.125 2024-09-17 17:03:04,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.64 vs. 
limit=15.0 2024-09-17 17:03:33,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=632876.0, ans=0.0 2024-09-17 17:03:54,866 INFO [train.py:1198] (0/2) Epoch 35, batch 6100, loss[loss=0.2126, ctc_loss=0.1424, cr_loss=0.3512, over 20894.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1486, cr_loss=0.3739, over 4079153.35 frames. ], batch size: 54, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:04:31,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=632989.3333333334, ans=0.05 2024-09-17 17:04:41,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=633017.6666666666, ans=0.95 2024-09-17 17:04:43,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=633017.6666666666, ans=0.04949747468305833 2024-09-17 17:04:51,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=633017.6666666666, ans=0.125 2024-09-17 17:05:00,443 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.197e+02 2.290e+02 2.426e+02 3.058e+02, threshold=4.579e+02, percent-clipped=0.0 2024-09-17 17:05:09,293 INFO [train.py:1198] (0/2) Epoch 35, batch 6150, loss[loss=0.2176, ctc_loss=0.1431, cr_loss=0.3725, over 20887.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1483, cr_loss=0.3731, over 4063477.82 frames. 
], batch size: 57, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:05:09,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=633074.3333333334, ans=0.125 2024-09-17 17:05:32,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=633102.6666666666, ans=0.125 2024-09-17 17:06:22,251 INFO [train.py:1198] (0/2) Epoch 35, batch 6200, loss[loss=0.236, ctc_loss=0.1592, cr_loss=0.3836, over 20313.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1486, cr_loss=0.3727, over 4042448.84 frames. ], batch size: 74, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:06:25,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=633216.0, ans=0.125 2024-09-17 17:06:45,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=633244.3333333334, ans=0.125 2024-09-17 17:07:28,098 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.164e+02 2.343e+02 2.543e+02 3.749e+02, threshold=4.686e+02, percent-clipped=0.0 2024-09-17 17:07:36,814 INFO [train.py:1198] (0/2) Epoch 35, batch 6250, loss[loss=0.2367, ctc_loss=0.16, cr_loss=0.3831, over 19910.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1488, cr_loss=0.3721, over 4024801.29 frames. ], batch size: 80, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:08:01,184 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:08:03,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=633414.3333333334, ans=0.2 2024-09-17 17:08:17,190 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. 
limit=6.0 2024-09-17 17:08:32,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=633471.0, ans=0.125 2024-09-17 17:08:32,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=633471.0, ans=0.0 2024-09-17 17:08:47,270 INFO [train.py:1198] (0/2) Epoch 35, batch 6300, loss[loss=0.2768, ctc_loss=0.192, cr_loss=0.4236, over 14251.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.151, cr_loss=0.3739, over 3973871.25 frames. ], batch size: 149, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:09:14,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=633556.0, ans=0.125 2024-09-17 17:09:22,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=633556.0, ans=0.0 2024-09-17 17:09:27,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=12.0 2024-09-17 17:09:31,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=633584.3333333334, ans=0.0 2024-09-17 17:09:37,346 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.86 vs. 
limit=10.0 2024-09-17 17:09:49,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=633612.6666666666, ans=0.125 2024-09-17 17:09:50,581 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.337e+02 2.653e+02 2.907e+02 7.605e+02, threshold=5.307e+02, percent-clipped=2.0 2024-09-17 17:09:58,974 INFO [train.py:1198] (0/2) Epoch 35, batch 6350, loss[loss=0.2852, ctc_loss=0.2062, cr_loss=0.395, over 14101.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1568, cr_loss=0.3785, over 3765528.59 frames. ], batch size: 149, lr: 2.34e-03, grad_scale: 32.0 2024-09-17 17:10:15,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=15.0 2024-09-17 17:10:30,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633697.6666666666, ans=0.1 2024-09-17 17:10:39,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=633726.0, ans=0.0 2024-09-17 17:10:45,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=633726.0, ans=0.125 2024-09-17 17:10:56,436 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-35.pt 2024-09-17 17:11:45,697 INFO [train.py:1198] (0/2) Epoch 36, batch 0, loss[loss=0.1975, ctc_loss=0.1305, cr_loss=0.3349, over 20874.00 frames. ], tot_loss[loss=0.1975, ctc_loss=0.1305, cr_loss=0.3349, over 20874.00 frames. 
], batch size: 57, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:11:45,698 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 17:11:58,879 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7364, 4.3684, 4.2900, 4.3163], device='cuda:0') 2024-09-17 17:12:04,082 INFO [train.py:1230] (0/2) Epoch 36, validation: loss=0.03966, ctc_loss=0.03966, cr_loss=1.315e-14, over 944034.00 frames. 2024-09-17 17:12:04,082 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 17:12:04,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=633757.1666666666, ans=0.0 2024-09-17 17:12:21,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=633785.5, ans=0.125 2024-09-17 17:12:23,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.25 vs. limit=15.0 2024-09-17 17:12:25,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=633785.5, ans=0.2 2024-09-17 17:13:22,592 INFO [train.py:1198] (0/2) Epoch 36, batch 50, loss[loss=0.2477, ctc_loss=0.1751, cr_loss=0.363, over 14190.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.15, cr_loss=0.3748, over 906252.11 frames. ], batch size: 150, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:13:27,250 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.249e+02 2.539e+02 2.784e+02 3.715e+02, threshold=5.077e+02, percent-clipped=0.0 2024-09-17 17:13:40,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.69 vs. 
limit=10.0 2024-09-17 17:13:45,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=633927.1666666666, ans=0.0 2024-09-17 17:13:48,942 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2024-09-17 17:13:59,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=633955.5, ans=0.05 2024-09-17 17:14:25,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=15.0 2024-09-17 17:14:28,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=634012.1666666666, ans=0.125 2024-09-17 17:14:41,144 INFO [train.py:1198] (0/2) Epoch 36, batch 100, loss[loss=0.232, ctc_loss=0.1552, cr_loss=0.3842, over 20936.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.147, cr_loss=0.3708, over 1624115.14 frames. ], batch size: 60, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:15:04,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=634068.8333333334, ans=0.125 2024-09-17 17:15:19,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=634097.1666666666, ans=0.125 2024-09-17 17:15:23,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=634097.1666666666, ans=0.2 2024-09-17 17:15:37,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.17 vs. 
limit=15.0 2024-09-17 17:15:57,388 INFO [train.py:1198] (0/2) Epoch 36, batch 150, loss[loss=0.2223, ctc_loss=0.1503, cr_loss=0.3597, over 20283.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1455, cr_loss=0.3681, over 2171417.52 frames. ], batch size: 74, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:16:01,793 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.167e+02 2.323e+02 2.461e+02 3.967e+02, threshold=4.646e+02, percent-clipped=0.0 2024-09-17 17:16:08,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634182.1666666666, ans=0.1 2024-09-17 17:16:20,291 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2024-09-17 17:16:28,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=634238.8333333334, ans=0.0 2024-09-17 17:16:42,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=634267.1666666666, ans=0.2 2024-09-17 17:17:02,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0 2024-09-17 17:17:04,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=634295.5, ans=0.025 2024-09-17 17:17:12,036 INFO [train.py:1198] (0/2) Epoch 36, batch 200, loss[loss=0.2211, ctc_loss=0.1466, cr_loss=0.3726, over 20957.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1465, cr_loss=0.3692, over 2596062.57 frames. ], batch size: 64, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:18:27,611 INFO [train.py:1198] (0/2) Epoch 36, batch 250, loss[loss=0.2654, ctc_loss=0.1813, cr_loss=0.4207, over 20961.00 frames. 
], tot_loss[loss=0.2224, ctc_loss=0.148, cr_loss=0.3723, over 2926233.41 frames. ], batch size: 67, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:18:32,107 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.213e+02 2.286e+02 2.476e+02 3.170e+02, threshold=4.571e+02, percent-clipped=0.0 2024-09-17 17:19:03,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.88 vs. limit=10.0 2024-09-17 17:19:50,550 INFO [train.py:1198] (0/2) Epoch 36, batch 300, loss[loss=0.2323, ctc_loss=0.1545, cr_loss=0.3889, over 20672.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1483, cr_loss=0.3732, over 3192860.88 frames. ], batch size: 66, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:19:52,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=634607.1666666666, ans=0.125 2024-09-17 17:19:53,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=634607.1666666666, ans=0.125 2024-09-17 17:20:20,897 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-224000.pt 2024-09-17 17:20:33,164 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0 2024-09-17 17:20:40,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=634692.1666666666, ans=0.1 2024-09-17 17:21:06,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=634748.8333333334, ans=0.125 2024-09-17 17:21:07,288 INFO [train.py:1198] (0/2) Epoch 36, batch 350, loss[loss=0.2242, ctc_loss=0.1507, cr_loss=0.3677, over 20974.00 frames. 
], tot_loss[loss=0.2217, ctc_loss=0.1474, cr_loss=0.3711, over 3392290.35 frames. ], batch size: 55, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:21:11,717 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.178e+02 2.288e+02 2.402e+02 5.671e+02, threshold=4.576e+02, percent-clipped=2.0 2024-09-17 17:22:23,216 INFO [train.py:1198] (0/2) Epoch 36, batch 400, loss[loss=0.2494, ctc_loss=0.1691, cr_loss=0.4012, over 20834.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1478, cr_loss=0.3715, over 3532299.82 frames. ], batch size: 59, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:22:23,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=634890.5, ans=0.0 2024-09-17 17:22:23,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=634890.5, ans=0.0 2024-09-17 17:22:46,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=634918.8333333334, ans=0.125 2024-09-17 17:23:25,244 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.15 vs. limit=6.0 2024-09-17 17:23:33,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=635003.8333333334, ans=0.0 2024-09-17 17:23:38,078 INFO [train.py:1198] (0/2) Epoch 36, batch 450, loss[loss=0.2443, ctc_loss=0.1647, cr_loss=0.3984, over 19517.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1484, cr_loss=0.3725, over 3639964.37 frames. 
], batch size: 90, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:23:42,567 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.198e+02 2.332e+02 2.435e+02 3.393e+02, threshold=4.663e+02, percent-clipped=0.0 2024-09-17 17:23:46,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.90 vs. limit=10.0 2024-09-17 17:23:59,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.28 vs. limit=22.5 2024-09-17 17:24:16,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=635088.8333333334, ans=0.0 2024-09-17 17:24:16,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=635088.8333333334, ans=0.125 2024-09-17 17:24:19,499 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:24:33,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=635117.1666666666, ans=0.125 2024-09-17 17:24:57,175 INFO [train.py:1198] (0/2) Epoch 36, batch 500, loss[loss=0.198, ctc_loss=0.1291, cr_loss=0.3441, over 20966.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1481, cr_loss=0.3724, over 3740039.67 frames. 
], batch size: 49, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:25:04,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=635173.8333333334, ans=0.1 2024-09-17 17:25:12,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=635173.8333333334, ans=0.125 2024-09-17 17:25:32,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0 2024-09-17 17:25:47,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=635258.8333333334, ans=0.125 2024-09-17 17:26:13,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=635287.1666666666, ans=0.125 2024-09-17 17:26:16,179 INFO [train.py:1198] (0/2) Epoch 36, batch 550, loss[loss=0.2447, ctc_loss=0.1619, cr_loss=0.4139, over 20959.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1474, cr_loss=0.3712, over 3815937.60 frames. 
], batch size: 64, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:26:20,653 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.232e+02 2.420e+02 2.696e+02 5.768e+02, threshold=4.841e+02, percent-clipped=1.0 2024-09-17 17:26:27,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=635315.5, ans=0.125 2024-09-17 17:26:30,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=635343.8333333334, ans=0.125 2024-09-17 17:26:39,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=635343.8333333334, ans=0.125 2024-09-17 17:26:58,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=635372.1666666666, ans=0.1 2024-09-17 17:27:26,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=635428.8333333334, ans=0.125 2024-09-17 17:27:31,070 INFO [train.py:1198] (0/2) Epoch 36, batch 600, loss[loss=0.1791, ctc_loss=0.1168, cr_loss=0.3112, over 20972.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1478, cr_loss=0.3718, over 3867282.46 frames. 
], batch size: 49, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:28:27,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=635542.1666666666, ans=0.125 2024-09-17 17:28:31,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=635570.5, ans=0.125 2024-09-17 17:28:37,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635570.5, ans=0.1 2024-09-17 17:28:46,639 INFO [train.py:1198] (0/2) Epoch 36, batch 650, loss[loss=0.2094, ctc_loss=0.137, cr_loss=0.362, over 20966.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1473, cr_loss=0.3709, over 3924834.24 frames. ], batch size: 50, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:28:51,196 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.204e+02 2.299e+02 2.418e+02 3.202e+02, threshold=4.598e+02, percent-clipped=0.0 2024-09-17 17:29:13,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635627.1666666666, ans=0.1 2024-09-17 17:29:16,682 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.93 vs. 
limit=22.5 2024-09-17 17:29:29,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=635655.5, ans=0.125 2024-09-17 17:29:32,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=635683.8333333334, ans=0.1 2024-09-17 17:29:38,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=635683.8333333334, ans=0.0 2024-09-17 17:29:44,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635683.8333333334, ans=0.1 2024-09-17 17:30:02,065 INFO [train.py:1198] (0/2) Epoch 36, batch 700, loss[loss=0.1886, ctc_loss=0.1225, cr_loss=0.3309, over 20957.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1473, cr_loss=0.371, over 3964428.11 frames. ], batch size: 48, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:30:06,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=635740.5, ans=0.125 2024-09-17 17:30:14,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=635740.5, ans=0.1 2024-09-17 17:30:58,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=635825.5, ans=0.0 2024-09-17 17:31:24,108 INFO [train.py:1198] (0/2) Epoch 36, batch 750, loss[loss=0.2447, ctc_loss=0.1629, cr_loss=0.4089, over 20938.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1477, cr_loss=0.3715, over 3989732.42 frames. 
], batch size: 60, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:31:28,467 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.957e+02 2.158e+02 2.270e+02 2.492e+02 3.770e+02, threshold=4.540e+02, percent-clipped=0.0 2024-09-17 17:31:37,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=635910.5, ans=0.125 2024-09-17 17:31:48,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=635910.5, ans=0.125 2024-09-17 17:32:39,659 INFO [train.py:1198] (0/2) Epoch 36, batch 800, loss[loss=0.2234, ctc_loss=0.1488, cr_loss=0.3733, over 20864.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1479, cr_loss=0.372, over 4003197.86 frames. ], batch size: 65, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:32:40,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=636023.8333333334, ans=0.0 2024-09-17 17:32:53,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=636052.1666666666, ans=0.125 2024-09-17 17:33:01,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-09-17 17:33:43,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=636137.1666666666, ans=0.2 2024-09-17 17:33:46,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=636137.1666666666, ans=0.025 2024-09-17 17:33:55,073 INFO [train.py:1198] (0/2) Epoch 36, batch 850, loss[loss=0.2588, ctc_loss=0.1708, cr_loss=0.44, over 20662.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1478, cr_loss=0.3723, over 4023635.42 frames. 
], batch size: 71, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:33:59,514 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.282e+02 2.370e+02 2.547e+02 3.476e+02, threshold=4.740e+02, percent-clipped=0.0 2024-09-17 17:34:01,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=636165.5, ans=0.0 2024-09-17 17:34:04,405 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:34:10,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636193.8333333334, ans=0.1 2024-09-17 17:34:19,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=636193.8333333334, ans=0.125 2024-09-17 17:34:22,882 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=12.0 2024-09-17 17:34:26,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=636222.1666666666, ans=0.125 2024-09-17 17:35:09,564 INFO [train.py:1198] (0/2) Epoch 36, batch 900, loss[loss=0.2373, ctc_loss=0.1583, cr_loss=0.395, over 20650.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1481, cr_loss=0.3731, over 4033943.31 frames. ], batch size: 66, lr: 2.31e-03, grad_scale: 32.0 2024-09-17 17:35:41,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=636363.8333333334, ans=0.125 2024-09-17 17:36:14,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=636420.5, ans=0.025 2024-09-17 17:36:31,062 INFO [train.py:1198] (0/2) Epoch 36, batch 950, loss[loss=0.2321, ctc_loss=0.1558, cr_loss=0.3812, over 19493.00 frames. 
], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.3724, over 4053507.95 frames. ], batch size: 90, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:36:35,611 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.181e+02 2.297e+02 2.436e+02 3.143e+02, threshold=4.594e+02, percent-clipped=0.0 2024-09-17 17:36:59,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=636505.5, ans=0.015 2024-09-17 17:37:02,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=636505.5, ans=0.1 2024-09-17 17:37:08,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636505.5, ans=0.1 2024-09-17 17:37:33,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=636562.1666666666, ans=0.0 2024-09-17 17:37:46,168 INFO [train.py:1198] (0/2) Epoch 36, batch 1000, loss[loss=0.1881, ctc_loss=0.1236, cr_loss=0.3227, over 20992.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1476, cr_loss=0.373, over 4072309.23 frames. 
], batch size: 48, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:37:51,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=636590.5, ans=0.125 2024-09-17 17:37:59,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=636618.8333333334, ans=0.2 2024-09-17 17:38:08,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636618.8333333334, ans=0.1 2024-09-17 17:38:17,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=636647.1666666666, ans=0.125 2024-09-17 17:38:31,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=636675.5, ans=0.0 2024-09-17 17:38:53,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636703.8333333334, ans=0.1 2024-09-17 17:38:57,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=636703.8333333334, ans=0.0 2024-09-17 17:39:02,026 INFO [train.py:1198] (0/2) Epoch 36, batch 1050, loss[loss=0.2168, ctc_loss=0.1415, cr_loss=0.3764, over 20828.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3724, over 4084134.91 frames. 
], batch size: 59, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:39:06,559 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.166e+02 2.296e+02 2.442e+02 3.647e+02, threshold=4.592e+02, percent-clipped=0.0 2024-09-17 17:39:48,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=636817.1666666666, ans=0.5 2024-09-17 17:40:04,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=22.5 2024-09-17 17:40:17,844 INFO [train.py:1198] (0/2) Epoch 36, batch 1100, loss[loss=0.1957, ctc_loss=0.1284, cr_loss=0.3366, over 20990.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1477, cr_loss=0.3733, over 4089858.49 frames. ], batch size: 51, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:40:49,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=636930.5, ans=0.125 2024-09-17 17:41:05,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=636958.8333333334, ans=0.05 2024-09-17 17:41:14,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=636958.8333333334, ans=0.1 2024-09-17 17:41:20,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2024-09-17 17:41:36,397 INFO [train.py:1198] (0/2) Epoch 36, batch 1150, loss[loss=0.1809, ctc_loss=0.1162, cr_loss=0.3236, over 20943.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.147, cr_loss=0.3722, over 4093491.52 frames. 
], batch size: 50, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:41:41,001 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.186e+02 2.348e+02 2.544e+02 5.644e+02, threshold=4.697e+02, percent-clipped=1.0 2024-09-17 17:41:59,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=637043.8333333334, ans=0.0 2024-09-17 17:42:19,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=637072.1666666666, ans=0.0 2024-09-17 17:42:52,506 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2024-09-17 17:42:54,644 INFO [train.py:1198] (0/2) Epoch 36, batch 1200, loss[loss=0.1809, ctc_loss=0.1173, cr_loss=0.318, over 20979.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1469, cr_loss=0.3716, over 4086844.49 frames. ], batch size: 51, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:43:15,018 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.11 vs. limit=10.0 2024-09-17 17:43:26,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=637213.8333333334, ans=0.125 2024-09-17 17:43:59,366 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:44:10,023 INFO [train.py:1198] (0/2) Epoch 36, batch 1250, loss[loss=0.2422, ctc_loss=0.1613, cr_loss=0.4047, over 21065.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1465, cr_loss=0.3715, over 4095558.36 frames. 
], batch size: 59, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:44:13,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=637298.8333333334, ans=0.025 2024-09-17 17:44:15,879 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.209e+02 2.351e+02 2.569e+02 5.360e+02, threshold=4.702e+02, percent-clipped=1.0 2024-09-17 17:45:04,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=637383.8333333334, ans=0.025 2024-09-17 17:45:10,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=637412.1666666666, ans=0.0 2024-09-17 17:45:25,423 INFO [train.py:1198] (0/2) Epoch 36, batch 1300, loss[loss=0.2471, ctc_loss=0.1643, cr_loss=0.4143, over 21062.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3726, over 4090733.72 frames. ], batch size: 56, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:45:46,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=637468.8333333334, ans=0.125 2024-09-17 17:46:33,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=637553.8333333334, ans=0.04949747468305833 2024-09-17 17:46:37,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=637553.8333333334, ans=0.1 2024-09-17 17:46:38,243 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.41 vs. limit=22.5 2024-09-17 17:46:40,604 INFO [train.py:1198] (0/2) Epoch 36, batch 1350, loss[loss=0.2062, ctc_loss=0.1355, cr_loss=0.3533, over 20940.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3715, over 4097119.23 frames. 
], batch size: 60, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:46:46,587 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.218e+02 2.362e+02 2.528e+02 8.573e+02, threshold=4.725e+02, percent-clipped=2.0 2024-09-17 17:46:57,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=637610.5, ans=0.2 2024-09-17 17:47:05,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-09-17 17:48:02,120 INFO [train.py:1198] (0/2) Epoch 36, batch 1400, loss[loss=0.2433, ctc_loss=0.1626, cr_loss=0.4036, over 20926.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1471, cr_loss=0.3722, over 4106709.20 frames. ], batch size: 64, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:48:23,626 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:48:52,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=637808.8333333334, ans=0.125 2024-09-17 17:48:56,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=637808.8333333334, ans=0.0 2024-09-17 17:48:57,504 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2024-09-17 17:48:58,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=637808.8333333334, ans=0.025 2024-09-17 17:49:18,008 INFO [train.py:1198] (0/2) Epoch 36, batch 1450, loss[loss=0.2121, ctc_loss=0.1405, cr_loss=0.358, over 20980.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1467, cr_loss=0.3715, over 4092373.95 frames. 
], batch size: 55, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:49:23,986 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.197e+02 2.319e+02 2.514e+02 4.162e+02, threshold=4.638e+02, percent-clipped=0.0 2024-09-17 17:49:45,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=637893.8333333334, ans=0.0 2024-09-17 17:49:59,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2024-09-17 17:50:00,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=637922.1666666666, ans=0.0 2024-09-17 17:50:27,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=637978.8333333334, ans=0.0 2024-09-17 17:50:33,609 INFO [train.py:1198] (0/2) Epoch 36, batch 1500, loss[loss=0.2219, ctc_loss=0.1467, cr_loss=0.3759, over 21048.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1473, cr_loss=0.3734, over 4102819.42 frames. 
], batch size: 56, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:51:11,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=638063.8333333334, ans=0.125 2024-09-17 17:51:18,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=638092.1666666666, ans=0.04949747468305833 2024-09-17 17:51:21,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=638092.1666666666, ans=0.0 2024-09-17 17:51:30,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638092.1666666666, ans=0.1 2024-09-17 17:51:49,362 INFO [train.py:1198] (0/2) Epoch 36, batch 1550, loss[loss=0.2345, ctc_loss=0.1562, cr_loss=0.3914, over 20693.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1478, cr_loss=0.3736, over 4105658.59 frames. ], batch size: 71, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:51:54,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=638148.8333333334, ans=0.125 2024-09-17 17:51:55,321 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.219e+02 2.367e+02 2.541e+02 4.127e+02, threshold=4.735e+02, percent-clipped=0.0 2024-09-17 17:52:29,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.06 vs. limit=22.5 2024-09-17 17:52:35,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=638233.8333333334, ans=0.125 2024-09-17 17:53:08,189 INFO [train.py:1198] (0/2) Epoch 36, batch 1600, loss[loss=0.1948, ctc_loss=0.1265, cr_loss=0.3414, over 20944.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3728, over 4103499.10 frames. 
], batch size: 50, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 17:53:19,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=638290.5, ans=0.0 2024-09-17 17:53:19,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=638290.5, ans=0.125 2024-09-17 17:53:58,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638375.5, ans=0.1 2024-09-17 17:54:26,794 INFO [train.py:1198] (0/2) Epoch 36, batch 1650, loss[loss=0.2213, ctc_loss=0.1464, cr_loss=0.3743, over 20638.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3714, over 4088172.28 frames. ], batch size: 66, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:54:34,387 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.176e+02 2.298e+02 2.478e+02 3.076e+02, threshold=4.596e+02, percent-clipped=0.0 2024-09-17 17:54:46,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=638460.5, ans=0.125 2024-09-17 17:55:03,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=638488.8333333334, ans=0.125 2024-09-17 17:55:32,219 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:55:42,278 INFO [train.py:1198] (0/2) Epoch 36, batch 1700, loss[loss=0.2069, ctc_loss=0.1353, cr_loss=0.3583, over 20963.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1468, cr_loss=0.371, over 4094495.99 frames. 
], batch size: 55, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:55:44,109 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:55:44,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=638573.8333333334, ans=0.025 2024-09-17 17:56:33,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638658.8333333334, ans=0.1 2024-09-17 17:56:35,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=638658.8333333334, ans=0.125 2024-09-17 17:56:57,791 INFO [train.py:1198] (0/2) Epoch 36, batch 1750, loss[loss=0.2124, ctc_loss=0.1407, cr_loss=0.3585, over 20825.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1463, cr_loss=0.3703, over 4099135.80 frames. ], batch size: 59, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:57:04,996 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.182e+02 2.352e+02 2.498e+02 4.393e+02, threshold=4.704e+02, percent-clipped=0.0 2024-09-17 17:57:08,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=638715.5, ans=0.125 2024-09-17 17:57:28,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=638772.1666666666, ans=0.0 2024-09-17 17:57:36,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=638772.1666666666, ans=0.025 2024-09-17 17:58:04,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=638828.8333333334, ans=0.0 2024-09-17 17:58:12,673 INFO [train.py:1198] (0/2) Epoch 36, batch 1800, loss[loss=0.2349, ctc_loss=0.1549, cr_loss=0.3999, over 20706.00 frames. 
], tot_loss[loss=0.2207, ctc_loss=0.1465, cr_loss=0.3707, over 4104880.14 frames. ], batch size: 71, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:58:19,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638857.1666666666, ans=0.1 2024-09-17 17:58:27,164 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=22.5 2024-09-17 17:58:44,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=638913.8333333334, ans=0.04949747468305833 2024-09-17 17:58:47,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=638913.8333333334, ans=0.125 2024-09-17 17:59:06,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=638942.1666666666, ans=0.025 2024-09-17 17:59:22,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=638970.5, ans=0.125 2024-09-17 17:59:33,943 INFO [train.py:1198] (0/2) Epoch 36, batch 1850, loss[loss=0.2345, ctc_loss=0.157, cr_loss=0.3871, over 21042.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1463, cr_loss=0.3696, over 4106030.39 frames. 
], batch size: 62, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 17:59:34,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=638998.8333333334, ans=0.125 2024-09-17 17:59:41,543 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.201e+02 2.316e+02 2.445e+02 2.993e+02, threshold=4.631e+02, percent-clipped=0.0 2024-09-17 18:00:19,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639083.8333333334, ans=0.1 2024-09-17 18:00:20,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-09-17 18:00:27,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=639083.8333333334, ans=0.0 2024-09-17 18:00:28,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639083.8333333334, ans=0.1 2024-09-17 18:00:49,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.36 vs. limit=12.0 2024-09-17 18:00:49,620 INFO [train.py:1198] (0/2) Epoch 36, batch 1900, loss[loss=0.2381, ctc_loss=0.1621, cr_loss=0.3799, over 19502.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1463, cr_loss=0.37, over 4104958.10 frames. 
], batch size: 90, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 18:00:51,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=639140.5, ans=0.025 2024-09-17 18:00:54,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=639140.5, ans=0.125 2024-09-17 18:01:02,263 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0 2024-09-17 18:01:23,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=639197.1666666666, ans=10.0 2024-09-17 18:01:24,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.34 vs. limit=8.0 2024-09-17 18:01:27,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639197.1666666666, ans=0.1 2024-09-17 18:01:56,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=639253.8333333334, ans=0.2 2024-09-17 18:02:05,116 INFO [train.py:1198] (0/2) Epoch 36, batch 1950, loss[loss=0.243, ctc_loss=0.1635, cr_loss=0.3975, over 20831.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1466, cr_loss=0.3706, over 4103016.42 frames. 
], batch size: 65, lr: 2.30e-03, grad_scale: 16.0 2024-09-17 18:02:12,515 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.217e+02 2.351e+02 2.530e+02 3.464e+02, threshold=4.702e+02, percent-clipped=0.0 2024-09-17 18:02:35,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=639338.8333333334, ans=0.5 2024-09-17 18:02:39,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=639338.8333333334, ans=0.025 2024-09-17 18:03:11,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=639395.5, ans=0.125 2024-09-17 18:03:20,126 INFO [train.py:1198] (0/2) Epoch 36, batch 2000, loss[loss=0.2614, ctc_loss=0.1786, cr_loss=0.4136, over 19464.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1482, cr_loss=0.373, over 4100893.96 frames. ], batch size: 90, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:03:44,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=639452.1666666666, ans=0.2 2024-09-17 18:03:50,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-17 18:04:17,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=639508.8333333334, ans=0.0 2024-09-17 18:04:38,331 INFO [train.py:1198] (0/2) Epoch 36, batch 2050, loss[loss=0.2646, ctc_loss=0.184, cr_loss=0.4029, over 14083.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1485, cr_loss=0.3732, over 4093506.40 frames. 
], batch size: 149, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:04:41,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=639565.5, ans=0.0 2024-09-17 18:04:43,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639565.5, ans=0.1 2024-09-17 18:04:45,283 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=12.0 2024-09-17 18:04:45,951 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.231e+02 2.362e+02 2.541e+02 3.290e+02, threshold=4.725e+02, percent-clipped=0.0 2024-09-17 18:04:50,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=639565.5, ans=0.2 2024-09-17 18:04:51,681 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.49 vs. limit=5.0 2024-09-17 18:05:26,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=639650.5, ans=0.0 2024-09-17 18:05:28,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=639650.5, ans=0.2 2024-09-17 18:05:56,842 INFO [train.py:1198] (0/2) Epoch 36, batch 2100, loss[loss=0.2015, ctc_loss=0.1311, cr_loss=0.3522, over 20293.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.148, cr_loss=0.3719, over 4086006.91 frames. 
], batch size: 45, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:06:23,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639735.5, ans=0.1 2024-09-17 18:06:26,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=639763.8333333334, ans=0.125 2024-09-17 18:06:30,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639763.8333333334, ans=0.1 2024-09-17 18:07:07,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=639820.5, ans=0.125 2024-09-17 18:07:13,027 INFO [train.py:1198] (0/2) Epoch 36, batch 2150, loss[loss=0.1735, ctc_loss=0.1116, cr_loss=0.3095, over 21057.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1473, cr_loss=0.3708, over 4082380.51 frames. ], batch size: 53, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:07:20,663 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.192e+02 2.324e+02 2.492e+02 3.311e+02, threshold=4.648e+02, percent-clipped=0.0 2024-09-17 18:07:25,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=639848.8333333334, ans=0.2 2024-09-17 18:07:25,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=639848.8333333334, ans=0.2 2024-09-17 18:07:25,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=639848.8333333334, ans=0.125 2024-09-17 18:07:31,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639877.1666666666, ans=0.1 2024-09-17 18:07:36,063 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639877.1666666666, ans=0.1 2024-09-17 18:08:25,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639962.1666666666, ans=0.1 2024-09-17 18:08:28,382 INFO [train.py:1198] (0/2) Epoch 36, batch 2200, loss[loss=0.2197, ctc_loss=0.149, cr_loss=0.3535, over 18274.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1467, cr_loss=0.3699, over 4086418.95 frames. ], batch size: 108, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:09:45,558 INFO [train.py:1198] (0/2) Epoch 36, batch 2250, loss[loss=0.2363, ctc_loss=0.157, cr_loss=0.3965, over 20830.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1473, cr_loss=0.3717, over 4091170.30 frames. ], batch size: 59, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:09:47,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=640132.1666666666, ans=0.125 2024-09-17 18:09:53,253 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.185e+02 2.390e+02 2.556e+02 6.882e+02, threshold=4.780e+02, percent-clipped=1.0 2024-09-17 18:10:26,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=640188.8333333334, ans=0.07 2024-09-17 18:11:04,370 INFO [train.py:1198] (0/2) Epoch 36, batch 2300, loss[loss=0.2165, ctc_loss=0.1426, cr_loss=0.3696, over 20972.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1478, cr_loss=0.3719, over 4085821.27 frames. ], batch size: 58, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:12:20,034 INFO [train.py:1198] (0/2) Epoch 36, batch 2350, loss[loss=0.2074, ctc_loss=0.1353, cr_loss=0.3604, over 20951.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1478, cr_loss=0.3716, over 4086544.69 frames. 
], batch size: 51, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:12:27,634 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.254e+02 2.364e+02 2.535e+02 4.607e+02, threshold=4.728e+02, percent-clipped=0.0 2024-09-17 18:12:28,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=640415.5, ans=0.125 2024-09-17 18:12:35,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=640443.8333333334, ans=0.2 2024-09-17 18:12:38,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=640443.8333333334, ans=0.2 2024-09-17 18:12:41,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=640443.8333333334, ans=0.125 2024-09-17 18:12:49,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=640472.1666666666, ans=0.0 2024-09-17 18:13:35,675 INFO [train.py:1198] (0/2) Epoch 36, batch 2400, loss[loss=0.2422, ctc_loss=0.1635, cr_loss=0.3936, over 20626.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1483, cr_loss=0.3724, over 4089125.60 frames. 
], batch size: 66, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:13:48,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=640557.1666666666, ans=0.0 2024-09-17 18:14:18,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=640613.8333333334, ans=0.0 2024-09-17 18:14:30,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=640642.1666666666, ans=0.0 2024-09-17 18:14:32,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0 2024-09-17 18:14:36,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=640670.5, ans=0.125 2024-09-17 18:14:51,160 INFO [train.py:1198] (0/2) Epoch 36, batch 2450, loss[loss=0.2507, ctc_loss=0.1669, cr_loss=0.419, over 20639.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.3712, over 4098004.20 frames. ], batch size: 66, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:14:58,573 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.183e+02 2.280e+02 2.456e+02 3.188e+02, threshold=4.561e+02, percent-clipped=0.0 2024-09-17 18:15:01,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=640698.8333333334, ans=0.2 2024-09-17 18:15:17,568 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.21 vs. 
limit=22.5 2024-09-17 18:15:30,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=640755.5, ans=0.1 2024-09-17 18:15:31,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.62 vs. limit=15.0 2024-09-17 18:15:34,203 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=15.0 2024-09-17 18:16:09,482 INFO [train.py:1198] (0/2) Epoch 36, batch 2500, loss[loss=0.2162, ctc_loss=0.1442, cr_loss=0.36, over 20689.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.3718, over 4090986.63 frames. ], batch size: 68, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:16:13,142 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=12.0 2024-09-17 18:16:18,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=640840.5, ans=0.0 2024-09-17 18:16:40,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. 
limit=6.0 2024-09-17 18:16:51,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=640897.1666666666, ans=0.125 2024-09-17 18:17:15,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=640953.8333333334, ans=0.025 2024-09-17 18:17:19,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=640953.8333333334, ans=0.95 2024-09-17 18:17:28,207 INFO [train.py:1198] (0/2) Epoch 36, batch 2550, loss[loss=0.2292, ctc_loss=0.1507, cr_loss=0.3924, over 21082.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1483, cr_loss=0.3726, over 4080919.69 frames. ], batch size: 59, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:17:35,838 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.184e+02 2.392e+02 2.564e+02 5.258e+02, threshold=4.785e+02, percent-clipped=1.0 2024-09-17 18:17:39,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=640982.1666666666, ans=0.2 2024-09-17 18:18:40,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641095.5, ans=0.1 2024-09-17 18:18:43,483 INFO [train.py:1198] (0/2) Epoch 36, batch 2600, loss[loss=0.2163, ctc_loss=0.1432, cr_loss=0.3655, over 21062.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.149, cr_loss=0.3735, over 4079458.41 frames. ], batch size: 59, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:18:47,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.78 vs. 
limit=15.0 2024-09-17 18:18:48,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=641123.8333333334, ans=0.125 2024-09-17 18:18:52,336 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.04 vs. limit=10.0 2024-09-17 18:19:11,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=641152.1666666666, ans=0.125 2024-09-17 18:19:15,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=641180.5, ans=0.0 2024-09-17 18:19:17,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=641180.5, ans=0.125 2024-09-17 18:19:38,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=641208.8333333334, ans=0.95 2024-09-17 18:19:50,777 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0 2024-09-17 18:19:59,241 INFO [train.py:1198] (0/2) Epoch 36, batch 2650, loss[loss=0.2227, ctc_loss=0.1465, cr_loss=0.3808, over 20889.00 frames. ], tot_loss[loss=0.2243, ctc_loss=0.1494, cr_loss=0.3748, over 4085836.09 frames. 
], batch size: 54, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:20:06,806 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.150e+02 2.308e+02 2.486e+02 2.992e+02, threshold=4.617e+02, percent-clipped=0.0 2024-09-17 18:20:46,675 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:20:49,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=641350.5, ans=0.2 2024-09-17 18:20:52,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=641350.5, ans=0.125 2024-09-17 18:21:04,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=641378.8333333334, ans=0.0 2024-09-17 18:21:18,082 INFO [train.py:1198] (0/2) Epoch 36, batch 2700, loss[loss=0.2204, ctc_loss=0.1464, cr_loss=0.3702, over 20672.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1492, cr_loss=0.3746, over 4084758.98 frames. ], batch size: 68, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:21:22,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=641407.1666666666, ans=0.025 2024-09-17 18:21:25,366 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.66 vs. limit=22.5 2024-09-17 18:21:37,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.76 vs. 
limit=15.0 2024-09-17 18:21:54,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=641463.8333333334, ans=0.125 2024-09-17 18:22:07,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=641492.1666666666, ans=0.0 2024-09-17 18:22:21,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=641520.5, ans=0.0 2024-09-17 18:22:26,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=641520.5, ans=0.0 2024-09-17 18:22:36,195 INFO [train.py:1198] (0/2) Epoch 36, batch 2750, loss[loss=0.2379, ctc_loss=0.1588, cr_loss=0.3954, over 20670.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1491, cr_loss=0.3753, over 4082999.23 frames. ], batch size: 71, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:22:38,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=641548.8333333334, ans=0.0 2024-09-17 18:22:43,627 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.214e+02 2.373e+02 2.514e+02 4.691e+02, threshold=4.747e+02, percent-clipped=1.0 2024-09-17 18:23:02,348 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2024-09-17 18:23:35,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=641662.1666666666, ans=0.125 2024-09-17 18:23:46,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=641662.1666666666, ans=0.125 2024-09-17 18:23:51,122 INFO [train.py:1198] (0/2) Epoch 36, batch 2800, loss[loss=0.2023, ctc_loss=0.1332, cr_loss=0.3453, over 20988.00 frames. 
], tot_loss[loss=0.2229, ctc_loss=0.1483, cr_loss=0.3733, over 4084588.98 frames. ], batch size: 52, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:24:21,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=641747.1666666666, ans=0.015 2024-09-17 18:24:30,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=641747.1666666666, ans=0.125 2024-09-17 18:24:38,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=12.0 2024-09-17 18:25:05,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=641832.1666666666, ans=0.05 2024-09-17 18:25:06,330 INFO [train.py:1198] (0/2) Epoch 36, batch 2850, loss[loss=0.2632, ctc_loss=0.1859, cr_loss=0.3861, over 14250.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1487, cr_loss=0.3733, over 4065643.62 frames. ], batch size: 149, lr: 2.30e-03, grad_scale: 32.0 2024-09-17 18:25:14,091 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.193e+02 2.376e+02 2.552e+02 4.510e+02, threshold=4.752e+02, percent-clipped=0.0 2024-09-17 18:25:49,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=641888.8333333334, ans=0.0 2024-09-17 18:26:00,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.72 vs. limit=10.0 2024-09-17 18:26:09,542 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.06 vs. 
limit=15.0 2024-09-17 18:26:12,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=641945.5, ans=0.0 2024-09-17 18:26:22,568 INFO [train.py:1198] (0/2) Epoch 36, batch 2900, loss[loss=0.1933, ctc_loss=0.126, cr_loss=0.3363, over 19883.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1494, cr_loss=0.3751, over 4066227.58 frames. ], batch size: 44, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:26:24,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=641973.8333333334, ans=0.125 2024-09-17 18:26:59,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=642030.5, ans=0.125 2024-09-17 18:27:09,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=642058.8333333334, ans=0.125 2024-09-17 18:27:31,735 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.81 vs. limit=15.0 2024-09-17 18:27:32,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=642087.1666666666, ans=0.125 2024-09-17 18:27:41,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=642087.1666666666, ans=0.0 2024-09-17 18:27:44,324 INFO [train.py:1198] (0/2) Epoch 36, batch 2950, loss[loss=0.2038, ctc_loss=0.1375, cr_loss=0.3314, over 21090.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1482, cr_loss=0.3736, over 4079353.73 frames. 
], batch size: 53, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:27:51,747 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.169e+02 2.279e+02 2.477e+02 2.946e+02, threshold=4.558e+02, percent-clipped=0.0 2024-09-17 18:28:13,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=642172.1666666666, ans=0.0 2024-09-17 18:28:59,419 INFO [train.py:1198] (0/2) Epoch 36, batch 3000, loss[loss=0.23, ctc_loss=0.1551, cr_loss=0.3743, over 20833.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1473, cr_loss=0.3719, over 4093904.09 frames. ], batch size: 59, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:28:59,420 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 18:29:19,010 INFO [train.py:1230] (0/2) Epoch 36, validation: loss=0.03997, ctc_loss=0.03997, cr_loss=1.39e-14, over 944034.00 frames. 2024-09-17 18:29:19,010 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 18:29:33,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=642285.5, ans=0.125 2024-09-17 18:30:16,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=642342.1666666666, ans=0.125 2024-09-17 18:30:35,383 INFO [train.py:1198] (0/2) Epoch 36, batch 3050, loss[loss=0.1794, ctc_loss=0.1152, cr_loss=0.3212, over 20969.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1462, cr_loss=0.3702, over 4098594.48 frames. 
], batch size: 52, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:30:40,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=642398.8333333334, ans=0.0 2024-09-17 18:30:42,934 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.224e+02 2.334e+02 2.457e+02 5.651e+02, threshold=4.668e+02, percent-clipped=1.0 2024-09-17 18:31:20,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=642483.8333333334, ans=0.125 2024-09-17 18:31:28,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=642483.8333333334, ans=0.2 2024-09-17 18:31:50,894 INFO [train.py:1198] (0/2) Epoch 36, batch 3100, loss[loss=0.219, ctc_loss=0.1436, cr_loss=0.3769, over 20963.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3716, over 4095974.70 frames. ], batch size: 55, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:31:58,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=642540.5, ans=0.125 2024-09-17 18:32:50,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642625.5, ans=0.1 2024-09-17 18:32:51,223 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.50 vs. limit=15.0 2024-09-17 18:33:01,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=642653.8333333334, ans=0.0 2024-09-17 18:33:12,834 INFO [train.py:1198] (0/2) Epoch 36, batch 3150, loss[loss=0.2295, ctc_loss=0.1524, cr_loss=0.3853, over 20898.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.372, over 4090942.34 frames. 
], batch size: 54, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:33:20,430 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.192e+02 2.310e+02 2.473e+02 3.052e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-17 18:33:37,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=642710.5, ans=0.07 2024-09-17 18:33:43,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642738.8333333334, ans=0.1 2024-09-17 18:33:48,125 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2024-09-17 18:34:05,197 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.68 vs. limit=12.0 2024-09-17 18:34:28,460 INFO [train.py:1198] (0/2) Epoch 36, batch 3200, loss[loss=0.2105, ctc_loss=0.1402, cr_loss=0.3514, over 21066.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1472, cr_loss=0.3714, over 4086649.25 frames. ], batch size: 56, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:34:36,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=642823.8333333334, ans=0.2 2024-09-17 18:34:44,117 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0 2024-09-17 18:35:08,230 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. 
limit=15.0 2024-09-17 18:35:24,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642908.8333333334, ans=0.1 2024-09-17 18:35:44,099 INFO [train.py:1198] (0/2) Epoch 36, batch 3250, loss[loss=0.2108, ctc_loss=0.1343, cr_loss=0.3826, over 20891.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3715, over 4086650.24 frames. ], batch size: 54, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:35:50,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=642965.5, ans=0.2 2024-09-17 18:35:51,802 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.200e+02 2.311e+02 2.459e+02 3.830e+02, threshold=4.623e+02, percent-clipped=0.0 2024-09-17 18:36:11,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=642993.8333333334, ans=0.125 2024-09-17 18:36:11,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=642993.8333333334, ans=0.05 2024-09-17 18:36:25,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=643022.1666666666, ans=0.125 2024-09-17 18:36:28,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.01 vs. limit=15.0 2024-09-17 18:36:30,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=643050.5, ans=0.1 2024-09-17 18:36:45,298 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.98 vs. 
limit=15.0 2024-09-17 18:36:59,134 INFO [train.py:1198] (0/2) Epoch 36, batch 3300, loss[loss=0.2033, ctc_loss=0.135, cr_loss=0.3417, over 20896.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1464, cr_loss=0.3704, over 4088296.56 frames. ], batch size: 57, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:37:41,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=643163.8333333334, ans=0.125 2024-09-17 18:38:11,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643220.5, ans=0.1 2024-09-17 18:38:18,675 INFO [train.py:1198] (0/2) Epoch 36, batch 3350, loss[loss=0.2246, ctc_loss=0.1468, cr_loss=0.3892, over 21033.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.146, cr_loss=0.3701, over 4096823.15 frames. ], batch size: 61, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:38:19,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=643248.8333333334, ans=0.125 2024-09-17 18:38:20,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=643248.8333333334, ans=0.125 2024-09-17 18:38:26,256 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.174e+02 2.314e+02 2.461e+02 3.888e+02, threshold=4.628e+02, percent-clipped=0.0 2024-09-17 18:38:43,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=643277.1666666666, ans=0.0 2024-09-17 18:39:13,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=643333.8333333334, ans=0.125 2024-09-17 18:39:27,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=643362.1666666666, ans=15.0 2024-09-17 
18:39:37,383 INFO [train.py:1198] (0/2) Epoch 36, batch 3400, loss[loss=0.2625, ctc_loss=0.1798, cr_loss=0.4136, over 19497.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1457, cr_loss=0.3699, over 4103467.19 frames. ], batch size: 90, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:40:12,840 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-09-17 18:40:20,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=643447.1666666666, ans=0.125 2024-09-17 18:40:52,967 INFO [train.py:1198] (0/2) Epoch 36, batch 3450, loss[loss=0.2101, ctc_loss=0.1387, cr_loss=0.3572, over 20889.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1461, cr_loss=0.37, over 4105822.80 frames. ], batch size: 54, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:41:00,689 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.180e+02 2.316e+02 2.487e+02 2.991e+02, threshold=4.632e+02, percent-clipped=0.0 2024-09-17 18:41:01,472 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.28 vs. limit=6.0 2024-09-17 18:41:16,549 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-17 18:41:35,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=643588.8333333334, ans=0.04949747468305833 2024-09-17 18:42:08,779 INFO [train.py:1198] (0/2) Epoch 36, batch 3500, loss[loss=0.1872, ctc_loss=0.1205, cr_loss=0.3335, over 20938.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1462, cr_loss=0.3704, over 4106109.01 frames. 
], batch size: 49, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:42:43,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=643730.5, ans=0.125 2024-09-17 18:43:11,286 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.70 vs. limit=10.0 2024-09-17 18:43:12,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=643787.1666666666, ans=0.5 2024-09-17 18:43:24,384 INFO [train.py:1198] (0/2) Epoch 36, batch 3550, loss[loss=0.2168, ctc_loss=0.1451, cr_loss=0.3585, over 20970.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1465, cr_loss=0.3701, over 4099244.24 frames. ], batch size: 58, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:43:31,834 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.190e+02 2.348e+02 2.508e+02 4.507e+02, threshold=4.696e+02, percent-clipped=0.0 2024-09-17 18:43:41,461 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.75 vs. 
limit=15.0 2024-09-17 18:43:45,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=643843.8333333334, ans=0.025 2024-09-17 18:44:02,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=643872.1666666666, ans=0.0 2024-09-17 18:44:02,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=643872.1666666666, ans=0.0 2024-09-17 18:44:21,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=643900.5, ans=0.025 2024-09-17 18:44:28,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=643928.8333333334, ans=0.125 2024-09-17 18:44:32,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=643928.8333333334, ans=0.125 2024-09-17 18:44:45,433 INFO [train.py:1198] (0/2) Epoch 36, batch 3600, loss[loss=0.2387, ctc_loss=0.1576, cr_loss=0.4054, over 20985.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.3718, over 4090952.80 frames. ], batch size: 67, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:44:57,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=643957.1666666666, ans=0.04949747468305833 2024-09-17 18:45:22,149 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0 2024-09-17 18:45:50,650 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:46:00,823 INFO [train.py:1198] (0/2) Epoch 36, batch 3650, loss[loss=0.2953, ctc_loss=0.2106, cr_loss=0.4234, over 14109.00 frames. 
], tot_loss[loss=0.2222, ctc_loss=0.1478, cr_loss=0.3718, over 4087558.81 frames. ], batch size: 149, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:46:08,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=644098.8333333334, ans=0.125 2024-09-17 18:46:09,978 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.231e+02 2.394e+02 2.586e+02 5.344e+02, threshold=4.788e+02, percent-clipped=1.0 2024-09-17 18:46:16,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=644127.1666666666, ans=0.0 2024-09-17 18:46:25,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=644127.1666666666, ans=0.125 2024-09-17 18:47:16,530 INFO [train.py:1198] (0/2) Epoch 36, batch 3700, loss[loss=0.2065, ctc_loss=0.1366, cr_loss=0.3494, over 20976.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.3708, over 4083170.70 frames. 
], batch size: 58, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:47:18,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=644240.5, ans=0.0 2024-09-17 18:47:31,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=644268.8333333334, ans=0.025 2024-09-17 18:47:40,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=644268.8333333334, ans=0.04949747468305833 2024-09-17 18:47:50,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=644297.1666666666, ans=0.125 2024-09-17 18:48:01,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=644325.5, ans=0.125 2024-09-17 18:48:05,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=644325.5, ans=0.1 2024-09-17 18:48:09,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=644325.5, ans=0.125 2024-09-17 18:48:11,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=644325.5, ans=0.1 2024-09-17 18:48:15,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=644353.8333333334, ans=0.125 2024-09-17 18:48:32,126 INFO [train.py:1198] (0/2) Epoch 36, batch 3750, loss[loss=0.1999, ctc_loss=0.1319, cr_loss=0.3395, over 21064.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1473, cr_loss=0.3713, over 4087156.01 frames. 
], batch size: 53, lr: 2.29e-03, grad_scale: 16.0 2024-09-17 18:48:35,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=644382.1666666666, ans=0.0 2024-09-17 18:48:41,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644382.1666666666, ans=0.1 2024-09-17 18:48:42,430 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.169e+02 2.328e+02 2.594e+02 5.231e+02, threshold=4.656e+02, percent-clipped=1.0 2024-09-17 18:48:42,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=644382.1666666666, ans=0.0 2024-09-17 18:49:50,561 INFO [train.py:1198] (0/2) Epoch 36, batch 3800, loss[loss=0.2239, ctc_loss=0.1459, cr_loss=0.39, over 20892.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.3723, over 4093009.12 frames. ], batch size: 54, lr: 2.29e-03, grad_scale: 16.0 2024-09-17 18:50:41,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.22 vs. limit=6.0 2024-09-17 18:51:00,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644637.1666666666, ans=0.1 2024-09-17 18:51:08,949 INFO [train.py:1198] (0/2) Epoch 36, batch 3850, loss[loss=0.2028, ctc_loss=0.1333, cr_loss=0.3475, over 20947.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.3722, over 4100039.75 frames. 
], batch size: 50, lr: 2.29e-03, grad_scale: 16.0 2024-09-17 18:51:19,440 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.172e+02 2.267e+02 2.420e+02 3.328e+02, threshold=4.535e+02, percent-clipped=0.0 2024-09-17 18:51:28,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=644693.8333333334, ans=0.0 2024-09-17 18:51:53,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=644750.5, ans=0.025 2024-09-17 18:52:08,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=644778.8333333334, ans=0.125 2024-09-17 18:52:24,399 INFO [train.py:1198] (0/2) Epoch 36, batch 3900, loss[loss=0.2268, ctc_loss=0.1524, cr_loss=0.3718, over 20670.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1475, cr_loss=0.372, over 4106019.03 frames. ], batch size: 68, lr: 2.29e-03, grad_scale: 16.0 2024-09-17 18:52:27,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=644807.1666666666, ans=0.125 2024-09-17 18:52:44,681 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.37 vs. 
limit=22.5 2024-09-17 18:52:50,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=644835.5, ans=0.2 2024-09-17 18:52:51,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=644835.5, ans=0.0 2024-09-17 18:52:53,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=644863.8333333334, ans=0.1 2024-09-17 18:52:59,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=644863.8333333334, ans=0.125 2024-09-17 18:53:39,687 INFO [train.py:1198] (0/2) Epoch 36, batch 3950, loss[loss=0.2031, ctc_loss=0.1346, cr_loss=0.3421, over 20994.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.147, cr_loss=0.3717, over 4101953.06 frames. ], batch size: 52, lr: 2.29e-03, grad_scale: 16.0 2024-09-17 18:53:50,100 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.216e+02 2.344e+02 2.513e+02 4.075e+02, threshold=4.689e+02, percent-clipped=0.0 2024-09-17 18:53:50,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=644948.8333333334, ans=0.1 2024-09-17 18:54:20,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=645005.5, ans=0.0 2024-09-17 18:54:44,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=645062.1666666666, ans=0.125 2024-09-17 18:54:47,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=645062.1666666666, ans=0.125 2024-09-17 18:54:57,572 INFO [train.py:1198] (0/2) Epoch 36, batch 4000, loss[loss=0.2118, ctc_loss=0.1373, cr_loss=0.3726, over 20906.00 frames. 
], tot_loss[loss=0.2213, ctc_loss=0.147, cr_loss=0.3715, over 4102268.75 frames. ], batch size: 54, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:55:23,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=645118.8333333334, ans=0.125 2024-09-17 18:55:35,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=645147.1666666666, ans=0.125 2024-09-17 18:56:08,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=645203.8333333334, ans=0.1 2024-09-17 18:56:10,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645203.8333333334, ans=0.1 2024-09-17 18:56:15,534 INFO [train.py:1198] (0/2) Epoch 36, batch 4050, loss[loss=0.2598, ctc_loss=0.1764, cr_loss=0.4166, over 18216.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3725, over 4105685.05 frames. 
], batch size: 108, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:56:26,002 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.152e+02 2.288e+02 2.439e+02 3.778e+02, threshold=4.576e+02, percent-clipped=0.0 2024-09-17 18:56:30,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=645260.5, ans=0.125 2024-09-17 18:56:52,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=645288.8333333334, ans=0.125 2024-09-17 18:56:56,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=645288.8333333334, ans=0.1 2024-09-17 18:57:13,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0 2024-09-17 18:57:23,191 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.53 vs. limit=10.0 2024-09-17 18:57:31,319 INFO [train.py:1198] (0/2) Epoch 36, batch 4100, loss[loss=0.178, ctc_loss=0.1145, cr_loss=0.3173, over 20225.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1473, cr_loss=0.371, over 4086057.94 frames. 
], batch size: 45, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:57:33,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=645373.8333333334, ans=0.0 2024-09-17 18:57:37,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=645373.8333333334, ans=0.125 2024-09-17 18:58:06,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=645430.5, ans=0.2 2024-09-17 18:58:21,537 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2024-09-17 18:58:42,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=645487.1666666666, ans=0.2 2024-09-17 18:58:46,253 INFO [train.py:1198] (0/2) Epoch 36, batch 4150, loss[loss=0.2081, ctc_loss=0.1359, cr_loss=0.3612, over 20564.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1471, cr_loss=0.3713, over 4093162.75 frames. ], batch size: 75, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 18:58:57,006 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.186e+02 2.345e+02 2.474e+02 3.263e+02, threshold=4.689e+02, percent-clipped=0.0 2024-09-17 18:58:57,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=645515.5, ans=0.125 2024-09-17 18:59:04,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=645543.8333333334, ans=0.2 2024-09-17 18:59:17,555 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.22 vs. 
limit=6.0 2024-09-17 18:59:35,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=645600.5, ans=0.125 2024-09-17 19:00:02,523 INFO [train.py:1198] (0/2) Epoch 36, batch 4200, loss[loss=0.2281, ctc_loss=0.1493, cr_loss=0.3941, over 21031.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1468, cr_loss=0.3712, over 4099583.53 frames. ], batch size: 61, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:00:27,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=645685.5, ans=10.0 2024-09-17 19:00:40,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=645713.8333333334, ans=0.0 2024-09-17 19:01:03,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645742.1666666666, ans=0.1 2024-09-17 19:01:20,893 INFO [train.py:1198] (0/2) Epoch 36, batch 4250, loss[loss=0.1678, ctc_loss=0.1096, cr_loss=0.291, over 20003.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.372, over 4089593.93 frames. 
], batch size: 44, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:01:31,167 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.807e+02 2.186e+02 2.328e+02 2.483e+02 3.231e+02, threshold=4.655e+02, percent-clipped=0.0 2024-09-17 19:01:33,130 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:01:48,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=645827.1666666666, ans=0.125 2024-09-17 19:02:27,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=645912.1666666666, ans=0.0 2024-09-17 19:02:30,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=645912.1666666666, ans=0.125 2024-09-17 19:02:39,489 INFO [train.py:1198] (0/2) Epoch 36, batch 4300, loss[loss=0.2425, ctc_loss=0.1626, cr_loss=0.3998, over 20829.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3722, over 4090101.49 frames. ], batch size: 65, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:02:50,484 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:02:59,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=645968.8333333334, ans=0.2 2024-09-17 19:03:09,758 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-228000.pt 2024-09-17 19:03:17,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=645997.1666666666, ans=0.125 2024-09-17 19:03:56,044 INFO [train.py:1198] (0/2) Epoch 36, batch 4350, loss[loss=0.1837, ctc_loss=0.1202, cr_loss=0.3179, over 20953.00 frames. 
], tot_loss[loss=0.2217, ctc_loss=0.1474, cr_loss=0.372, over 4090429.74 frames. ], batch size: 49, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:03:58,360 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.97 vs. limit=15.0 2024-09-17 19:04:06,750 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.137e+02 2.304e+02 2.491e+02 3.547e+02, threshold=4.607e+02, percent-clipped=0.0 2024-09-17 19:04:16,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=646110.5, ans=0.2 2024-09-17 19:04:58,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=646195.5, ans=0.2 2024-09-17 19:05:03,205 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:05:03,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=646195.5, ans=0.0 2024-09-17 19:05:11,969 INFO [train.py:1198] (0/2) Epoch 36, batch 4400, loss[loss=0.2646, ctc_loss=0.1764, cr_loss=0.4409, over 18261.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1479, cr_loss=0.373, over 4085156.73 frames. ], batch size: 108, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:05:20,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-17 19:05:28,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=646252.1666666666, ans=0.5 2024-09-17 19:05:47,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.96 vs. 
limit=15.0 2024-09-17 19:05:54,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=646280.5, ans=0.09899494936611666 2024-09-17 19:06:28,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=646365.5, ans=0.0 2024-09-17 19:06:29,922 INFO [train.py:1198] (0/2) Epoch 36, batch 4450, loss[loss=0.1906, ctc_loss=0.124, cr_loss=0.333, over 21065.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1481, cr_loss=0.3738, over 4090863.58 frames. ], batch size: 53, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:06:40,488 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.211e+02 2.356e+02 2.581e+02 3.449e+02, threshold=4.712e+02, percent-clipped=0.0 2024-09-17 19:06:57,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=12.0 2024-09-17 19:07:29,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=646450.5, ans=0.0 2024-09-17 19:07:38,315 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.74 vs. limit=5.0 2024-09-17 19:07:48,785 INFO [train.py:1198] (0/2) Epoch 36, batch 4500, loss[loss=0.2081, ctc_loss=0.1335, cr_loss=0.3728, over 20940.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1476, cr_loss=0.3734, over 4092990.45 frames. 
], batch size: 60, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:08:01,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=646507.1666666666, ans=0.0 2024-09-17 19:08:23,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=646563.8333333334, ans=0.0 2024-09-17 19:08:43,762 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=22.5 2024-09-17 19:08:46,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=646592.1666666666, ans=0.125 2024-09-17 19:09:01,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0 2024-09-17 19:09:04,045 INFO [train.py:1198] (0/2) Epoch 36, batch 4550, loss[loss=0.2502, ctc_loss=0.1673, cr_loss=0.4143, over 20986.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3722, over 4086600.02 frames. ], batch size: 64, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:09:14,512 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.761e+02 2.211e+02 2.333e+02 2.511e+02 5.023e+02, threshold=4.665e+02, percent-clipped=1.0 2024-09-17 19:09:30,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=646677.1666666666, ans=0.0 2024-09-17 19:10:03,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=646762.1666666666, ans=0.0 2024-09-17 19:10:20,145 INFO [train.py:1198] (0/2) Epoch 36, batch 4600, loss[loss=0.2423, ctc_loss=0.1623, cr_loss=0.3999, over 20682.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3723, over 4094785.94 frames. 
], batch size: 71, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:10:21,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=646790.5, ans=0.035 2024-09-17 19:10:28,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-09-17 19:10:42,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=646818.8333333334, ans=15.0 2024-09-17 19:10:50,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=646847.1666666666, ans=0.1 2024-09-17 19:11:36,157 INFO [train.py:1198] (0/2) Epoch 36, batch 4650, loss[loss=0.2511, ctc_loss=0.1758, cr_loss=0.3768, over 18048.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.147, cr_loss=0.3714, over 4093753.47 frames. ], batch size: 108, lr: 2.29e-03, grad_scale: 32.0 2024-09-17 19:11:43,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.86 vs. 
limit=8.0
2024-09-17 19:11:49,524 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.184e+02 2.321e+02 2.513e+02 4.292e+02, threshold=4.641e+02, percent-clipped=0.0
2024-09-17 19:11:56,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=646960.5, ans=0.025
2024-09-17 19:11:57,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=646960.5, ans=0.125
2024-09-17 19:12:52,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=647073.8333333334, ans=0.0
2024-09-17 19:12:53,761 INFO [train.py:1198] (0/2) Epoch 36, batch 4700, loss[loss=0.2198, ctc_loss=0.1448, cr_loss=0.3748, over 20874.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1467, cr_loss=0.3711, over 4096684.86 frames. ], batch size: 57, lr: 2.29e-03, grad_scale: 32.0
2024-09-17 19:13:28,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647130.5, ans=0.1
2024-09-17 19:13:45,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=647158.8333333334, ans=0.0
2024-09-17 19:14:03,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=647187.1666666666, ans=0.025
2024-09-17 19:14:11,304 INFO [train.py:1198] (0/2) Epoch 36, batch 4750, loss[loss=0.2253, ctc_loss=0.1464, cr_loss=0.3947, over 20962.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1468, cr_loss=0.3717, over 4098037.62 frames. ], batch size: 58, lr: 2.29e-03, grad_scale: 32.0
2024-09-17 19:14:13,542 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.17 vs. limit=10.0
2024-09-17 19:14:16,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=647215.5, ans=0.125
2024-09-17 19:14:21,892 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.852e+02 2.204e+02 2.343e+02 2.499e+02 3.449e+02, threshold=4.685e+02, percent-clipped=0.0
2024-09-17 19:14:23,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=647215.5, ans=0.0
2024-09-17 19:15:00,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=647300.5, ans=0.1
2024-09-17 19:15:09,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=647300.5, ans=0.125
2024-09-17 19:15:26,934 INFO [train.py:1198] (0/2) Epoch 36, batch 4800, loss[loss=0.2652, ctc_loss=0.1775, cr_loss=0.4386, over 21009.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1466, cr_loss=0.3711, over 4086604.00 frames. ], batch size: 64, lr: 2.29e-03, grad_scale: 32.0
2024-09-17 19:16:17,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=647442.1666666666, ans=0.125
2024-09-17 19:16:42,756 INFO [train.py:1198] (0/2) Epoch 36, batch 4850, loss[loss=0.2219, ctc_loss=0.1481, cr_loss=0.3695, over 21027.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3725, over 4097711.77 frames. ], batch size: 62, lr: 2.29e-03, grad_scale: 32.0
2024-09-17 19:16:53,351 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.188e+02 2.296e+02 2.498e+02 4.248e+02, threshold=4.593e+02, percent-clipped=0.0
2024-09-17 19:16:53,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0
2024-09-17 19:17:16,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=647555.5, ans=0.125
2024-09-17 19:17:40,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647583.8333333334, ans=0.1
2024-09-17 19:17:41,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=647583.8333333334, ans=0.0
2024-09-17 19:17:59,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=647640.5, ans=0.125
2024-09-17 19:18:00,926 INFO [train.py:1198] (0/2) Epoch 36, batch 4900, loss[loss=0.2177, ctc_loss=0.142, cr_loss=0.3781, over 21038.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1469, cr_loss=0.3716, over 4102640.20 frames. ], batch size: 62, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:18:52,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=647725.5, ans=0.125
2024-09-17 19:19:06,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=647753.8333333334, ans=0.025
2024-09-17 19:19:14,665 INFO [train.py:1198] (0/2) Epoch 36, batch 4950, loss[loss=0.1721, ctc_loss=0.1112, cr_loss=0.3047, over 20961.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1472, cr_loss=0.372, over 4103691.37 frames. ], batch size: 49, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:19:25,019 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.181e+02 2.310e+02 2.451e+02 3.277e+02, threshold=4.619e+02, percent-clipped=0.0
2024-09-17 19:19:37,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=647810.5, ans=0.0
2024-09-17 19:19:37,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=647810.5, ans=0.0
2024-09-17 19:19:49,864 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.28 vs. limit=15.0
2024-09-17 19:20:05,759 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 19:20:21,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=647895.5, ans=0.0
2024-09-17 19:20:26,257 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 19:20:31,772 INFO [train.py:1198] (0/2) Epoch 36, batch 5000, loss[loss=0.2505, ctc_loss=0.1691, cr_loss=0.4071, over 20641.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1481, cr_loss=0.3736, over 4107793.27 frames. ], batch size: 68, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:21:02,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0
2024-09-17 19:21:34,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=648037.1666666666, ans=0.125
2024-09-17 19:21:45,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=648065.5, ans=0.025
2024-09-17 19:21:45,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=648065.5, ans=0.025
2024-09-17 19:21:46,216 INFO [train.py:1198] (0/2) Epoch 36, batch 5050, loss[loss=0.1841, ctc_loss=0.1187, cr_loss=0.3272, over 19948.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1476, cr_loss=0.3733, over 4111758.10 frames. ], batch size: 44, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:21:56,653 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.192e+02 2.288e+02 2.393e+02 3.006e+02, threshold=4.576e+02, percent-clipped=0.0
2024-09-17 19:22:35,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=648150.5, ans=0.02
2024-09-17 19:23:00,823 INFO [train.py:1198] (0/2) Epoch 36, batch 5100, loss[loss=0.2258, ctc_loss=0.15, cr_loss=0.3794, over 21085.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1481, cr_loss=0.3739, over 4116530.75 frames. ], batch size: 59, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:23:08,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=648207.1666666666, ans=0.125
2024-09-17 19:23:49,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=648292.1666666666, ans=0.025
2024-09-17 19:23:50,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=648292.1666666666, ans=0.125
2024-09-17 19:23:53,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=648292.1666666666, ans=0.0
2024-09-17 19:23:56,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=648292.1666666666, ans=0.0
2024-09-17 19:23:58,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0
2024-09-17 19:24:14,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=648348.8333333334, ans=0.125
2024-09-17 19:24:15,748 INFO [train.py:1198] (0/2) Epoch 36, batch 5150, loss[loss=0.2012, ctc_loss=0.1329, cr_loss=0.3415, over 20821.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1476, cr_loss=0.3732, over 4113661.93 frames. ], batch size: 59, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:24:23,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=648348.8333333334, ans=0.0
2024-09-17 19:24:26,054 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.219e+02 2.328e+02 2.472e+02 3.334e+02, threshold=4.656e+02, percent-clipped=0.0
2024-09-17 19:24:43,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=648405.5, ans=0.0
2024-09-17 19:25:20,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=648462.1666666666, ans=0.125
2024-09-17 19:25:30,025 INFO [train.py:1198] (0/2) Epoch 36, batch 5200, loss[loss=0.2436, ctc_loss=0.1663, cr_loss=0.3862, over 20098.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3722, over 4102376.43 frames. ], batch size: 80, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:25:46,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=648518.8333333334, ans=0.125
2024-09-17 19:26:14,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=648575.5, ans=0.125
2024-09-17 19:26:44,297 INFO [train.py:1198] (0/2) Epoch 36, batch 5250, loss[loss=0.225, ctc_loss=0.1492, cr_loss=0.3791, over 21043.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1475, cr_loss=0.3728, over 4101904.15 frames. ], batch size: 62, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:26:54,618 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.172e+02 2.334e+02 2.449e+02 4.552e+02, threshold=4.668e+02, percent-clipped=0.0
2024-09-17 19:27:10,157 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=22.5
2024-09-17 19:27:18,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=648688.8333333334, ans=0.125
2024-09-17 19:27:26,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=648688.8333333334, ans=10.0
2024-09-17 19:27:29,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=648717.1666666666, ans=0.125
2024-09-17 19:27:43,959 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0
2024-09-17 19:28:01,142 INFO [train.py:1198] (0/2) Epoch 36, batch 5300, loss[loss=0.251, ctc_loss=0.17, cr_loss=0.4051, over 20845.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1473, cr_loss=0.3732, over 4110502.80 frames. ], batch size: 65, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:28:01,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=648773.8333333334, ans=0.0
2024-09-17 19:28:04,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=648773.8333333334, ans=0.1
2024-09-17 19:28:07,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=648773.8333333334, ans=0.2
2024-09-17 19:29:15,282 INFO [train.py:1198] (0/2) Epoch 36, batch 5350, loss[loss=0.2295, ctc_loss=0.1547, cr_loss=0.3739, over 20380.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1473, cr_loss=0.3731, over 4108409.17 frames. ], batch size: 74, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:29:25,756 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.135e+02 2.318e+02 2.451e+02 3.354e+02, threshold=4.635e+02, percent-clipped=0.0
2024-09-17 19:29:38,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=12.0
2024-09-17 19:30:09,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=649000.5, ans=0.125
2024-09-17 19:30:31,936 INFO [train.py:1198] (0/2) Epoch 36, batch 5400, loss[loss=0.2264, ctc_loss=0.1486, cr_loss=0.3891, over 20655.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1469, cr_loss=0.3719, over 4101544.08 frames. ], batch size: 71, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:30:33,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=649057.1666666666, ans=0.05
2024-09-17 19:30:34,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=649057.1666666666, ans=0.1
2024-09-17 19:30:35,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=649057.1666666666, ans=0.125
2024-09-17 19:30:50,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=649085.5, ans=0.125
2024-09-17 19:30:56,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=649085.5, ans=0.5
2024-09-17 19:30:56,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0
2024-09-17 19:31:12,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649113.8333333334, ans=0.1
2024-09-17 19:31:40,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=649170.5, ans=0.125
2024-09-17 19:31:46,545 INFO [train.py:1198] (0/2) Epoch 36, batch 5450, loss[loss=0.2145, ctc_loss=0.1415, cr_loss=0.3648, over 20895.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1464, cr_loss=0.3704, over 4101343.73 frames. ], batch size: 54, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:31:51,661 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0
2024-09-17 19:31:56,989 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.180e+02 2.319e+02 2.454e+02 3.887e+02, threshold=4.638e+02, percent-clipped=0.0
2024-09-17 19:31:57,721 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=22.5
2024-09-17 19:32:18,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0
2024-09-17 19:32:34,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=649283.8333333334, ans=0.125
2024-09-17 19:32:42,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=649283.8333333334, ans=0.125
2024-09-17 19:32:55,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=649312.1666666666, ans=0.125
2024-09-17 19:33:00,843 INFO [train.py:1198] (0/2) Epoch 36, batch 5500, loss[loss=0.2246, ctc_loss=0.1522, cr_loss=0.3623, over 21038.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1463, cr_loss=0.3705, over 4103180.32 frames. ], batch size: 62, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:33:23,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=649368.8333333334, ans=0.1
2024-09-17 19:34:14,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=649482.1666666666, ans=0.125
2024-09-17 19:34:15,402 INFO [train.py:1198] (0/2) Epoch 36, batch 5550, loss[loss=0.1811, ctc_loss=0.118, cr_loss=0.3153, over 20940.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1464, cr_loss=0.3709, over 4104501.08 frames. ], batch size: 49, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:34:25,607 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.181e+02 2.300e+02 2.475e+02 3.632e+02, threshold=4.600e+02, percent-clipped=0.0
2024-09-17 19:34:30,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=649510.5, ans=0.0
2024-09-17 19:35:29,358 INFO [train.py:1198] (0/2) Epoch 36, batch 5600, loss[loss=0.1877, ctc_loss=0.1245, cr_loss=0.3159, over 19520.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1462, cr_loss=0.3709, over 4102136.19 frames. ], batch size: 43, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:35:34,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=649623.8333333334, ans=0.0
2024-09-17 19:35:43,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=15.0
2024-09-17 19:36:15,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=649708.8333333334, ans=0.125
2024-09-17 19:36:45,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0
2024-09-17 19:36:46,329 INFO [train.py:1198] (0/2) Epoch 36, batch 5650, loss[loss=0.2346, ctc_loss=0.1545, cr_loss=0.4003, over 20621.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1464, cr_loss=0.3713, over 4100882.08 frames. ], batch size: 66, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:36:56,843 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.946e+02 2.184e+02 2.325e+02 2.525e+02 4.662e+02, threshold=4.649e+02, percent-clipped=1.0
2024-09-17 19:37:41,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=12.0
2024-09-17 19:37:45,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=649878.8333333334, ans=0.2
2024-09-17 19:37:59,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0
2024-09-17 19:38:00,580 INFO [train.py:1198] (0/2) Epoch 36, batch 5700, loss[loss=0.2311, ctc_loss=0.1566, cr_loss=0.3724, over 20974.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.147, cr_loss=0.3716, over 4103776.78 frames. ], batch size: 58, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:38:17,429 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2024-09-17 19:38:26,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=649935.5, ans=0.0
2024-09-17 19:38:30,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=649963.8333333334, ans=0.1
2024-09-17 19:38:53,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=649992.1666666666, ans=0.1
2024-09-17 19:39:00,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650020.5, ans=0.1
2024-09-17 19:39:15,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650048.8333333334, ans=0.1
2024-09-17 19:39:16,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=22.5
2024-09-17 19:39:17,069 INFO [train.py:1198] (0/2) Epoch 36, batch 5750, loss[loss=0.2454, ctc_loss=0.1654, cr_loss=0.4001, over 21070.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1468, cr_loss=0.3718, over 4106385.30 frames. ], batch size: 59, lr: 2.28e-03, grad_scale: 64.0
2024-09-17 19:39:27,402 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.195e+02 2.353e+02 2.538e+02 7.012e+02, threshold=4.707e+02, percent-clipped=1.0
2024-09-17 19:39:47,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=650105.5, ans=0.125
2024-09-17 19:40:03,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650133.8333333334, ans=0.1
2024-09-17 19:40:10,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=650133.8333333334, ans=0.125
2024-09-17 19:40:16,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=650162.1666666666, ans=0.0
2024-09-17 19:40:31,177 INFO [train.py:1198] (0/2) Epoch 36, batch 5800, loss[loss=0.2271, ctc_loss=0.1532, cr_loss=0.3696, over 20625.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1466, cr_loss=0.3712, over 4110694.34 frames. ], batch size: 68, lr: 2.28e-03, grad_scale: 64.0
2024-09-17 19:40:35,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=650190.5, ans=0.125
2024-09-17 19:41:11,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=650247.1666666666, ans=0.125
2024-09-17 19:41:15,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=650275.5, ans=0.125
2024-09-17 19:41:43,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=650332.1666666666, ans=0.125
2024-09-17 19:41:44,645 INFO [train.py:1198] (0/2) Epoch 36, batch 5850, loss[loss=0.2179, ctc_loss=0.1444, cr_loss=0.3678, over 20959.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3723, over 4104812.41 frames. ], batch size: 58, lr: 2.28e-03, grad_scale: 64.0
2024-09-17 19:41:46,981 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=12.0
2024-09-17 19:41:56,476 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.230e+02 2.366e+02 2.519e+02 3.256e+02, threshold=4.732e+02, percent-clipped=0.0
2024-09-17 19:42:19,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=650388.8333333334, ans=0.125
2024-09-17 19:42:26,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=650388.8333333334, ans=0.125
2024-09-17 19:42:59,122 INFO [train.py:1198] (0/2) Epoch 36, batch 5900, loss[loss=0.1805, ctc_loss=0.1167, cr_loss=0.3188, over 20968.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1469, cr_loss=0.3718, over 4104397.14 frames. ], batch size: 51, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:42:59,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=650473.8333333334, ans=0.125
2024-09-17 19:43:29,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=650530.5, ans=0.125
2024-09-17 19:43:29,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650530.5, ans=0.1
2024-09-17 19:44:13,521 INFO [train.py:1198] (0/2) Epoch 36, batch 5950, loss[loss=0.2187, ctc_loss=0.1471, cr_loss=0.3577, over 20833.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1474, cr_loss=0.3728, over 4105465.32 frames. ], batch size: 59, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:44:25,426 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.183e+02 2.325e+02 2.502e+02 4.468e+02, threshold=4.649e+02, percent-clipped=0.0
2024-09-17 19:45:21,852 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0
2024-09-17 19:45:30,117 INFO [train.py:1198] (0/2) Epoch 36, batch 6000, loss[loss=0.2079, ctc_loss=0.1342, cr_loss=0.3689, over 21058.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1477, cr_loss=0.3725, over 4089744.83 frames. ], batch size: 53, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:45:30,118 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 19:45:50,930 INFO [train.py:1230] (0/2) Epoch 36, validation: loss=0.04013, ctc_loss=0.04013, cr_loss=1.364e-14, over 944034.00 frames.
2024-09-17 19:45:50,931 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-17 19:46:21,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=650813.8333333334, ans=0.125
2024-09-17 19:46:40,765 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0
2024-09-17 19:46:49,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650842.1666666666, ans=0.1
2024-09-17 19:46:55,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=650870.5, ans=0.125
2024-09-17 19:47:08,563 INFO [train.py:1198] (0/2) Epoch 36, batch 6050, loss[loss=0.2846, ctc_loss=0.1972, cr_loss=0.4373, over 19458.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.148, cr_loss=0.3729, over 4093251.08 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:47:20,168 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.238e+02 2.347e+02 2.486e+02 4.621e+02, threshold=4.694e+02, percent-clipped=0.0
2024-09-17 19:47:25,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=12.0
2024-09-17 19:47:33,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=650927.1666666666, ans=0.025
2024-09-17 19:47:47,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=650955.5, ans=0.07
2024-09-17 19:48:04,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=650983.8333333334, ans=0.0
2024-09-17 19:48:07,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=651012.1666666666, ans=0.0
2024-09-17 19:48:23,341 INFO [train.py:1198] (0/2) Epoch 36, batch 6100, loss[loss=0.2366, ctc_loss=0.1595, cr_loss=0.3857, over 20314.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.148, cr_loss=0.3732, over 4097857.39 frames. ], batch size: 74, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:48:56,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=651097.1666666666, ans=0.1
2024-09-17 19:49:36,870 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0
2024-09-17 19:49:37,484 INFO [train.py:1198] (0/2) Epoch 36, batch 6150, loss[loss=0.2185, ctc_loss=0.1452, cr_loss=0.3667, over 20789.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1482, cr_loss=0.3729, over 4090817.68 frames. ], batch size: 53, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:49:45,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=651182.1666666666, ans=0.125
2024-09-17 19:49:48,685 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=12.0
2024-09-17 19:49:49,435 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.197e+02 2.368e+02 2.541e+02 4.603e+02, threshold=4.735e+02, percent-clipped=0.0
2024-09-17 19:49:57,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=22.5
2024-09-17 19:50:09,908 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.93 vs. limit=22.5
2024-09-17 19:50:30,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0
2024-09-17 19:50:52,249 INFO [train.py:1198] (0/2) Epoch 36, batch 6200, loss[loss=0.1926, ctc_loss=0.1254, cr_loss=0.3363, over 21000.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1476, cr_loss=0.3709, over 4069315.62 frames. ], batch size: 52, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:51:27,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=651380.5, ans=0.0
2024-09-17 19:51:29,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=651380.5, ans=0.125
2024-09-17 19:52:05,564 INFO [train.py:1198] (0/2) Epoch 36, batch 6250, loss[loss=0.2234, ctc_loss=0.1444, cr_loss=0.3949, over 20972.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1474, cr_loss=0.3695, over 4031580.37 frames. ], batch size: 55, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:52:17,285 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.178e+02 2.351e+02 2.531e+02 3.776e+02, threshold=4.702e+02, percent-clipped=0.0
2024-09-17 19:52:29,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=651493.8333333334, ans=0.0
2024-09-17 19:52:38,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=651522.1666666666, ans=0.125
2024-09-17 19:52:52,744 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0
2024-09-17 19:53:19,875 INFO [train.py:1198] (0/2) Epoch 36, batch 6300, loss[loss=0.194, ctc_loss=0.1263, cr_loss=0.3386, over 20946.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1469, cr_loss=0.3691, over 4032169.38 frames. ], batch size: 48, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:53:20,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=651607.1666666666, ans=0.0
2024-09-17 19:53:24,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=651607.1666666666, ans=0.025
2024-09-17 19:53:38,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=651635.5, ans=0.04949747468305833
2024-09-17 19:53:50,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=651663.8333333334, ans=0.1
2024-09-17 19:54:29,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=651720.5, ans=0.1
2024-09-17 19:54:32,232 INFO [train.py:1198] (0/2) Epoch 36, batch 6350, loss[loss=0.2643, ctc_loss=0.1854, cr_loss=0.3943, over 14278.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1486, cr_loss=0.3684, over 3911940.70 frames. ], batch size: 149, lr: 2.28e-03, grad_scale: 32.0
2024-09-17 19:54:39,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=651748.8333333334, ans=0.2
2024-09-17 19:54:43,550 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.174e+02 2.432e+02 2.753e+02 3.592e+02, threshold=4.864e+02, percent-clipped=0.0
2024-09-17 19:54:59,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=651805.5, ans=0.0
2024-09-17 19:55:23,420 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.50 vs. limit=15.0
2024-09-17 19:55:30,934 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-36.pt
2024-09-17 19:56:19,160 INFO [train.py:1198] (0/2) Epoch 37, batch 0, loss[loss=0.2226, ctc_loss=0.147, cr_loss=0.3779, over 21026.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.147, cr_loss=0.3779, over 21026.00 frames. ], batch size: 63, lr: 2.25e-03, grad_scale: 32.0
2024-09-17 19:56:19,161 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 19:56:36,444 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.2089, 3.4081, 2.6882, 3.1934], device='cuda:0')
2024-09-17 19:56:37,674 INFO [train.py:1230] (0/2) Epoch 37, validation: loss=0.04008, ctc_loss=0.04008, cr_loss=1.353e-14, over 944034.00 frames.
2024-09-17 19:56:37,674 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-17 19:56:41,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=651865.0, ans=0.0
2024-09-17 19:56:44,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=651865.0, ans=0.125
2024-09-17 19:57:23,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=651950.0, ans=0.1
2024-09-17 19:57:49,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=651978.3333333334, ans=0.2
2024-09-17 19:57:49,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=22.5
2024-09-17 19:57:53,597 INFO [train.py:1198] (0/2) Epoch 37, batch 50, loss[loss=0.2356, ctc_loss=0.1568, cr_loss=0.394, over 20883.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1478, cr_loss=0.3724, over 918534.14 frames. ], batch size: 57, lr: 2.25e-03, grad_scale: 32.0
2024-09-17 19:58:07,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=652035.0, ans=0.125
2024-09-17 19:58:13,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=652035.0, ans=0.125
2024-09-17 19:58:13,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=652035.0, ans=0.0
2024-09-17 19:58:19,622 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.284e+02 2.559e+02 2.893e+02 3.893e+02, threshold=5.119e+02, percent-clipped=0.0
2024-09-17 19:58:24,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=652063.3333333334, ans=0.2
2024-09-17 19:59:09,297 INFO [train.py:1198] (0/2) Epoch 37, batch 100, loss[loss=0.2308, ctc_loss=0.1543, cr_loss=0.3824, over 20941.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1498, cr_loss=0.3757, over 1624831.97 frames. ], batch size: 60, lr: 2.25e-03, grad_scale: 32.0
2024-09-17 19:59:15,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=652148.3333333334, ans=0.0
2024-09-17 19:59:26,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=22.5
2024-09-17 19:59:31,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=22.5
2024-09-17 20:00:22,753 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.72 vs.
limit=15.0 2024-09-17 20:00:27,936 INFO [train.py:1198] (0/2) Epoch 37, batch 150, loss[loss=0.2174, ctc_loss=0.1435, cr_loss=0.3693, over 21033.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1486, cr_loss=0.3729, over 2155474.57 frames. ], batch size: 63, lr: 2.25e-03, grad_scale: 32.0 2024-09-17 20:00:56,607 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.208e+02 2.307e+02 2.466e+02 3.531e+02, threshold=4.615e+02, percent-clipped=0.0 2024-09-17 20:01:21,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=652375.0, ans=0.125 2024-09-17 20:01:34,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=652403.3333333334, ans=0.025 2024-09-17 20:01:46,637 INFO [train.py:1198] (0/2) Epoch 37, batch 200, loss[loss=0.2397, ctc_loss=0.1586, cr_loss=0.4057, over 20978.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1481, cr_loss=0.373, over 2593417.48 frames. ], batch size: 58, lr: 2.25e-03, grad_scale: 32.0 2024-09-17 20:02:29,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2024-09-17 20:03:01,053 INFO [train.py:1198] (0/2) Epoch 37, batch 250, loss[loss=0.2104, ctc_loss=0.1395, cr_loss=0.3543, over 20977.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1478, cr_loss=0.3724, over 2928868.22 frames. 
], batch size: 58, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:03:20,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=652601.6666666666, ans=0.125 2024-09-17 20:03:20,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=652601.6666666666, ans=0.05 2024-09-17 20:03:27,911 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.192e+02 2.327e+02 2.518e+02 3.615e+02, threshold=4.653e+02, percent-clipped=0.0 2024-09-17 20:03:36,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=22.5 2024-09-17 20:03:53,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=652658.3333333334, ans=0.0 2024-09-17 20:04:06,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=652686.6666666666, ans=0.2 2024-09-17 20:04:16,345 INFO [train.py:1198] (0/2) Epoch 37, batch 300, loss[loss=0.2096, ctc_loss=0.1379, cr_loss=0.3587, over 20773.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1468, cr_loss=0.3707, over 3177888.61 frames. ], batch size: 56, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:04:54,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=652771.6666666666, ans=0.125 2024-09-17 20:05:11,883 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. 
limit=15.0 2024-09-17 20:05:23,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=652828.3333333334, ans=0.125 2024-09-17 20:05:34,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=652856.6666666666, ans=0.0 2024-09-17 20:05:35,358 INFO [train.py:1198] (0/2) Epoch 37, batch 350, loss[loss=0.2231, ctc_loss=0.1468, cr_loss=0.3813, over 20945.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1461, cr_loss=0.3702, over 3389328.39 frames. ], batch size: 58, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:05:42,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=652856.6666666666, ans=0.125 2024-09-17 20:06:01,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=652885.0, ans=0.07 2024-09-17 20:06:02,986 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.987e+02 2.180e+02 2.290e+02 2.473e+02 3.463e+02, threshold=4.580e+02, percent-clipped=0.0 2024-09-17 20:06:25,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=652941.6666666666, ans=0.125 2024-09-17 20:06:54,636 INFO [train.py:1198] (0/2) Epoch 37, batch 400, loss[loss=0.2069, ctc_loss=0.1374, cr_loss=0.3475, over 20963.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1449, cr_loss=0.3679, over 3541450.03 frames. 
], batch size: 51, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:07:05,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=652998.3333333334, ans=0.2 2024-09-17 20:07:11,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=653026.6666666666, ans=0.0 2024-09-17 20:07:17,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=653026.6666666666, ans=0.125 2024-09-17 20:07:46,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=653083.3333333334, ans=0.1 2024-09-17 20:07:47,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=653083.3333333334, ans=0.0 2024-09-17 20:07:52,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=653083.3333333334, ans=0.0 2024-09-17 20:07:54,187 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.36 vs. limit=6.0 2024-09-17 20:08:09,985 INFO [train.py:1198] (0/2) Epoch 37, batch 450, loss[loss=0.2003, ctc_loss=0.1293, cr_loss=0.3551, over 20778.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1446, cr_loss=0.3674, over 3659934.54 frames. 
], batch size: 53, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:08:37,326 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.130e+02 2.307e+02 2.474e+02 3.560e+02, threshold=4.614e+02, percent-clipped=0.0 2024-09-17 20:08:43,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=653196.6666666666, ans=0.1 2024-09-17 20:08:57,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=653225.0, ans=0.0 2024-09-17 20:09:04,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=653225.0, ans=0.125 2024-09-17 20:09:13,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=653253.3333333334, ans=0.125 2024-09-17 20:09:25,263 INFO [train.py:1198] (0/2) Epoch 37, batch 500, loss[loss=0.2435, ctc_loss=0.1649, cr_loss=0.3931, over 19477.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1457, cr_loss=0.3689, over 3753517.07 frames. ], batch size: 90, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:09:37,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=653281.6666666666, ans=0.1 2024-09-17 20:10:40,964 INFO [train.py:1198] (0/2) Epoch 37, batch 550, loss[loss=0.2299, ctc_loss=0.1543, cr_loss=0.3778, over 21024.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1458, cr_loss=0.3689, over 3827064.73 frames. ], batch size: 61, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:10:41,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.44 vs. 
limit=15.0 2024-09-17 20:11:11,744 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.154e+02 2.258e+02 2.391e+02 5.312e+02, threshold=4.516e+02, percent-clipped=1.0 2024-09-17 20:11:59,948 INFO [train.py:1198] (0/2) Epoch 37, batch 600, loss[loss=0.2332, ctc_loss=0.1581, cr_loss=0.3751, over 21081.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1475, cr_loss=0.3717, over 3877947.17 frames. ], batch size: 59, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:12:25,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=653593.3333333334, ans=0.125 2024-09-17 20:13:06,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=653678.3333333334, ans=0.0 2024-09-17 20:13:12,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=653678.3333333334, ans=0.125 2024-09-17 20:13:18,711 INFO [train.py:1198] (0/2) Epoch 37, batch 650, loss[loss=0.1904, ctc_loss=0.1224, cr_loss=0.3401, over 20937.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1481, cr_loss=0.3734, over 3932567.15 frames. ], batch size: 50, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:13:19,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=653706.6666666666, ans=0.2 2024-09-17 20:13:19,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=15.0 2024-09-17 20:13:23,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=653706.6666666666, ans=0.125 2024-09-17 20:13:35,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.06 vs. 
limit=22.5 2024-09-17 20:13:45,843 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.215e+02 2.336e+02 2.471e+02 4.298e+02, threshold=4.672e+02, percent-clipped=0.0 2024-09-17 20:14:24,342 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2024-09-17 20:14:34,155 INFO [train.py:1198] (0/2) Epoch 37, batch 700, loss[loss=0.162, ctc_loss=0.106, cr_loss=0.2804, over 20979.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1482, cr_loss=0.3736, over 3959165.09 frames. ], batch size: 48, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:15:04,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=653905.0, ans=0.125 2024-09-17 20:15:17,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=653933.3333333334, ans=0.025 2024-09-17 20:15:26,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=653933.3333333334, ans=0.125 2024-09-17 20:15:44,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=653961.6666666666, ans=0.125 2024-09-17 20:15:48,986 INFO [train.py:1198] (0/2) Epoch 37, batch 750, loss[loss=0.2714, ctc_loss=0.1813, cr_loss=0.4509, over 20660.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.148, cr_loss=0.3735, over 3977282.03 frames. 
], batch size: 68, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:15:56,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=653990.0, ans=0.025 2024-09-17 20:16:16,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=654018.3333333334, ans=0.125 2024-09-17 20:16:17,431 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.205e+02 2.295e+02 2.454e+02 5.146e+02, threshold=4.589e+02, percent-clipped=1.0 2024-09-17 20:16:23,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=654046.6666666666, ans=0.125 2024-09-17 20:16:39,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=654075.0, ans=0.125 2024-09-17 20:17:07,414 INFO [train.py:1198] (0/2) Epoch 37, batch 800, loss[loss=0.2555, ctc_loss=0.1683, cr_loss=0.4359, over 20877.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.1483, cr_loss=0.3741, over 3996468.56 frames. ], batch size: 65, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:17:33,722 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:18:23,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=654245.0, ans=0.1 2024-09-17 20:18:25,924 INFO [train.py:1198] (0/2) Epoch 37, batch 850, loss[loss=0.2133, ctc_loss=0.1407, cr_loss=0.363, over 21075.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.148, cr_loss=0.3735, over 4024504.11 frames. 
], batch size: 56, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:18:27,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=654273.3333333334, ans=0.125 2024-09-17 20:18:27,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=654273.3333333334, ans=0.1 2024-09-17 20:18:54,124 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.215e+02 2.340e+02 2.529e+02 4.240e+02, threshold=4.681e+02, percent-clipped=0.0 2024-09-17 20:19:03,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=654330.0, ans=0.125 2024-09-17 20:19:41,313 INFO [train.py:1198] (0/2) Epoch 37, batch 900, loss[loss=0.2423, ctc_loss=0.1622, cr_loss=0.4004, over 20646.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1476, cr_loss=0.3729, over 4031463.18 frames. ], batch size: 68, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:19:41,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=654415.0, ans=0.0 2024-09-17 20:19:44,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=654415.0, ans=0.2 2024-09-17 20:19:56,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=654443.3333333334, ans=0.125 2024-09-17 20:20:56,695 INFO [train.py:1198] (0/2) Epoch 37, batch 950, loss[loss=0.2586, ctc_loss=0.1752, cr_loss=0.4171, over 19337.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3725, over 4058647.70 frames. 
], batch size: 90, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:21:00,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=654556.6666666666, ans=0.035 2024-09-17 20:21:13,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=654585.0, ans=0.2 2024-09-17 20:21:15,683 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=22.5 2024-09-17 20:21:25,522 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.144e+02 2.286e+02 2.467e+02 2.816e+02, threshold=4.572e+02, percent-clipped=0.0 2024-09-17 20:22:12,422 INFO [train.py:1198] (0/2) Epoch 37, batch 1000, loss[loss=0.2341, ctc_loss=0.1567, cr_loss=0.3872, over 20822.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1468, cr_loss=0.3718, over 4066227.60 frames. ], batch size: 59, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:22:27,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0 2024-09-17 20:22:46,376 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=22.5 2024-09-17 20:23:01,826 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.45 vs. limit=15.0 2024-09-17 20:23:34,143 INFO [train.py:1198] (0/2) Epoch 37, batch 1050, loss[loss=0.2374, ctc_loss=0.1586, cr_loss=0.3937, over 21064.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1465, cr_loss=0.3717, over 4083212.42 frames. 
], batch size: 59, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:23:49,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=654868.3333333334, ans=0.0 2024-09-17 20:24:01,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=654868.3333333334, ans=0.125 2024-09-17 20:24:02,763 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.234e+02 2.323e+02 2.517e+02 3.469e+02, threshold=4.646e+02, percent-clipped=0.0 2024-09-17 20:24:18,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=654925.0, ans=0.2 2024-09-17 20:24:49,676 INFO [train.py:1198] (0/2) Epoch 37, batch 1100, loss[loss=0.2327, ctc_loss=0.1583, cr_loss=0.3722, over 21001.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1471, cr_loss=0.3723, over 4073212.09 frames. ], batch size: 55, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:25:03,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=655010.0, ans=0.1 2024-09-17 20:25:38,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=655066.6666666666, ans=0.125 2024-09-17 20:25:54,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=655095.0, ans=0.0 2024-09-17 20:26:04,720 INFO [train.py:1198] (0/2) Epoch 37, batch 1150, loss[loss=0.2509, ctc_loss=0.1692, cr_loss=0.4082, over 20684.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1471, cr_loss=0.3733, over 4082104.97 frames. 
], batch size: 66, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:26:09,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=655123.3333333334, ans=0.125 2024-09-17 20:26:17,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=655123.3333333334, ans=0.09899494936611666 2024-09-17 20:26:33,164 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.202e+02 2.384e+02 2.534e+02 4.350e+02, threshold=4.768e+02, percent-clipped=0.0 2024-09-17 20:26:35,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=655180.0, ans=0.125 2024-09-17 20:27:14,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=655236.6666666666, ans=0.125 2024-09-17 20:27:20,052 INFO [train.py:1198] (0/2) Epoch 37, batch 1200, loss[loss=0.2703, ctc_loss=0.1815, cr_loss=0.4439, over 20094.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1479, cr_loss=0.3743, over 4080213.59 frames. ], batch size: 80, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:27:41,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=655293.3333333334, ans=0.1 2024-09-17 20:28:02,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=655321.6666666666, ans=0.1 2024-09-17 20:28:23,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=655378.3333333334, ans=0.025 2024-09-17 20:28:38,811 INFO [train.py:1198] (0/2) Epoch 37, batch 1250, loss[loss=0.2643, ctc_loss=0.1797, cr_loss=0.4232, over 19453.00 frames. 
], tot_loss[loss=0.2216, ctc_loss=0.147, cr_loss=0.3731, over 4092717.70 frames. ], batch size: 90, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:29:07,473 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.214e+02 2.300e+02 2.435e+02 5.575e+02, threshold=4.600e+02, percent-clipped=1.0 2024-09-17 20:29:18,424 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:29:57,252 INFO [train.py:1198] (0/2) Epoch 37, batch 1300, loss[loss=0.2336, ctc_loss=0.1534, cr_loss=0.4012, over 20656.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3726, over 4087705.23 frames. ], batch size: 71, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:30:45,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=655633.3333333334, ans=0.2 2024-09-17 20:31:12,404 INFO [train.py:1198] (0/2) Epoch 37, batch 1350, loss[loss=0.1953, ctc_loss=0.1297, cr_loss=0.3283, over 21032.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1475, cr_loss=0.3733, over 4091812.68 frames. ], batch size: 62, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:31:15,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. 
limit=15.0 2024-09-17 20:31:24,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=655690.0, ans=0.125 2024-09-17 20:31:41,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=655746.6666666666, ans=0.125 2024-09-17 20:31:42,398 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.207e+02 2.389e+02 2.536e+02 3.344e+02, threshold=4.779e+02, percent-clipped=0.0 2024-09-17 20:31:44,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=655746.6666666666, ans=0.04949747468305833 2024-09-17 20:31:44,545 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=12.0 2024-09-17 20:32:02,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=655775.0, ans=0.2 2024-09-17 20:32:05,718 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.38 vs. limit=15.0 2024-09-17 20:32:19,656 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2024-09-17 20:32:27,662 INFO [train.py:1198] (0/2) Epoch 37, batch 1400, loss[loss=0.2291, ctc_loss=0.1539, cr_loss=0.376, over 20970.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1476, cr_loss=0.3734, over 4094859.78 frames. ], batch size: 58, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:32:34,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. 
limit=12.0 2024-09-17 20:32:35,876 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:33:00,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=655888.3333333334, ans=0.0 2024-09-17 20:33:04,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=655888.3333333334, ans=0.2 2024-09-17 20:33:35,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=655945.0, ans=0.2 2024-09-17 20:33:46,941 INFO [train.py:1198] (0/2) Epoch 37, batch 1450, loss[loss=0.1813, ctc_loss=0.1175, cr_loss=0.3188, over 20348.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.147, cr_loss=0.3721, over 4093327.73 frames. ], batch size: 45, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:33:48,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=655973.3333333334, ans=0.125 2024-09-17 20:33:51,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=655973.3333333334, ans=0.015 2024-09-17 20:34:07,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=656001.6666666666, ans=0.125 2024-09-17 20:34:17,332 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.190e+02 2.304e+02 2.486e+02 3.171e+02, threshold=4.608e+02, percent-clipped=0.0 2024-09-17 20:34:29,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=656030.0, ans=0.125 2024-09-17 20:34:47,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.81 vs. 
limit=15.0 2024-09-17 20:35:05,321 INFO [train.py:1198] (0/2) Epoch 37, batch 1500, loss[loss=0.177, ctc_loss=0.1132, cr_loss=0.3187, over 20960.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.147, cr_loss=0.3716, over 4080569.73 frames. ], batch size: 50, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:35:15,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=656115.0, ans=0.0 2024-09-17 20:35:22,605 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2024-09-17 20:35:46,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=656171.6666666666, ans=0.1 2024-09-17 20:35:57,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=656200.0, ans=0.0 2024-09-17 20:35:58,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=656200.0, ans=0.125 2024-09-17 20:36:07,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0 2024-09-17 20:36:20,667 INFO [train.py:1198] (0/2) Epoch 37, batch 1550, loss[loss=0.257, ctc_loss=0.1832, cr_loss=0.369, over 14260.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.147, cr_loss=0.3716, over 4083676.52 frames. ], batch size: 150, lr: 2.24e-03, grad_scale: 16.0 2024-09-17 20:36:31,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=656256.6666666666, ans=0.1 2024-09-17 20:36:33,780 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.53 vs. 
limit=15.0 2024-09-17 20:36:51,134 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.227e+02 2.432e+02 2.612e+02 3.752e+02, threshold=4.865e+02, percent-clipped=0.0 2024-09-17 20:37:36,380 INFO [train.py:1198] (0/2) Epoch 37, batch 1600, loss[loss=0.1846, ctc_loss=0.1197, cr_loss=0.3248, over 20960.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1468, cr_loss=0.3714, over 4075875.44 frames. ], batch size: 48, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:37:36,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=656398.3333333334, ans=0.125 2024-09-17 20:38:25,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=656483.3333333334, ans=0.125 2024-09-17 20:38:27,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-09-17 20:38:29,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=656483.3333333334, ans=0.1 2024-09-17 20:38:52,162 INFO [train.py:1198] (0/2) Epoch 37, batch 1650, loss[loss=0.213, ctc_loss=0.1405, cr_loss=0.3625, over 20937.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1466, cr_loss=0.3708, over 4081838.66 frames. 
], batch size: 60, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:38:52,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=656540.0, ans=0.07 2024-09-17 20:39:14,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=656568.3333333334, ans=0.0 2024-09-17 20:39:24,705 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 2.218e+02 2.352e+02 2.526e+02 4.592e+02, threshold=4.705e+02, percent-clipped=0.0 2024-09-17 20:39:49,996 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.56 vs. limit=15.0 2024-09-17 20:40:01,333 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:40:09,998 INFO [train.py:1198] (0/2) Epoch 37, batch 1700, loss[loss=0.2418, ctc_loss=0.1637, cr_loss=0.3905, over 20730.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1468, cr_loss=0.3713, over 4086130.03 frames. ], batch size: 71, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:40:21,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=656681.6666666666, ans=0.125 2024-09-17 20:41:03,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=656766.6666666666, ans=0.0 2024-09-17 20:41:28,584 INFO [train.py:1198] (0/2) Epoch 37, batch 1750, loss[loss=0.2108, ctc_loss=0.1402, cr_loss=0.3529, over 21001.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1464, cr_loss=0.3705, over 4084965.78 frames. 
], batch size: 63, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:41:48,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=656851.6666666666, ans=0.125 2024-09-17 20:41:50,241 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.23 vs. limit=10.0 2024-09-17 20:41:57,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=656880.0, ans=0.125 2024-09-17 20:41:58,609 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.174e+02 2.346e+02 2.490e+02 4.246e+02, threshold=4.691e+02, percent-clipped=0.0 2024-09-17 20:42:03,758 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=22.5 2024-09-17 20:42:26,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=656908.3333333334, ans=0.125 2024-09-17 20:42:41,322 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:42:42,760 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:42:43,874 INFO [train.py:1198] (0/2) Epoch 37, batch 1800, loss[loss=0.2019, ctc_loss=0.1346, cr_loss=0.3365, over 20973.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1453, cr_loss=0.3688, over 4085995.01 frames. 
], batch size: 49, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:42:47,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=656965.0, ans=0.0 2024-09-17 20:42:56,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=656965.0, ans=0.0 2024-09-17 20:43:00,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=656993.3333333334, ans=0.2 2024-09-17 20:43:09,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=656993.3333333334, ans=0.125 2024-09-17 20:43:38,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=657050.0, ans=0.95 2024-09-17 20:43:41,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=657050.0, ans=0.07 2024-09-17 20:43:59,032 INFO [train.py:1198] (0/2) Epoch 37, batch 1850, loss[loss=0.214, ctc_loss=0.1393, cr_loss=0.3738, over 20839.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1446, cr_loss=0.3676, over 4100056.13 frames. ], batch size: 59, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:44:02,767 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-09-17 20:44:14,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. 
limit=6.0 2024-09-17 20:44:28,969 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.214e+02 2.294e+02 2.446e+02 2.982e+02, threshold=4.588e+02, percent-clipped=0.0 2024-09-17 20:44:29,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=657163.3333333334, ans=0.04949747468305833 2024-09-17 20:44:35,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=657163.3333333334, ans=0.2 2024-09-17 20:44:37,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=657163.3333333334, ans=0.1 2024-09-17 20:44:44,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=657191.6666666666, ans=0.125 2024-09-17 20:44:48,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=657191.6666666666, ans=0.1 2024-09-17 20:44:50,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=22.5 2024-09-17 20:45:17,143 INFO [train.py:1198] (0/2) Epoch 37, batch 1900, loss[loss=0.2518, ctc_loss=0.1676, cr_loss=0.4212, over 20979.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1448, cr_loss=0.3686, over 4112262.99 frames. ], batch size: 67, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:45:28,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=657248.3333333334, ans=0.025 2024-09-17 20:45:31,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.70 vs. 
limit=10.0 2024-09-17 20:46:04,047 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-232000.pt 2024-09-17 20:46:18,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=657333.3333333334, ans=0.1 2024-09-17 20:46:35,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=657390.0, ans=0.025 2024-09-17 20:46:37,004 INFO [train.py:1198] (0/2) Epoch 37, batch 1950, loss[loss=0.2179, ctc_loss=0.1454, cr_loss=0.3628, over 21007.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1451, cr_loss=0.3688, over 4114669.63 frames. ], batch size: 63, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:47:06,878 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.153e+02 2.309e+02 2.482e+02 3.673e+02, threshold=4.618e+02, percent-clipped=0.0 2024-09-17 20:47:20,849 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:47:29,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=657475.0, ans=0.125 2024-09-17 20:47:39,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=657503.3333333334, ans=0.125 2024-09-17 20:47:44,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=657503.3333333334, ans=0.2 2024-09-17 20:47:51,815 INFO [train.py:1198] (0/2) Epoch 37, batch 2000, loss[loss=0.2272, ctc_loss=0.1528, cr_loss=0.372, over 20849.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1448, cr_loss=0.3687, over 4113264.03 frames. 
], batch size: 59, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:48:19,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=657560.0, ans=0.2 2024-09-17 20:48:42,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=657616.6666666666, ans=0.125 2024-09-17 20:48:48,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=657616.6666666666, ans=0.125 2024-09-17 20:49:02,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=657645.0, ans=0.125 2024-09-17 20:49:07,962 INFO [train.py:1198] (0/2) Epoch 37, batch 2050, loss[loss=0.1946, ctc_loss=0.1257, cr_loss=0.3447, over 21044.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1462, cr_loss=0.3707, over 4101179.66 frames. ], batch size: 56, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:49:17,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=657673.3333333334, ans=0.2 2024-09-17 20:49:38,715 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.185e+02 2.299e+02 2.491e+02 3.230e+02, threshold=4.598e+02, percent-clipped=0.0 2024-09-17 20:50:24,425 INFO [train.py:1198] (0/2) Epoch 37, batch 2100, loss[loss=0.1828, ctc_loss=0.1172, cr_loss=0.3279, over 20905.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1452, cr_loss=0.3695, over 4108140.78 frames. 
], batch size: 54, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:50:57,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=657871.6666666666, ans=0.0 2024-09-17 20:51:08,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=657871.6666666666, ans=0.025 2024-09-17 20:51:10,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=657871.6666666666, ans=15.0 2024-09-17 20:51:20,534 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:51:44,757 INFO [train.py:1198] (0/2) Epoch 37, batch 2150, loss[loss=0.2062, ctc_loss=0.1339, cr_loss=0.3614, over 20939.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1448, cr_loss=0.3689, over 4115996.49 frames. ], batch size: 60, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:52:08,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2024-09-17 20:52:18,846 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.204e+02 2.342e+02 2.507e+02 8.272e+02, threshold=4.684e+02, percent-clipped=1.0 2024-09-17 20:52:48,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=658070.0, ans=0.125 2024-09-17 20:53:03,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=658098.3333333334, ans=0.2 2024-09-17 20:53:04,309 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.88 vs. 
limit=15.0 2024-09-17 20:53:04,912 INFO [train.py:1198] (0/2) Epoch 37, batch 2200, loss[loss=0.1872, ctc_loss=0.1212, cr_loss=0.3304, over 20977.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1451, cr_loss=0.369, over 4098941.40 frames. ], batch size: 51, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:53:08,778 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=15.0 2024-09-17 20:53:16,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=658098.3333333334, ans=0.2 2024-09-17 20:53:22,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=658126.6666666666, ans=0.125 2024-09-17 20:53:46,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=658155.0, ans=0.125 2024-09-17 20:53:51,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=658183.3333333334, ans=0.0 2024-09-17 20:53:55,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=658183.3333333334, ans=0.0 2024-09-17 20:54:21,131 INFO [train.py:1198] (0/2) Epoch 37, batch 2250, loss[loss=0.2443, ctc_loss=0.163, cr_loss=0.4062, over 20937.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1448, cr_loss=0.3681, over 4102249.87 frames. 
], batch size: 67, lr: 2.24e-03, grad_scale: 32.0 2024-09-17 20:54:51,909 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.176e+02 2.296e+02 2.495e+02 3.496e+02, threshold=4.591e+02, percent-clipped=0.0 2024-09-17 20:54:52,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=658296.6666666666, ans=0.125 2024-09-17 20:54:59,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=658296.6666666666, ans=0.125 2024-09-17 20:55:10,258 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:55:36,881 INFO [train.py:1198] (0/2) Epoch 37, batch 2300, loss[loss=0.1973, ctc_loss=0.1295, cr_loss=0.339, over 20959.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1457, cr_loss=0.3697, over 4091972.15 frames. ], batch size: 50, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 20:55:54,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=658410.0, ans=0.1 2024-09-17 20:55:56,998 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.66 vs. limit=10.0 2024-09-17 20:56:02,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=658410.0, ans=0.125 2024-09-17 20:56:05,559 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:56:25,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=658466.6666666666, ans=0.1 2024-09-17 20:56:25,931 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.31 vs. 
limit=15.0 2024-09-17 20:56:55,769 INFO [train.py:1198] (0/2) Epoch 37, batch 2350, loss[loss=0.1973, ctc_loss=0.1313, cr_loss=0.3299, over 21057.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1454, cr_loss=0.3691, over 4098656.80 frames. ], batch size: 53, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 20:57:27,390 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.176e+02 2.299e+02 2.422e+02 3.074e+02, threshold=4.599e+02, percent-clipped=0.0 2024-09-17 20:57:50,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=658608.3333333334, ans=0.0 2024-09-17 20:58:14,449 INFO [train.py:1198] (0/2) Epoch 37, batch 2400, loss[loss=0.2342, ctc_loss=0.159, cr_loss=0.3756, over 21022.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1456, cr_loss=0.3699, over 4102428.58 frames. ], batch size: 61, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 20:58:40,572 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:58:48,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=658721.6666666666, ans=0.125 2024-09-17 20:58:51,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=658721.6666666666, ans=0.025 2024-09-17 20:58:57,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=658721.6666666666, ans=0.125 2024-09-17 20:59:28,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=15.0 2024-09-17 20:59:30,679 INFO [train.py:1198] (0/2) Epoch 37, batch 2450, loss[loss=0.2225, ctc_loss=0.1481, cr_loss=0.3721, over 20011.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1453, cr_loss=0.3698, over 4114871.98 frames. 
], batch size: 80, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 20:59:37,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=658806.6666666666, ans=0.0 2024-09-17 21:00:00,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=658863.3333333334, ans=0.125 2024-09-17 21:00:02,895 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.183e+02 2.326e+02 2.502e+02 3.624e+02, threshold=4.651e+02, percent-clipped=0.0 2024-09-17 21:00:10,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=658863.3333333334, ans=0.1 2024-09-17 21:00:15,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=658891.6666666666, ans=0.0 2024-09-17 21:00:24,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=658891.6666666666, ans=0.0 2024-09-17 21:00:29,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=658891.6666666666, ans=0.125 2024-09-17 21:00:47,114 INFO [train.py:1198] (0/2) Epoch 37, batch 2500, loss[loss=0.2167, ctc_loss=0.1428, cr_loss=0.3693, over 21041.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1446, cr_loss=0.3689, over 4111733.97 frames. 
], batch size: 56, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:01:10,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=658976.6666666666, ans=0.125 2024-09-17 21:01:29,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=659005.0, ans=0.125 2024-09-17 21:02:00,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=659061.6666666666, ans=12.0 2024-09-17 21:02:03,051 INFO [train.py:1198] (0/2) Epoch 37, batch 2550, loss[loss=0.226, ctc_loss=0.1496, cr_loss=0.3823, over 20639.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1458, cr_loss=0.3702, over 4089626.61 frames. ], batch size: 68, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:02:03,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=659090.0, ans=0.125 2024-09-17 21:02:05,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=659090.0, ans=0.125 2024-09-17 21:02:09,769 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. 
limit=15.0 2024-09-17 21:02:14,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=659090.0, ans=0.125 2024-09-17 21:02:33,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=659118.3333333334, ans=0.125 2024-09-17 21:02:38,060 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.191e+02 2.323e+02 2.469e+02 4.775e+02, threshold=4.647e+02, percent-clipped=1.0 2024-09-17 21:02:39,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=659146.6666666666, ans=0.0 2024-09-17 21:02:59,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659175.0, ans=0.1 2024-09-17 21:03:21,503 INFO [train.py:1198] (0/2) Epoch 37, batch 2600, loss[loss=0.2198, ctc_loss=0.1458, cr_loss=0.3699, over 21029.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1466, cr_loss=0.3715, over 4080274.21 frames. 
], batch size: 62, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:03:21,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=659231.6666666666, ans=0.04949747468305833 2024-09-17 21:03:38,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=659260.0, ans=0.125 2024-09-17 21:03:54,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=659288.3333333334, ans=0.025 2024-09-17 21:04:17,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=659316.6666666666, ans=0.0 2024-09-17 21:04:26,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=659345.0, ans=0.125 2024-09-17 21:04:32,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0 2024-09-17 21:04:39,731 INFO [train.py:1198] (0/2) Epoch 37, batch 2650, loss[loss=0.2114, ctc_loss=0.1428, cr_loss=0.3432, over 20976.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1462, cr_loss=0.3709, over 4093720.50 frames. ], batch size: 55, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:04:58,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=659401.6666666666, ans=0.0 2024-09-17 21:05:11,761 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.681e+02 2.203e+02 2.339e+02 2.518e+02 3.190e+02, threshold=4.677e+02, percent-clipped=0.0 2024-09-17 21:05:12,983 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.66 vs. 
limit=15.0 2024-09-17 21:05:20,040 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0 2024-09-17 21:05:24,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=659458.3333333334, ans=0.0 2024-09-17 21:05:50,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0 2024-09-17 21:05:55,945 INFO [train.py:1198] (0/2) Epoch 37, batch 2700, loss[loss=0.1983, ctc_loss=0.1301, cr_loss=0.341, over 20953.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1465, cr_loss=0.3709, over 4083792.44 frames. ], batch size: 50, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:06:09,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659543.3333333334, ans=0.1 2024-09-17 21:06:43,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.85 vs. limit=15.0 2024-09-17 21:07:07,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=659628.3333333334, ans=0.04949747468305833 2024-09-17 21:07:11,178 INFO [train.py:1198] (0/2) Epoch 37, batch 2750, loss[loss=0.2676, ctc_loss=0.1876, cr_loss=0.3999, over 14268.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1463, cr_loss=0.3707, over 4075864.86 frames. 
], batch size: 149, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:07:28,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=659685.0, ans=0.07 2024-09-17 21:07:35,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=659685.0, ans=0.125 2024-09-17 21:07:42,783 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.226e+02 2.361e+02 2.574e+02 3.530e+02, threshold=4.722e+02, percent-clipped=0.0 2024-09-17 21:07:55,518 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-09-17 21:08:29,836 INFO [train.py:1198] (0/2) Epoch 37, batch 2800, loss[loss=0.2513, ctc_loss=0.1699, cr_loss=0.4069, over 20844.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1468, cr_loss=0.372, over 4088371.22 frames. ], batch size: 65, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:08:30,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=659798.3333333334, ans=0.2 2024-09-17 21:08:43,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659826.6666666666, ans=0.1 2024-09-17 21:09:06,586 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=15.0 2024-09-17 21:09:42,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=659911.6666666666, ans=0.125 2024-09-17 21:09:48,070 INFO [train.py:1198] (0/2) Epoch 37, batch 2850, loss[loss=0.2056, ctc_loss=0.1366, cr_loss=0.3452, over 21051.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1467, cr_loss=0.3706, over 4068500.77 frames. 
], batch size: 53, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:10:20,127 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.206e+02 2.347e+02 2.475e+02 3.246e+02, threshold=4.693e+02, percent-clipped=0.0 2024-09-17 21:11:04,117 INFO [train.py:1198] (0/2) Epoch 37, batch 2900, loss[loss=0.2365, ctc_loss=0.1595, cr_loss=0.3854, over 20975.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.372, over 4072462.03 frames. ], batch size: 58, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:11:05,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=660081.6666666666, ans=0.2 2024-09-17 21:12:03,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=660195.0, ans=0.025 2024-09-17 21:12:06,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=660195.0, ans=0.125 2024-09-17 21:12:18,750 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-17 21:12:19,597 INFO [train.py:1198] (0/2) Epoch 37, batch 2950, loss[loss=0.2418, ctc_loss=0.1636, cr_loss=0.3909, over 20849.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1473, cr_loss=0.372, over 4078752.61 frames. 
], batch size: 65, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:12:50,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=660280.0, ans=0.2 2024-09-17 21:12:51,603 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.251e+02 2.373e+02 2.516e+02 7.516e+02, threshold=4.746e+02, percent-clipped=1.0 2024-09-17 21:13:27,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=660336.6666666666, ans=0.125 2024-09-17 21:13:36,222 INFO [train.py:1198] (0/2) Epoch 37, batch 3000, loss[loss=0.2386, ctc_loss=0.1602, cr_loss=0.3916, over 19502.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1471, cr_loss=0.3716, over 4080068.65 frames. ], batch size: 90, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:13:36,223 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 21:13:59,215 INFO [train.py:1230] (0/2) Epoch 37, validation: loss=0.04025, ctc_loss=0.04025, cr_loss=1.378e-14, over 944034.00 frames. 
2024-09-17 21:13:59,216 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 21:14:35,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660421.6666666666, ans=0.1 2024-09-17 21:14:43,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=660421.6666666666, ans=0.05 2024-09-17 21:14:47,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=660450.0, ans=0.025 2024-09-17 21:15:13,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=660478.3333333334, ans=0.125 2024-09-17 21:15:17,866 INFO [train.py:1198] (0/2) Epoch 37, batch 3050, loss[loss=0.2079, ctc_loss=0.1359, cr_loss=0.3598, over 20813.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1473, cr_loss=0.3724, over 4092488.19 frames. 
], batch size: 59, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:15:27,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=660506.6666666666, ans=0.2 2024-09-17 21:15:31,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660535.0, ans=0.1 2024-09-17 21:15:49,145 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.195e+02 2.292e+02 2.443e+02 3.169e+02, threshold=4.584e+02, percent-clipped=0.0 2024-09-17 21:15:50,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=660563.3333333334, ans=0.2 2024-09-17 21:15:51,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=660563.3333333334, ans=0.0 2024-09-17 21:15:58,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=660563.3333333334, ans=0.2 2024-09-17 21:16:33,021 INFO [train.py:1198] (0/2) Epoch 37, batch 3100, loss[loss=0.1937, ctc_loss=0.1284, cr_loss=0.3264, over 20947.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1473, cr_loss=0.3726, over 4098837.53 frames. ], batch size: 49, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:16:33,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.80 vs. 
limit=15.0 2024-09-17 21:16:36,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=660648.3333333334, ans=0.2 2024-09-17 21:17:02,034 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:17:04,107 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.63 vs. limit=15.0 2024-09-17 21:17:05,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=660705.0, ans=0.025 2024-09-17 21:17:21,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=660733.3333333334, ans=0.05 2024-09-17 21:17:37,296 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0 2024-09-17 21:17:47,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=660790.0, ans=0.2 2024-09-17 21:17:48,597 INFO [train.py:1198] (0/2) Epoch 37, batch 3150, loss[loss=0.1864, ctc_loss=0.1191, cr_loss=0.3366, over 20904.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3725, over 4095570.99 frames. 
], batch size: 54, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:18:16,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=660818.3333333334, ans=0.0 2024-09-17 21:18:20,131 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.248e+02 2.337e+02 2.522e+02 3.537e+02, threshold=4.674e+02, percent-clipped=0.0 2024-09-17 21:18:47,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660903.3333333334, ans=0.1 2024-09-17 21:19:03,925 INFO [train.py:1198] (0/2) Epoch 37, batch 3200, loss[loss=0.1762, ctc_loss=0.1137, cr_loss=0.3125, over 20253.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1473, cr_loss=0.3729, over 4091367.67 frames. ], batch size: 45, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:19:16,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660931.6666666666, ans=0.1 2024-09-17 21:19:21,111 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=22.5 2024-09-17 21:19:42,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.47 vs. 
limit=22.5 2024-09-17 21:19:43,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=660988.3333333334, ans=0.125 2024-09-17 21:19:48,614 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:19:53,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=661016.6666666666, ans=0.0 2024-09-17 21:20:09,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=661045.0, ans=0.0 2024-09-17 21:20:11,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=661045.0, ans=0.0 2024-09-17 21:20:25,878 INFO [train.py:1198] (0/2) Epoch 37, batch 3250, loss[loss=0.2178, ctc_loss=0.1468, cr_loss=0.3548, over 21001.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1469, cr_loss=0.3721, over 4093906.68 frames. ], batch size: 52, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:20:35,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=661073.3333333334, ans=0.125 2024-09-17 21:20:44,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=661101.6666666666, ans=0.025 2024-09-17 21:20:57,083 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.220e+02 2.360e+02 2.568e+02 3.393e+02, threshold=4.719e+02, percent-clipped=0.0 2024-09-17 21:21:06,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=661130.0, ans=0.125 2024-09-17 21:21:07,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.55 vs. 
limit=8.0 2024-09-17 21:21:40,938 INFO [train.py:1198] (0/2) Epoch 37, batch 3300, loss[loss=0.2383, ctc_loss=0.161, cr_loss=0.3866, over 13896.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1475, cr_loss=0.3734, over 4074873.06 frames. ], batch size: 149, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:21:41,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=661215.0, ans=0.0 2024-09-17 21:21:47,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=661215.0, ans=0.0 2024-09-17 21:22:05,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=661243.3333333334, ans=0.2 2024-09-17 21:22:07,904 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.11 vs. limit=15.0 2024-09-17 21:22:38,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=22.5 2024-09-17 21:22:48,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-09-17 21:22:57,066 INFO [train.py:1198] (0/2) Epoch 37, batch 3350, loss[loss=0.1877, ctc_loss=0.1223, cr_loss=0.3269, over 20976.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1469, cr_loss=0.372, over 4076124.25 frames. ], batch size: 52, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:23:28,440 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.209e+02 2.367e+02 2.646e+02 4.774e+02, threshold=4.733e+02, percent-clipped=1.0 2024-09-17 21:23:33,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.82 vs. 
limit=12.0 2024-09-17 21:24:12,340 INFO [train.py:1198] (0/2) Epoch 37, batch 3400, loss[loss=0.2188, ctc_loss=0.147, cr_loss=0.3589, over 21013.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3705, over 4076609.61 frames. ], batch size: 61, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:24:29,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=661526.6666666666, ans=0.125 2024-09-17 21:25:01,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=661583.3333333334, ans=0.1 2024-09-17 21:25:02,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=661583.3333333334, ans=0.125 2024-09-17 21:25:31,119 INFO [train.py:1198] (0/2) Epoch 37, batch 3450, loss[loss=0.2172, ctc_loss=0.142, cr_loss=0.3762, over 20789.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1462, cr_loss=0.3704, over 4073462.41 frames. ], batch size: 53, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:25:57,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=661668.3333333334, ans=0.2 2024-09-17 21:26:05,559 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.160e+02 2.319e+02 2.439e+02 5.227e+02, threshold=4.638e+02, percent-clipped=1.0 2024-09-17 21:26:12,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=661696.6666666666, ans=0.125 2024-09-17 21:26:15,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=661696.6666666666, ans=0.1 2024-09-17 21:26:33,727 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.88 vs. 
limit=15.0 2024-09-17 21:26:33,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.15 vs. limit=6.0 2024-09-17 21:26:49,572 INFO [train.py:1198] (0/2) Epoch 37, batch 3500, loss[loss=0.2411, ctc_loss=0.1606, cr_loss=0.4026, over 20988.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1459, cr_loss=0.3695, over 4069217.64 frames. ], batch size: 55, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:26:52,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=661781.6666666666, ans=0.2 2024-09-17 21:28:05,448 INFO [train.py:1198] (0/2) Epoch 37, batch 3550, loss[loss=0.1977, ctc_loss=0.132, cr_loss=0.3288, over 20957.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3702, over 4059982.70 frames. ], batch size: 50, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:28:23,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=661951.6666666666, ans=0.125 2024-09-17 21:28:36,663 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.214e+02 2.311e+02 2.470e+02 4.177e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-17 21:28:38,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=661980.0, ans=0.125 2024-09-17 21:29:10,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=662036.6666666666, ans=0.0 2024-09-17 21:29:16,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=662036.6666666666, ans=0.0 2024-09-17 21:29:17,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=662036.6666666666, ans=0.09899494936611666 2024-09-17 21:29:20,392 INFO 
[train.py:1198] (0/2) Epoch 37, batch 3600, loss[loss=0.2416, ctc_loss=0.16, cr_loss=0.4081, over 20999.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1471, cr_loss=0.3716, over 4073621.54 frames. ], batch size: 58, lr: 2.23e-03, grad_scale: 32.0 2024-09-17 21:30:01,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.43 vs. limit=15.0 2024-09-17 21:30:36,586 INFO [train.py:1198] (0/2) Epoch 37, batch 3650, loss[loss=0.1851, ctc_loss=0.1208, cr_loss=0.3215, over 20960.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1474, cr_loss=0.3723, over 4068124.98 frames. ], batch size: 50, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:30:36,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=662206.6666666666, ans=0.95 2024-09-17 21:31:05,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662235.0, ans=0.1 2024-09-17 21:31:05,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=662235.0, ans=0.125 2024-09-17 21:31:12,908 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.218e+02 2.366e+02 2.463e+02 4.063e+02, threshold=4.733e+02, percent-clipped=0.0 2024-09-17 21:31:16,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=662263.3333333334, ans=0.125 2024-09-17 21:31:28,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=662291.6666666666, ans=0.125 2024-09-17 21:31:34,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=662291.6666666666, ans=0.0 2024-09-17 21:31:58,426 INFO [train.py:1198] (0/2) Epoch 37, batch 3700, 
loss[loss=0.2248, ctc_loss=0.1492, cr_loss=0.378, over 20930.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1473, cr_loss=0.372, over 4056016.24 frames. ], batch size: 60, lr: 2.23e-03, grad_scale: 8.0 2024-09-17 21:32:25,073 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2024-09-17 21:32:42,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=662433.3333333334, ans=0.2 2024-09-17 21:32:42,995 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.97 vs. limit=22.5 2024-09-17 21:33:14,513 INFO [train.py:1198] (0/2) Epoch 37, batch 3750, loss[loss=0.2203, ctc_loss=0.146, cr_loss=0.3714, over 20705.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.3723, over 4055716.81 frames. ], batch size: 68, lr: 2.23e-03, grad_scale: 8.0 2024-09-17 21:33:49,429 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.215e+02 2.319e+02 2.460e+02 3.182e+02, threshold=4.637e+02, percent-clipped=0.0 2024-09-17 21:34:02,497 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.65 vs. limit=15.0 2024-09-17 21:34:10,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=662575.0, ans=0.0 2024-09-17 21:34:15,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=662603.3333333334, ans=0.125 2024-09-17 21:34:29,831 INFO [train.py:1198] (0/2) Epoch 37, batch 3800, loss[loss=0.2288, ctc_loss=0.1554, cr_loss=0.367, over 19692.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1474, cr_loss=0.3725, over 4067752.22 frames. 
], batch size: 90, lr: 2.23e-03, grad_scale: 8.0 2024-09-17 21:34:34,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=662631.6666666666, ans=0.05 2024-09-17 21:35:06,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662688.3333333334, ans=0.1 2024-09-17 21:35:10,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=662688.3333333334, ans=0.0 2024-09-17 21:35:16,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=662716.6666666666, ans=0.0 2024-09-17 21:35:22,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=662716.6666666666, ans=0.125 2024-09-17 21:35:32,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.87 vs. limit=15.0 2024-09-17 21:35:44,513 INFO [train.py:1198] (0/2) Epoch 37, batch 3850, loss[loss=0.2219, ctc_loss=0.1486, cr_loss=0.3662, over 20652.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1478, cr_loss=0.373, over 4078005.40 frames. ], batch size: 68, lr: 2.23e-03, grad_scale: 8.0 2024-09-17 21:35:44,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=662773.3333333334, ans=0.0 2024-09-17 21:35:55,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2024-09-17 21:36:00,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.05 vs. 
limit=15.0 2024-09-17 21:36:22,354 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.195e+02 2.352e+02 2.487e+02 3.809e+02, threshold=4.703e+02, percent-clipped=0.0 2024-09-17 21:36:30,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662830.0, ans=0.1 2024-09-17 21:36:51,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=662886.6666666666, ans=0.0 2024-09-17 21:37:03,977 INFO [train.py:1198] (0/2) Epoch 37, batch 3900, loss[loss=0.2471, ctc_loss=0.1639, cr_loss=0.4162, over 20867.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1478, cr_loss=0.3725, over 4087544.26 frames. ], batch size: 65, lr: 2.23e-03, grad_scale: 8.0 2024-09-17 21:37:12,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. 
limit=6.0 2024-09-17 21:37:28,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=662943.3333333334, ans=0.125 2024-09-17 21:37:29,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=662943.3333333334, ans=0.125 2024-09-17 21:37:35,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=662971.6666666666, ans=0.125 2024-09-17 21:37:38,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=662971.6666666666, ans=0.125 2024-09-17 21:37:55,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=663000.0, ans=0.95 2024-09-17 21:38:22,838 INFO [train.py:1198] (0/2) Epoch 37, batch 3950, loss[loss=0.1819, ctc_loss=0.1172, cr_loss=0.3234, over 20981.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3705, over 4099430.44 frames. 
], batch size: 52, lr: 2.23e-03, grad_scale: 8.0 2024-09-17 21:38:38,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=663085.0, ans=0.0 2024-09-17 21:38:54,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=663113.3333333334, ans=0.1 2024-09-17 21:38:54,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=663113.3333333334, ans=0.0 2024-09-17 21:38:57,534 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.207e+02 2.311e+02 2.528e+02 5.049e+02, threshold=4.622e+02, percent-clipped=1.0 2024-09-17 21:39:04,283 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0 2024-09-17 21:39:06,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=663141.6666666666, ans=0.125 2024-09-17 21:39:38,202 INFO [train.py:1198] (0/2) Epoch 37, batch 4000, loss[loss=0.1759, ctc_loss=0.1143, cr_loss=0.3083, over 21003.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1467, cr_loss=0.371, over 4082510.58 frames. 
], batch size: 48, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:39:49,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=663198.3333333334, ans=0.05 2024-09-17 21:39:53,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=663226.6666666666, ans=0.125 2024-09-17 21:39:55,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=663226.6666666666, ans=0.0 2024-09-17 21:39:58,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0 2024-09-17 21:39:59,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=12.0 2024-09-17 21:40:14,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=663255.0, ans=0.0 2024-09-17 21:40:20,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=663255.0, ans=0.1 2024-09-17 21:40:40,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=663311.6666666666, ans=0.0 2024-09-17 21:40:48,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=663311.6666666666, ans=0.0 2024-09-17 21:40:53,755 INFO [train.py:1198] (0/2) Epoch 37, batch 4050, loss[loss=0.191, ctc_loss=0.1233, cr_loss=0.3383, over 20966.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1465, cr_loss=0.3702, over 4068501.52 frames. 
], batch size: 51, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:41:17,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=663368.3333333334, ans=0.1 2024-09-17 21:41:28,171 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.220e+02 2.429e+02 2.674e+02 7.764e+02, threshold=4.858e+02, percent-clipped=1.0 2024-09-17 21:41:42,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=663425.0, ans=0.2 2024-09-17 21:41:57,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=663453.3333333334, ans=0.025 2024-09-17 21:42:11,846 INFO [train.py:1198] (0/2) Epoch 37, batch 4100, loss[loss=0.2403, ctc_loss=0.1618, cr_loss=0.3923, over 20668.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.148, cr_loss=0.3729, over 4072290.78 frames. ], batch size: 68, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:42:27,354 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:42:51,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=663538.3333333334, ans=0.0 2024-09-17 21:43:21,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663595.0, ans=0.1 2024-09-17 21:43:29,331 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:43:30,512 INFO [train.py:1198] (0/2) Epoch 37, batch 4150, loss[loss=0.2525, ctc_loss=0.1681, cr_loss=0.4219, over 20084.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1477, cr_loss=0.3728, over 4075121.40 frames. 
], batch size: 80, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:43:45,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=15.0 2024-09-17 21:43:49,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=22.5 2024-09-17 21:43:55,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=663651.6666666666, ans=0.125 2024-09-17 21:44:05,608 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.187e+02 2.367e+02 2.509e+02 3.633e+02, threshold=4.734e+02, percent-clipped=0.0 2024-09-17 21:44:11,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=663680.0, ans=0.2 2024-09-17 21:44:19,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663708.3333333334, ans=0.1 2024-09-17 21:44:33,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=663736.6666666666, ans=0.0 2024-09-17 21:44:46,557 INFO [train.py:1198] (0/2) Epoch 37, batch 4200, loss[loss=0.1803, ctc_loss=0.1167, cr_loss=0.3182, over 21002.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1465, cr_loss=0.3709, over 4081523.59 frames. 
], batch size: 52, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:45:12,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=663793.3333333334, ans=0.125 2024-09-17 21:45:19,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=663821.6666666666, ans=0.125 2024-09-17 21:45:21,867 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.06 vs. limit=22.5 2024-09-17 21:45:23,318 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.74 vs. limit=6.0 2024-09-17 21:45:59,773 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.13 vs. limit=10.0 2024-09-17 21:46:00,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=663906.6666666666, ans=0.2 2024-09-17 21:46:02,042 INFO [train.py:1198] (0/2) Epoch 37, batch 4250, loss[loss=0.2385, ctc_loss=0.162, cr_loss=0.3821, over 21053.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3701, over 4085413.06 frames. 
], batch size: 56, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:46:29,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=663935.0, ans=0.125 2024-09-17 21:46:33,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=663963.3333333334, ans=0.0 2024-09-17 21:46:35,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663963.3333333334, ans=0.1 2024-09-17 21:46:36,431 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.178e+02 2.281e+02 2.405e+02 4.909e+02, threshold=4.561e+02, percent-clipped=1.0 2024-09-17 21:46:57,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=663991.6666666666, ans=0.125 2024-09-17 21:47:16,617 INFO [train.py:1198] (0/2) Epoch 37, batch 4300, loss[loss=0.2558, ctc_loss=0.1779, cr_loss=0.3895, over 13720.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1473, cr_loss=0.3713, over 4071755.38 frames. 
], batch size: 149, lr: 2.23e-03, grad_scale: 16.0 2024-09-17 21:47:27,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=664048.3333333334, ans=0.125 2024-09-17 21:47:30,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=664076.6666666666, ans=0.125 2024-09-17 21:47:48,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=664105.0, ans=0.0 2024-09-17 21:48:02,239 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:48:08,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=664133.3333333334, ans=0.125 2024-09-17 21:48:15,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=664133.3333333334, ans=0.2 2024-09-17 21:48:17,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=664133.3333333334, ans=0.125 2024-09-17 21:48:28,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664161.6666666666, ans=0.1 2024-09-17 21:48:37,351 INFO [train.py:1198] (0/2) Epoch 37, batch 4350, loss[loss=0.2253, ctc_loss=0.1471, cr_loss=0.3909, over 20961.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1471, cr_loss=0.3715, over 4065811.28 frames. 
], batch size: 64, lr: 2.23e-03, grad_scale: 16.0
2024-09-17 21:48:54,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664218.3333333334, ans=0.1
2024-09-17 21:49:11,861 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.242e+02 2.354e+02 2.561e+02 8.205e+02, threshold=4.709e+02, percent-clipped=1.0
2024-09-17 21:49:52,834 INFO [train.py:1198] (0/2) Epoch 37, batch 4400, loss[loss=0.2411, ctc_loss=0.1606, cr_loss=0.4028, over 21008.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1474, cr_loss=0.3725, over 4071692.31 frames. ], batch size: 61, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 21:50:31,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0
2024-09-17 21:50:46,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=664416.6666666666, ans=0.125
2024-09-17 21:50:55,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=664445.0, ans=0.125
2024-09-17 21:51:09,068 INFO [train.py:1198] (0/2) Epoch 37, batch 4450, loss[loss=0.2325, ctc_loss=0.1517, cr_loss=0.4039, over 20321.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1472, cr_loss=0.3727, over 4073877.46 frames. ], batch size: 74, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 21:51:42,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=664530.0, ans=0.0
2024-09-17 21:51:42,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=664530.0, ans=10.0
2024-09-17 21:51:43,507 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.217e+02 2.386e+02 2.583e+02 3.614e+02, threshold=4.772e+02, percent-clipped=0.0
2024-09-17 21:51:59,394 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=12.0
2024-09-17 21:52:06,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=664558.3333333334, ans=0.025
2024-09-17 21:52:14,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=664586.6666666666, ans=0.0
2024-09-17 21:52:23,596 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2024-09-17 21:52:24,298 INFO [train.py:1198] (0/2) Epoch 37, batch 4500, loss[loss=0.2422, ctc_loss=0.1614, cr_loss=0.404, over 19928.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1475, cr_loss=0.3731, over 4082300.31 frames. ], batch size: 80, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 21:52:49,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=664643.3333333334, ans=0.04949747468305833
2024-09-17 21:52:51,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=664643.3333333334, ans=0.0
2024-09-17 21:53:42,694 INFO [train.py:1198] (0/2) Epoch 37, batch 4550, loss[loss=0.2479, ctc_loss=0.1654, cr_loss=0.4126, over 19456.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1473, cr_loss=0.3728, over 4092647.02 frames. ], batch size: 90, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 21:53:48,439 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=15.0
2024-09-17 21:53:49,519 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=22.5
2024-09-17 21:53:58,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=664785.0, ans=0.125
2024-09-17 21:54:20,114 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.176e+02 2.287e+02 2.438e+02 2.934e+02, threshold=4.574e+02, percent-clipped=0.0
2024-09-17 21:54:25,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0
2024-09-17 21:54:37,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=664841.6666666666, ans=0.035
2024-09-17 21:54:47,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=664870.0, ans=0.0
2024-09-17 21:55:01,075 INFO [train.py:1198] (0/2) Epoch 37, batch 4600, loss[loss=0.2385, ctc_loss=0.1594, cr_loss=0.3957, over 21045.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1481, cr_loss=0.3745, over 4084700.15 frames. ], batch size: 62, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 21:55:08,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=664898.3333333334, ans=0.0
2024-09-17 21:55:36,097 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.91 vs. limit=10.0
2024-09-17 21:55:53,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=664983.3333333334, ans=0.0
2024-09-17 21:56:15,996 INFO [train.py:1198] (0/2) Epoch 37, batch 4650, loss[loss=0.2257, ctc_loss=0.1484, cr_loss=0.3864, over 20659.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.148, cr_loss=0.3738, over 4075848.51 frames. ], batch size: 66, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 21:56:22,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=665040.0, ans=0.0
2024-09-17 21:56:51,252 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.161e+02 2.292e+02 2.525e+02 5.728e+02, threshold=4.584e+02, percent-clipped=1.0
2024-09-17 21:57:09,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.14 vs. limit=10.0
2024-09-17 21:57:10,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=665125.0, ans=0.2
2024-09-17 21:57:32,626 INFO [train.py:1198] (0/2) Epoch 37, batch 4700, loss[loss=0.2417, ctc_loss=0.1629, cr_loss=0.3943, over 20679.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1479, cr_loss=0.3738, over 4082458.55 frames. ], batch size: 68, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 21:57:58,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=665210.0, ans=0.125
2024-09-17 21:58:03,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=665238.3333333334, ans=0.025
2024-09-17 21:58:21,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=665266.6666666666, ans=0.0
2024-09-17 21:58:25,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665266.6666666666, ans=0.1
2024-09-17 21:58:48,810 INFO [train.py:1198] (0/2) Epoch 37, batch 4750, loss[loss=0.205, ctc_loss=0.1345, cr_loss=0.3529, over 20967.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1468, cr_loss=0.3719, over 4082114.51 frames. ], batch size: 51, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 21:59:04,742 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.41 vs. limit=22.5
2024-09-17 21:59:05,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=665351.6666666666, ans=0.2
2024-09-17 21:59:26,977 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.152e+02 2.270e+02 2.499e+02 4.006e+02, threshold=4.539e+02, percent-clipped=0.0
2024-09-17 22:00:05,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=665436.6666666666, ans=0.1
2024-09-17 22:00:11,167 INFO [train.py:1198] (0/2) Epoch 37, batch 4800, loss[loss=0.2186, ctc_loss=0.1445, cr_loss=0.3706, over 21043.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.371, over 4091875.89 frames. ], batch size: 63, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:00:19,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.11 vs. limit=6.0
2024-09-17 22:00:32,256 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.79 vs. limit=10.0
2024-09-17 22:00:33,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=665493.3333333334, ans=0.0
2024-09-17 22:00:33,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=665493.3333333334, ans=0.125
2024-09-17 22:01:27,386 INFO [train.py:1198] (0/2) Epoch 37, batch 4850, loss[loss=0.2312, ctc_loss=0.1517, cr_loss=0.3979, over 20991.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1456, cr_loss=0.3711, over 4093349.03 frames. ], batch size: 58, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:01:50,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=665635.0, ans=0.125
2024-09-17 22:01:54,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=665635.0, ans=0.125
2024-09-17 22:02:01,707 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.203e+02 2.325e+02 2.450e+02 3.800e+02, threshold=4.650e+02, percent-clipped=0.0
2024-09-17 22:02:06,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=665663.3333333334, ans=0.0
2024-09-17 22:02:41,859 INFO [train.py:1198] (0/2) Epoch 37, batch 4900, loss[loss=0.2172, ctc_loss=0.1424, cr_loss=0.3741, over 21045.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3717, over 4090753.91 frames. ], batch size: 56, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:02:58,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=665776.6666666666, ans=0.0
2024-09-17 22:03:55,954 INFO [train.py:1198] (0/2) Epoch 37, batch 4950, loss[loss=0.1731, ctc_loss=0.1125, cr_loss=0.303, over 20985.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1457, cr_loss=0.3708, over 4102451.10 frames. ], batch size: 50, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:04:16,690 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.01 vs. limit=10.0
2024-09-17 22:04:30,624 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.177e+02 2.285e+02 2.547e+02 3.070e+02, threshold=4.569e+02, percent-clipped=0.0
2024-09-17 22:04:50,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=665975.0, ans=0.0
2024-09-17 22:04:56,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=666003.3333333334, ans=0.1
2024-09-17 22:05:06,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=666003.3333333334, ans=0.05
2024-09-17 22:05:11,175 INFO [train.py:1198] (0/2) Epoch 37, batch 5000, loss[loss=0.2271, ctc_loss=0.149, cr_loss=0.3905, over 20978.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1451, cr_loss=0.3704, over 4103241.34 frames. ], batch size: 58, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:05:14,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=666031.6666666666, ans=0.025
2024-09-17 22:05:27,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=666060.0, ans=0.2
2024-09-17 22:05:35,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=666060.0, ans=0.125
2024-09-17 22:05:49,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2024-09-17 22:05:54,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=666116.6666666666, ans=0.125
2024-09-17 22:06:12,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=666145.0, ans=0.125
2024-09-17 22:06:12,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=666145.0, ans=0.125
2024-09-17 22:06:25,054 INFO [train.py:1198] (0/2) Epoch 37, batch 5050, loss[loss=0.2366, ctc_loss=0.1566, cr_loss=0.4001, over 20933.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1452, cr_loss=0.3708, over 4110308.34 frames. ], batch size: 60, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:06:42,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=666201.6666666666, ans=0.125
2024-09-17 22:06:59,477 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.161e+02 2.332e+02 2.471e+02 3.064e+02, threshold=4.665e+02, percent-clipped=0.0
2024-09-17 22:07:26,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=666286.6666666666, ans=0.0
2024-09-17 22:07:37,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=22.5
2024-09-17 22:07:42,251 INFO [train.py:1198] (0/2) Epoch 37, batch 5100, loss[loss=0.2149, ctc_loss=0.1402, cr_loss=0.3735, over 20652.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3705, over 4111685.75 frames. ], batch size: 68, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:07:51,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=666315.0, ans=0.125
2024-09-17 22:08:29,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=666400.0, ans=0.1
2024-09-17 22:08:48,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=666428.3333333334, ans=0.125
2024-09-17 22:08:58,272 INFO [train.py:1198] (0/2) Epoch 37, batch 5150, loss[loss=0.2115, ctc_loss=0.1363, cr_loss=0.3761, over 21048.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3713, over 4086837.32 frames. ], batch size: 56, lr: 2.22e-03, grad_scale: 16.0
2024-09-17 22:08:58,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=666456.6666666666, ans=0.0
2024-09-17 22:09:33,865 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.160e+02 2.290e+02 2.458e+02 3.166e+02, threshold=4.580e+02, percent-clipped=0.0
2024-09-17 22:09:37,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=666513.3333333334, ans=0.07
2024-09-17 22:09:44,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=666541.6666666666, ans=12.0
2024-09-17 22:09:53,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=666541.6666666666, ans=0.125
2024-09-17 22:09:54,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=666541.6666666666, ans=0.125
2024-09-17 22:10:12,630 INFO [train.py:1198] (0/2) Epoch 37, batch 5200, loss[loss=0.2001, ctc_loss=0.1317, cr_loss=0.3422, over 20978.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1458, cr_loss=0.3706, over 4090553.97 frames. ], batch size: 51, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:11:04,351 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=22.5
2024-09-17 22:11:23,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=666711.6666666666, ans=0.0
2024-09-17 22:11:27,348 INFO [train.py:1198] (0/2) Epoch 37, batch 5250, loss[loss=0.241, ctc_loss=0.1618, cr_loss=0.3963, over 20664.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1463, cr_loss=0.371, over 4090081.34 frames. ], batch size: 71, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:11:30,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=666740.0, ans=0.1
2024-09-17 22:11:32,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=666740.0, ans=0.1
2024-09-17 22:11:51,282 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 22:11:54,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=666768.3333333334, ans=0.125
2024-09-17 22:12:02,766 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.131e+02 2.273e+02 2.440e+02 3.297e+02, threshold=4.545e+02, percent-clipped=0.0
2024-09-17 22:12:12,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=666825.0, ans=0.125
2024-09-17 22:12:16,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0
2024-09-17 22:12:23,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=666825.0, ans=0.125
2024-09-17 22:12:27,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0
2024-09-17 22:12:37,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=666853.3333333334, ans=0.125
2024-09-17 22:12:41,255 INFO [train.py:1198] (0/2) Epoch 37, batch 5300, loss[loss=0.175, ctc_loss=0.1134, cr_loss=0.3081, over 20977.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1466, cr_loss=0.3712, over 4096384.71 frames. ], batch size: 49, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:12:59,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=666910.0, ans=0.1
2024-09-17 22:13:13,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=666938.3333333334, ans=0.025
2024-09-17 22:13:21,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0
2024-09-17 22:13:56,345 INFO [train.py:1198] (0/2) Epoch 37, batch 5350, loss[loss=0.2332, ctc_loss=0.1579, cr_loss=0.3767, over 18104.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3704, over 4093109.76 frames. ], batch size: 108, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:13:58,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.79 vs. limit=15.0
2024-09-17 22:14:28,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=667080.0, ans=0.2
2024-09-17 22:14:32,459 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.196e+02 2.345e+02 2.554e+02 4.902e+02, threshold=4.690e+02, percent-clipped=1.0
2024-09-17 22:15:11,287 INFO [train.py:1198] (0/2) Epoch 37, batch 5400, loss[loss=0.2126, ctc_loss=0.1407, cr_loss=0.3596, over 19988.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1458, cr_loss=0.3701, over 4100930.45 frames. ], batch size: 80, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:15:30,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.74 vs. limit=15.0
2024-09-17 22:15:50,659 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0
2024-09-17 22:15:52,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=667221.6666666666, ans=0.125
2024-09-17 22:16:27,628 INFO [train.py:1198] (0/2) Epoch 37, batch 5450, loss[loss=0.21, ctc_loss=0.1387, cr_loss=0.3565, over 21021.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1468, cr_loss=0.3712, over 4086505.84 frames. ], batch size: 61, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:16:44,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=667335.0, ans=0.125
2024-09-17 22:17:03,471 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.225e+02 2.339e+02 2.539e+02 5.737e+02, threshold=4.679e+02, percent-clipped=1.0
2024-09-17 22:17:17,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=667391.6666666666, ans=0.125
2024-09-17 22:17:25,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0
2024-09-17 22:17:35,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=667420.0, ans=0.125
2024-09-17 22:17:37,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=667420.0, ans=0.2
2024-09-17 22:17:43,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=667448.3333333334, ans=0.035
2024-09-17 22:17:44,292 INFO [train.py:1198] (0/2) Epoch 37, batch 5500, loss[loss=0.2379, ctc_loss=0.1621, cr_loss=0.3792, over 20976.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1476, cr_loss=0.3727, over 4085104.78 frames. ], batch size: 64, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:17:46,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=667448.3333333334, ans=0.0
2024-09-17 22:17:59,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=667476.6666666666, ans=0.125
2024-09-17 22:18:21,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=667505.0, ans=0.035
2024-09-17 22:18:58,546 INFO [train.py:1198] (0/2) Epoch 37, batch 5550, loss[loss=0.205, ctc_loss=0.134, cr_loss=0.3548, over 20972.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.147, cr_loss=0.3719, over 4092245.26 frames. ], batch size: 55, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:19:34,189 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.193e+02 2.325e+02 2.504e+02 3.181e+02, threshold=4.650e+02, percent-clipped=0.0
2024-09-17 22:19:34,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667646.6666666666, ans=0.1
2024-09-17 22:19:40,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=667646.6666666666, ans=0.2
2024-09-17 22:19:46,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=667675.0, ans=0.125
2024-09-17 22:19:51,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=667675.0, ans=0.025
2024-09-17 22:19:56,018 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0
2024-09-17 22:20:13,120 INFO [train.py:1198] (0/2) Epoch 37, batch 5600, loss[loss=0.2094, ctc_loss=0.1367, cr_loss=0.3635, over 20777.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1462, cr_loss=0.371, over 4091712.06 frames. ], batch size: 56, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:20:23,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=667731.6666666666, ans=0.0
2024-09-17 22:21:01,601 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.05 vs. limit=22.5
2024-09-17 22:21:27,416 INFO [train.py:1198] (0/2) Epoch 37, batch 5650, loss[loss=0.2347, ctc_loss=0.1538, cr_loss=0.4046, over 20723.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1466, cr_loss=0.3711, over 4090848.42 frames. ], batch size: 71, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:21:54,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0
2024-09-17 22:22:01,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667930.0, ans=0.1
2024-09-17 22:22:02,972 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.187e+02 2.295e+02 2.506e+02 3.366e+02, threshold=4.589e+02, percent-clipped=0.0
2024-09-17 22:22:09,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=667930.0, ans=0.125
2024-09-17 22:22:18,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=667958.3333333334, ans=0.2
2024-09-17 22:22:42,210 INFO [train.py:1198] (0/2) Epoch 37, batch 5700, loss[loss=0.2344, ctc_loss=0.1595, cr_loss=0.3743, over 20834.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1472, cr_loss=0.3717, over 4095311.88 frames. ], batch size: 65, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:22:51,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0
2024-09-17 22:23:09,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668043.3333333334, ans=0.1
2024-09-17 22:23:57,260 INFO [train.py:1198] (0/2) Epoch 37, batch 5750, loss[loss=0.2237, ctc_loss=0.1488, cr_loss=0.3742, over 21032.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3725, over 4083338.87 frames. ], batch size: 61, lr: 2.22e-03, grad_scale: 16.0
2024-09-17 22:24:13,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=668185.0, ans=0.125
2024-09-17 22:24:19,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=668185.0, ans=0.125
2024-09-17 22:24:34,419 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.198e+02 2.336e+02 2.545e+02 7.095e+02, threshold=4.671e+02, percent-clipped=1.0
2024-09-17 22:24:40,457 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 22:25:00,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668270.0, ans=0.1
2024-09-17 22:25:08,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.91 vs. limit=15.0
2024-09-17 22:25:13,534 INFO [train.py:1198] (0/2) Epoch 37, batch 5800, loss[loss=0.2155, ctc_loss=0.1421, cr_loss=0.367, over 20898.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1478, cr_loss=0.3739, over 4089361.20 frames. ], batch size: 57, lr: 2.22e-03, grad_scale: 16.0
2024-09-17 22:25:25,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=668298.3333333334, ans=0.125
2024-09-17 22:25:33,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=668326.6666666666, ans=0.035
2024-09-17 22:25:42,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=668355.0, ans=0.125
2024-09-17 22:25:43,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=668355.0, ans=0.2
2024-09-17 22:26:30,762 INFO [train.py:1198] (0/2) Epoch 37, batch 5850, loss[loss=0.2365, ctc_loss=0.1563, cr_loss=0.4008, over 20307.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1475, cr_loss=0.3745, over 4094251.89 frames. ], batch size: 74, lr: 2.22e-03, grad_scale: 16.0
2024-09-17 22:27:07,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=668496.6666666666, ans=0.125
2024-09-17 22:27:08,090 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.226e+02 2.364e+02 2.504e+02 4.614e+02, threshold=4.727e+02, percent-clipped=0.0
2024-09-17 22:27:17,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=668525.0, ans=0.125
2024-09-17 22:27:45,345 INFO [train.py:1198] (0/2) Epoch 37, batch 5900, loss[loss=0.2371, ctc_loss=0.1565, cr_loss=0.4029, over 20665.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.148, cr_loss=0.3746, over 4084873.18 frames. ], batch size: 66, lr: 2.22e-03, grad_scale: 16.0
2024-09-17 22:28:12,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668610.0, ans=0.1
2024-09-17 22:28:27,952 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-236000.pt
2024-09-17 22:28:32,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=668666.6666666666, ans=0.0
2024-09-17 22:28:48,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=668695.0, ans=0.07
2024-09-17 22:28:51,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=668695.0, ans=0.1
2024-09-17 22:29:00,174 INFO [train.py:1198] (0/2) Epoch 37, batch 5950, loss[loss=0.2375, ctc_loss=0.1573, cr_loss=0.4008, over 20969.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1478, cr_loss=0.3738, over 4063567.51 frames. ], batch size: 64, lr: 2.22e-03, grad_scale: 16.0
2024-09-17 22:29:03,735 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=22.5
2024-09-17 22:29:10,001 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0
2024-09-17 22:29:12,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=668723.3333333334, ans=0.1
2024-09-17 22:29:27,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0
2024-09-17 22:29:37,463 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.245e+02 2.386e+02 2.631e+02 5.409e+02, threshold=4.771e+02, percent-clipped=1.0
2024-09-17 22:29:54,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=668808.3333333334, ans=0.125
2024-09-17 22:30:15,184 INFO [train.py:1198] (0/2) Epoch 37, batch 6000, loss[loss=0.1823, ctc_loss=0.118, cr_loss=0.3218, over 20983.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1475, cr_loss=0.3727, over 4067715.98 frames. ], batch size: 48, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:30:15,185 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 22:30:38,354 INFO [train.py:1230] (0/2) Epoch 37, validation: loss=0.03929, ctc_loss=0.03929, cr_loss=1.435e-14, over 944034.00 frames.
2024-09-17 22:30:38,355 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-17 22:30:56,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=668893.3333333334, ans=0.1
2024-09-17 22:31:53,353 INFO [train.py:1198] (0/2) Epoch 37, batch 6050, loss[loss=0.2215, ctc_loss=0.1466, cr_loss=0.3742, over 20707.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1478, cr_loss=0.3731, over 4059452.64 frames. ], batch size: 71, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:32:32,114 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.234e+02 2.378e+02 2.546e+02 3.378e+02, threshold=4.756e+02, percent-clipped=0.0
2024-09-17 22:33:09,256 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=22.5
2024-09-17 22:33:09,903 INFO [train.py:1198] (0/2) Epoch 37, batch 6100, loss[loss=0.2274, ctc_loss=0.1493, cr_loss=0.3904, over 20952.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1467, cr_loss=0.3717, over 4070604.36 frames. ], batch size: 60, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:33:10,605 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0
2024-09-17 22:33:26,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=22.5
2024-09-17 22:33:45,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=669205.0, ans=0.125
2024-09-17 22:33:48,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=669205.0, ans=0.5
2024-09-17 22:34:00,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=669233.3333333334, ans=10.0
2024-09-17 22:34:24,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=669290.0, ans=0.0
2024-09-17 22:34:25,354 INFO [train.py:1198] (0/2) Epoch 37, batch 6150, loss[loss=0.2901, ctc_loss=0.2033, cr_loss=0.4338, over 14165.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1473, cr_loss=0.373, over 4059946.47 frames. ], batch size: 149, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:34:55,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=669346.6666666666, ans=0.04949747468305833
2024-09-17 22:35:02,921 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.191e+02 2.355e+02 2.514e+02 3.195e+02, threshold=4.711e+02, percent-clipped=0.0
2024-09-17 22:35:16,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=669375.0, ans=0.125
2024-09-17 22:35:40,040 INFO [train.py:1198] (0/2) Epoch 37, batch 6200, loss[loss=0.243, ctc_loss=0.167, cr_loss=0.3798, over 18254.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1469, cr_loss=0.372, over 4053966.93 frames. ], batch size: 108, lr: 2.22e-03, grad_scale: 32.0
2024-09-17 22:35:40,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=669431.6666666666, ans=0.125
2024-09-17 22:36:04,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=669460.0, ans=0.025
2024-09-17 22:36:10,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=669488.3333333334, ans=0.0
2024-09-17 22:36:51,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=669545.0, ans=0.0
2024-09-17 22:36:55,270 INFO [train.py:1198] (0/2) Epoch 37, batch 6250, loss[loss=0.1996, ctc_loss=0.1303, cr_loss=0.3466, over 19916.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1469, cr_loss=0.371, over 4022351.41 frames.
], batch size: 44, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:37:13,814 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:37:29,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=669630.0, ans=0.125 2024-09-17 22:37:33,624 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.166e+02 2.359e+02 2.612e+02 4.310e+02, threshold=4.719e+02, percent-clipped=0.0 2024-09-17 22:37:47,604 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-09-17 22:37:54,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=669686.6666666666, ans=0.0 2024-09-17 22:38:10,711 INFO [train.py:1198] (0/2) Epoch 37, batch 6300, loss[loss=0.2647, ctc_loss=0.1857, cr_loss=0.3949, over 14722.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1451, cr_loss=0.3671, over 4019335.81 frames. ], batch size: 149, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:38:49,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=669771.6666666666, ans=0.1 2024-09-17 22:39:12,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0 2024-09-17 22:39:12,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=15.0 2024-09-17 22:39:23,805 INFO [train.py:1198] (0/2) Epoch 37, batch 6350, loss[loss=0.2632, ctc_loss=0.1877, cr_loss=0.3775, over 14019.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.145, cr_loss=0.3652, over 3967555.33 frames. 
], batch size: 151, lr: 2.22e-03, grad_scale: 32.0 2024-09-17 22:39:27,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=669856.6666666666, ans=0.0 2024-09-17 22:40:00,739 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.350e+02 2.481e+02 2.801e+02 3.678e+02, threshold=4.962e+02, percent-clipped=0.0 2024-09-17 22:40:19,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=669941.6666666666, ans=0.2 2024-09-17 22:40:23,443 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-37.pt 2024-09-17 22:41:13,629 INFO [train.py:1198] (0/2) Epoch 38, batch 0, loss[loss=0.2091, ctc_loss=0.136, cr_loss=0.3655, over 20888.00 frames. ], tot_loss[loss=0.2091, ctc_loss=0.136, cr_loss=0.3655, over 20888.00 frames. ], batch size: 54, lr: 2.19e-03, grad_scale: 32.0 2024-09-17 22:41:13,630 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 22:41:32,172 INFO [train.py:1230] (0/2) Epoch 38, validation: loss=0.03921, ctc_loss=0.03921, cr_loss=1.412e-14, over 944034.00 frames. 2024-09-17 22:41:32,172 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 22:41:41,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=669972.8333333334, ans=0.1 2024-09-17 22:41:56,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=670001.1666666666, ans=0.1 2024-09-17 22:42:12,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=670029.5, ans=0.1 2024-09-17 22:42:47,764 INFO [train.py:1198] (0/2) Epoch 38, batch 50, loss[loss=0.2282, ctc_loss=0.1485, cr_loss=0.3988, over 20992.00 frames. 
], tot_loss[loss=0.2227, ctc_loss=0.1481, cr_loss=0.3732, over 918399.33 frames. ], batch size: 55, lr: 2.19e-03, grad_scale: 32.0 2024-09-17 22:43:01,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=670142.8333333334, ans=0.125 2024-09-17 22:43:19,755 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-09-17 22:43:38,569 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.211e+02 2.385e+02 2.550e+02 3.140e+02, threshold=4.771e+02, percent-clipped=0.0 2024-09-17 22:44:02,853 INFO [train.py:1198] (0/2) Epoch 38, batch 100, loss[loss=0.2103, ctc_loss=0.1386, cr_loss=0.3586, over 20820.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1474, cr_loss=0.3719, over 1622201.06 frames. ], batch size: 59, lr: 2.19e-03, grad_scale: 32.0 2024-09-17 22:44:16,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=670284.5, ans=0.2 2024-09-17 22:44:18,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=670284.5, ans=0.1 2024-09-17 22:44:53,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=670341.1666666666, ans=0.0 2024-09-17 22:44:54,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=670341.1666666666, ans=0.0 2024-09-17 22:44:58,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.83 vs. limit=10.0 2024-09-17 22:45:21,400 INFO [train.py:1198] (0/2) Epoch 38, batch 150, loss[loss=0.2392, ctc_loss=0.159, cr_loss=0.4013, over 20945.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1465, cr_loss=0.3698, over 2165141.31 frames. 
], batch size: 58, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:45:32,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=670397.8333333334, ans=0.05 2024-09-17 22:46:12,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=670482.8333333334, ans=0.0 2024-09-17 22:46:16,434 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.222e+02 2.379e+02 2.547e+02 3.122e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-17 22:46:40,691 INFO [train.py:1198] (0/2) Epoch 38, batch 200, loss[loss=0.1862, ctc_loss=0.1196, cr_loss=0.3328, over 20969.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1454, cr_loss=0.3695, over 2593762.58 frames. ], batch size: 49, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:47:29,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=670624.5, ans=0.1 2024-09-17 22:47:34,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=670624.5, ans=0.125 2024-09-17 22:47:56,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.91 vs. limit=15.0 2024-09-17 22:47:56,780 INFO [train.py:1198] (0/2) Epoch 38, batch 250, loss[loss=0.2064, ctc_loss=0.137, cr_loss=0.347, over 20986.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1456, cr_loss=0.3701, over 2932172.50 frames. 
], batch size: 52, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:48:31,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=670737.8333333334, ans=0.025 2024-09-17 22:48:48,002 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.159e+02 2.295e+02 2.480e+02 4.048e+02, threshold=4.591e+02, percent-clipped=0.0 2024-09-17 22:48:48,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=670766.1666666666, ans=0.0 2024-09-17 22:49:12,042 INFO [train.py:1198] (0/2) Epoch 38, batch 300, loss[loss=0.2431, ctc_loss=0.163, cr_loss=0.4009, over 20803.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1454, cr_loss=0.3687, over 3191304.79 frames. ], batch size: 65, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:49:30,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=670851.1666666666, ans=0.0 2024-09-17 22:49:48,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=670879.5, ans=10.0 2024-09-17 22:50:04,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=670907.8333333334, ans=0.0 2024-09-17 22:50:26,997 INFO [train.py:1198] (0/2) Epoch 38, batch 350, loss[loss=0.1956, ctc_loss=0.1288, cr_loss=0.3339, over 21060.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.145, cr_loss=0.3688, over 3406289.08 frames. ], batch size: 53, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:50:28,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=670964.5, ans=0.2 2024-09-17 22:50:30,786 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. 
limit=15.0 2024-09-17 22:51:06,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=671021.1666666666, ans=0.2 2024-09-17 22:51:17,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=671049.5, ans=0.0 2024-09-17 22:51:24,826 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.251e+02 2.372e+02 2.515e+02 3.125e+02, threshold=4.745e+02, percent-clipped=0.0 2024-09-17 22:51:37,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=671077.8333333334, ans=0.2 2024-09-17 22:51:49,265 INFO [train.py:1198] (0/2) Epoch 38, batch 400, loss[loss=0.1963, ctc_loss=0.1277, cr_loss=0.3431, over 20933.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1453, cr_loss=0.3701, over 3574646.91 frames. ], batch size: 51, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:52:13,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=671134.5, ans=0.125 2024-09-17 22:53:03,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=671247.8333333334, ans=0.125 2024-09-17 22:53:04,951 INFO [train.py:1198] (0/2) Epoch 38, batch 450, loss[loss=0.2285, ctc_loss=0.1496, cr_loss=0.3947, over 20667.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1447, cr_loss=0.3692, over 3697416.84 frames. 
], batch size: 68, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:53:15,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=671247.8333333334, ans=0.125 2024-09-17 22:53:42,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=671304.5, ans=0.035 2024-09-17 22:53:56,314 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.208e+02 2.337e+02 2.516e+02 3.232e+02, threshold=4.673e+02, percent-clipped=0.0 2024-09-17 22:54:05,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=671361.1666666666, ans=0.125 2024-09-17 22:54:10,545 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2024-09-17 22:54:20,287 INFO [train.py:1198] (0/2) Epoch 38, batch 500, loss[loss=0.2373, ctc_loss=0.1542, cr_loss=0.4155, over 20976.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1456, cr_loss=0.3711, over 3792709.24 frames. ], batch size: 64, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:54:38,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=671417.8333333334, ans=0.125 2024-09-17 22:54:44,145 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. 
limit=6.0 2024-09-17 22:55:10,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=671474.5, ans=0.0 2024-09-17 22:55:20,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=671502.8333333334, ans=0.0 2024-09-17 22:55:35,784 INFO [train.py:1198] (0/2) Epoch 38, batch 550, loss[loss=0.1635, ctc_loss=0.107, cr_loss=0.2825, over 19955.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3717, over 3852904.42 frames. ], batch size: 44, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:55:44,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=12.0 2024-09-17 22:56:10,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=671587.8333333334, ans=0.125 2024-09-17 22:56:22,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=671616.1666666666, ans=0.125 2024-09-17 22:56:26,732 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.208e+02 2.346e+02 2.515e+02 4.379e+02, threshold=4.691e+02, percent-clipped=0.0 2024-09-17 22:56:51,076 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:56:53,721 INFO [train.py:1198] (0/2) Epoch 38, batch 600, loss[loss=0.2527, ctc_loss=0.1717, cr_loss=0.4048, over 20298.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1464, cr_loss=0.3715, over 3910429.66 frames. 
], batch size: 74, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:57:06,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=671672.8333333334, ans=0.1 2024-09-17 22:57:41,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.09 vs. limit=10.0 2024-09-17 22:58:04,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=671786.1666666666, ans=0.2 2024-09-17 22:58:09,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=671786.1666666666, ans=0.1 2024-09-17 22:58:12,046 INFO [train.py:1198] (0/2) Epoch 38, batch 650, loss[loss=0.1757, ctc_loss=0.114, cr_loss=0.3087, over 20946.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1472, cr_loss=0.3733, over 3948076.01 frames. ], batch size: 49, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:58:40,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=671871.1666666666, ans=0.1 2024-09-17 22:58:47,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=671871.1666666666, ans=0.125 2024-09-17 22:58:55,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=671899.5, ans=0.2 2024-09-17 22:59:02,894 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.175e+02 2.346e+02 2.642e+02 3.414e+02, threshold=4.692e+02, percent-clipped=0.0 2024-09-17 22:59:07,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=671899.5, ans=0.2 2024-09-17 22:59:27,139 INFO [train.py:1198] (0/2) Epoch 38, batch 700, loss[loss=0.2314, ctc_loss=0.1554, 
cr_loss=0.3799, over 20695.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1464, cr_loss=0.3714, over 3969458.13 frames. ], batch size: 68, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 22:59:50,686 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=22.5 2024-09-17 23:00:10,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=672012.8333333334, ans=0.2 2024-09-17 23:00:17,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=672041.1666666666, ans=0.025 2024-09-17 23:00:43,070 INFO [train.py:1198] (0/2) Epoch 38, batch 750, loss[loss=0.2539, ctc_loss=0.1714, cr_loss=0.4122, over 20673.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1456, cr_loss=0.3705, over 3999481.11 frames. ], batch size: 71, lr: 2.18e-03, grad_scale: 16.0 2024-09-17 23:00:43,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=672097.8333333334, ans=0.125 2024-09-17 23:01:34,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=672182.8333333334, ans=0.125 2024-09-17 23:01:35,914 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.174e+02 2.297e+02 2.482e+02 4.051e+02, threshold=4.593e+02, percent-clipped=0.0 2024-09-17 23:01:41,274 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-09-17 23:01:44,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. 
limit=6.0 2024-09-17 23:01:58,543 INFO [train.py:1198] (0/2) Epoch 38, batch 800, loss[loss=0.1904, ctc_loss=0.1237, cr_loss=0.3334, over 20983.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1461, cr_loss=0.3711, over 4019802.86 frames. ], batch size: 55, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:02:01,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=672239.5, ans=0.2 2024-09-17 23:02:25,044 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.73 vs. limit=10.0 2024-09-17 23:02:58,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=672324.5, ans=0.125 2024-09-17 23:03:03,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=672324.5, ans=0.125 2024-09-17 23:03:20,761 INFO [train.py:1198] (0/2) Epoch 38, batch 850, loss[loss=0.2204, ctc_loss=0.1465, cr_loss=0.3695, over 21077.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1455, cr_loss=0.37, over 4031693.34 frames. ], batch size: 53, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:03:26,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2024-09-17 23:03:42,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.44 vs. 
limit=15.0 2024-09-17 23:03:43,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=672409.5, ans=0.07 2024-09-17 23:03:49,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=672437.8333333334, ans=0.125 2024-09-17 23:04:06,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=672466.1666666666, ans=0.0 2024-09-17 23:04:09,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=672466.1666666666, ans=0.125 2024-09-17 23:04:14,034 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.187e+02 2.328e+02 2.452e+02 3.011e+02, threshold=4.657e+02, percent-clipped=0.0 2024-09-17 23:04:33,076 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.10 vs. limit=10.0 2024-09-17 23:04:36,904 INFO [train.py:1198] (0/2) Epoch 38, batch 900, loss[loss=0.1985, ctc_loss=0.1308, cr_loss=0.3389, over 20870.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1447, cr_loss=0.3684, over 4044963.57 frames. 
], batch size: 57, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:04:43,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=672522.8333333334, ans=0.0 2024-09-17 23:04:59,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=672551.1666666666, ans=0.2 2024-09-17 23:05:10,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=672579.5, ans=0.1 2024-09-17 23:05:34,631 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0 2024-09-17 23:05:52,249 INFO [train.py:1198] (0/2) Epoch 38, batch 950, loss[loss=0.2036, ctc_loss=0.1316, cr_loss=0.3598, over 19942.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1456, cr_loss=0.3704, over 4057317.89 frames. ], batch size: 44, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:06:45,142 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.170e+02 2.334e+02 2.474e+02 2.835e+02, threshold=4.668e+02, percent-clipped=0.0 2024-09-17 23:07:07,983 INFO [train.py:1198] (0/2) Epoch 38, batch 1000, loss[loss=0.2094, ctc_loss=0.1373, cr_loss=0.3607, over 20998.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1461, cr_loss=0.3718, over 4067009.17 frames. 
], batch size: 51, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:07:08,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=672806.1666666666, ans=0.125 2024-09-17 23:07:21,384 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:07:48,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=672862.8333333334, ans=6.0 2024-09-17 23:07:57,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=672891.1666666666, ans=0.04949747468305833 2024-09-17 23:08:01,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=672891.1666666666, ans=0.0 2024-09-17 23:08:06,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=672891.1666666666, ans=0.125 2024-09-17 23:08:22,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=672919.5, ans=0.04949747468305833 2024-09-17 23:08:28,685 INFO [train.py:1198] (0/2) Epoch 38, batch 1050, loss[loss=0.2359, ctc_loss=0.1563, cr_loss=0.3981, over 20371.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1465, cr_loss=0.3719, over 4056277.83 frames. ], batch size: 74, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:08:29,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.84 vs. 
limit=15.0 2024-09-17 23:08:39,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=672947.8333333334, ans=0.125 2024-09-17 23:09:19,188 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0 2024-09-17 23:09:21,434 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.197e+02 2.300e+02 2.459e+02 4.453e+02, threshold=4.600e+02, percent-clipped=0.0 2024-09-17 23:09:43,975 INFO [train.py:1198] (0/2) Epoch 38, batch 1100, loss[loss=0.2035, ctc_loss=0.1324, cr_loss=0.3554, over 20803.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1467, cr_loss=0.3724, over 4055977.00 frames. ], batch size: 53, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:10:41,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=673174.5, ans=0.1 2024-09-17 23:10:49,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=22.5 2024-09-17 23:10:59,318 INFO [train.py:1198] (0/2) Epoch 38, batch 1150, loss[loss=0.2062, ctc_loss=0.1335, cr_loss=0.3636, over 20769.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1468, cr_loss=0.3725, over 4068701.78 frames. ], batch size: 53, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:11:04,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=673231.1666666666, ans=0.2 2024-09-17 23:11:06,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=673231.1666666666, ans=0.0 2024-09-17 23:11:14,066 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.03 vs. 
limit=15.0 2024-09-17 23:11:52,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.203e+02 2.311e+02 2.439e+02 3.422e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-17 23:11:56,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=673316.1666666666, ans=0.07 2024-09-17 23:12:15,313 INFO [train.py:1198] (0/2) Epoch 38, batch 1200, loss[loss=0.1975, ctc_loss=0.1292, cr_loss=0.3419, over 20997.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1463, cr_loss=0.3718, over 4074535.68 frames. ], batch size: 51, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:13:30,909 INFO [train.py:1198] (0/2) Epoch 38, batch 1250, loss[loss=0.1972, ctc_loss=0.1309, cr_loss=0.3316, over 21050.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.146, cr_loss=0.3718, over 4076813.44 frames. ], batch size: 53, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:13:48,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=673542.8333333334, ans=0.0 2024-09-17 23:14:23,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=673599.5, ans=0.0 2024-09-17 23:14:29,619 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.193e+02 2.361e+02 2.546e+02 3.304e+02, threshold=4.722e+02, percent-clipped=0.0 2024-09-17 23:14:52,188 INFO [train.py:1198] (0/2) Epoch 38, batch 1300, loss[loss=0.2118, ctc_loss=0.1399, cr_loss=0.3594, over 20778.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1458, cr_loss=0.371, over 4083349.34 frames. 
], batch size: 56, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:15:12,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=673684.5, ans=0.2 2024-09-17 23:15:15,787 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.57 vs. limit=10.0 2024-09-17 23:15:18,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=673684.5, ans=0.125 2024-09-17 23:15:44,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=673741.1666666666, ans=0.0 2024-09-17 23:16:08,415 INFO [train.py:1198] (0/2) Epoch 38, batch 1350, loss[loss=0.2351, ctc_loss=0.1553, cr_loss=0.3992, over 20976.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3715, over 4088647.14 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:16:20,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=673797.8333333334, ans=10.0 2024-09-17 23:16:29,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=673826.1666666666, ans=0.125 2024-09-17 23:16:56,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673882.8333333334, ans=0.1 2024-09-17 23:17:01,204 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.188e+02 2.301e+02 2.421e+02 3.432e+02, threshold=4.601e+02, percent-clipped=0.0 2024-09-17 23:17:05,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=673882.8333333334, ans=0.015 2024-09-17 23:17:21,369 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=673911.1666666666, ans=0.125 2024-09-17 23:17:24,003 INFO [train.py:1198] (0/2) Epoch 38, batch 1400, loss[loss=0.2639, ctc_loss=0.1753, cr_loss=0.443, over 20652.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1464, cr_loss=0.3723, over 4082045.27 frames. ], batch size: 68, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:17:26,270 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.18 vs. limit=6.0 2024-09-17 23:17:40,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=673967.8333333334, ans=0.125 2024-09-17 23:18:20,864 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2024-09-17 23:18:24,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=674052.8333333334, ans=0.125 2024-09-17 23:18:39,998 INFO [train.py:1198] (0/2) Epoch 38, batch 1450, loss[loss=0.1834, ctc_loss=0.1206, cr_loss=0.3138, over 20958.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1468, cr_loss=0.3723, over 4074086.22 frames. ], batch size: 50, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:18:52,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=674081.1666666666, ans=0.0 2024-09-17 23:18:54,902 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=8.0 2024-09-17 23:19:27,471 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.61 vs. 
limit=10.0 2024-09-17 23:19:35,858 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.223e+02 2.375e+02 2.504e+02 3.299e+02, threshold=4.750e+02, percent-clipped=0.0 2024-09-17 23:20:01,230 INFO [train.py:1198] (0/2) Epoch 38, batch 1500, loss[loss=0.231, ctc_loss=0.1519, cr_loss=0.3957, over 20648.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1458, cr_loss=0.3711, over 4085872.43 frames. ], batch size: 66, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:20:07,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=674222.8333333334, ans=0.0 2024-09-17 23:20:24,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=674251.1666666666, ans=0.125 2024-09-17 23:20:27,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2024-09-17 23:20:47,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=674307.8333333334, ans=0.1 2024-09-17 23:20:54,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=674307.8333333334, ans=0.0 2024-09-17 23:21:00,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=674336.1666666666, ans=0.0 2024-09-17 23:21:10,306 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-09-17 23:21:15,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=674364.5, ans=0.125 2024-09-17 23:21:16,787 INFO [train.py:1198] (0/2) Epoch 38, batch 1550, loss[loss=0.2256, ctc_loss=0.1502, cr_loss=0.3771, over 20965.00 frames. 
], tot_loss[loss=0.2208, ctc_loss=0.1464, cr_loss=0.3721, over 4083868.75 frames. ], batch size: 64, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:21:34,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=674392.8333333334, ans=0.025 2024-09-17 23:21:43,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=674392.8333333334, ans=0.1 2024-09-17 23:21:49,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=674421.1666666666, ans=0.125 2024-09-17 23:22:03,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=674449.5, ans=0.07 2024-09-17 23:22:08,949 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.216e+02 2.344e+02 2.544e+02 4.001e+02, threshold=4.687e+02, percent-clipped=0.0 2024-09-17 23:22:31,593 INFO [train.py:1198] (0/2) Epoch 38, batch 1600, loss[loss=0.2219, ctc_loss=0.1477, cr_loss=0.3709, over 21021.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1466, cr_loss=0.3727, over 4087006.52 frames. ], batch size: 62, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:22:50,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2024-09-17 23:23:47,693 INFO [train.py:1198] (0/2) Epoch 38, batch 1650, loss[loss=0.2311, ctc_loss=0.1511, cr_loss=0.3999, over 20979.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1459, cr_loss=0.3721, over 4096175.71 frames. 
], batch size: 58, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:24:06,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=674676.1666666666, ans=0.0 2024-09-17 23:24:09,222 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:24:33,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=674732.8333333334, ans=0.125 2024-09-17 23:24:40,537 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.138e+02 2.273e+02 2.414e+02 3.350e+02, threshold=4.546e+02, percent-clipped=0.0 2024-09-17 23:25:05,858 INFO [train.py:1198] (0/2) Epoch 38, batch 1700, loss[loss=0.2755, ctc_loss=0.1893, cr_loss=0.4309, over 19368.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1462, cr_loss=0.3724, over 4101765.08 frames. ], batch size: 90, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:25:19,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=674817.8333333334, ans=0.0 2024-09-17 23:25:44,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=674846.1666666666, ans=0.0 2024-09-17 23:25:47,536 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:25:52,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=674846.1666666666, ans=0.0 2024-09-17 23:26:07,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=674874.5, ans=0.2 2024-09-17 23:26:19,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=674902.8333333334, ans=0.125 2024-09-17 23:26:25,039 INFO 
[scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.06 vs. limit=22.5 2024-09-17 23:26:25,382 INFO [train.py:1198] (0/2) Epoch 38, batch 1750, loss[loss=0.196, ctc_loss=0.1298, cr_loss=0.3311, over 20897.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1462, cr_loss=0.3725, over 4098918.40 frames. ], batch size: 54, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:26:34,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=674931.1666666666, ans=0.1 2024-09-17 23:26:34,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=674931.1666666666, ans=0.1 2024-09-17 23:26:54,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=674987.8333333334, ans=0.1 2024-09-17 23:26:55,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=674987.8333333334, ans=0.2 2024-09-17 23:27:18,008 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.188e+02 2.341e+02 2.472e+02 5.035e+02, threshold=4.682e+02, percent-clipped=1.0 2024-09-17 23:27:40,712 INFO [train.py:1198] (0/2) Epoch 38, batch 1800, loss[loss=0.2328, ctc_loss=0.1537, cr_loss=0.3953, over 20814.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1467, cr_loss=0.3734, over 4103966.70 frames. ], batch size: 59, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:27:54,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=675101.1666666666, ans=0.125 2024-09-17 23:28:08,450 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.14 vs. 
limit=12.0 2024-09-17 23:28:21,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=675129.5, ans=0.0 2024-09-17 23:28:32,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=22.5 2024-09-17 23:28:36,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=675157.8333333334, ans=0.125 2024-09-17 23:28:43,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=675186.1666666666, ans=0.125 2024-09-17 23:28:55,174 INFO [train.py:1198] (0/2) Epoch 38, batch 1850, loss[loss=0.2305, ctc_loss=0.1517, cr_loss=0.3944, over 20825.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1466, cr_loss=0.3729, over 4083066.38 frames. ], batch size: 59, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:28:57,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=675214.5, ans=0.125 2024-09-17 23:29:02,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=675214.5, ans=0.125 2024-09-17 23:29:09,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675242.8333333334, ans=0.1 2024-09-17 23:29:10,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=675242.8333333334, ans=0.125 2024-09-17 23:29:47,931 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.209e+02 2.370e+02 2.537e+02 3.248e+02, threshold=4.741e+02, percent-clipped=0.0 2024-09-17 23:29:52,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, 
batch_count=675299.5, ans=0.1 2024-09-17 23:30:03,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=675327.8333333334, ans=0.0 2024-09-17 23:30:06,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=675327.8333333334, ans=0.125 2024-09-17 23:30:10,453 INFO [train.py:1198] (0/2) Epoch 38, batch 1900, loss[loss=0.1902, ctc_loss=0.1239, cr_loss=0.3317, over 20954.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1452, cr_loss=0.3695, over 4079420.54 frames. ], batch size: 48, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:30:24,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=675384.5, ans=0.125 2024-09-17 23:30:25,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=675384.5, ans=0.0 2024-09-17 23:30:38,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=675384.5, ans=0.0 2024-09-17 23:30:53,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675412.8333333334, ans=0.1 2024-09-17 23:31:20,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=675469.5, ans=0.125 2024-09-17 23:31:32,016 INFO [train.py:1198] (0/2) Epoch 38, batch 1950, loss[loss=0.2081, ctc_loss=0.1368, cr_loss=0.3567, over 20998.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1457, cr_loss=0.3699, over 4083771.26 frames. 
], batch size: 50, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:31:47,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=675526.1666666666, ans=0.0 2024-09-17 23:31:51,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.66 vs. limit=15.0 2024-09-17 23:32:25,008 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.215e+02 2.295e+02 2.440e+02 6.100e+02, threshold=4.591e+02, percent-clipped=1.0 2024-09-17 23:32:30,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=675582.8333333334, ans=0.025 2024-09-17 23:32:32,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675611.1666666666, ans=0.1 2024-09-17 23:32:44,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=675611.1666666666, ans=0.0 2024-09-17 23:32:47,622 INFO [train.py:1198] (0/2) Epoch 38, batch 2000, loss[loss=0.1927, ctc_loss=0.1264, cr_loss=0.3318, over 20786.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1453, cr_loss=0.3688, over 4072512.13 frames. ], batch size: 53, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:33:06,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.54 vs. 
limit=15.0 2024-09-17 23:33:24,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=675696.1666666666, ans=0.0 2024-09-17 23:33:41,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=675724.5, ans=10.0 2024-09-17 23:34:03,532 INFO [train.py:1198] (0/2) Epoch 38, batch 2050, loss[loss=0.2526, ctc_loss=0.1681, cr_loss=0.4223, over 20020.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1461, cr_loss=0.3706, over 4075104.27 frames. ], batch size: 80, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:34:32,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=675837.8333333334, ans=0.05 2024-09-17 23:34:46,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=675837.8333333334, ans=0.125 2024-09-17 23:34:52,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=675866.1666666666, ans=0.035 2024-09-17 23:34:56,592 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.185e+02 2.316e+02 2.490e+02 4.004e+02, threshold=4.632e+02, percent-clipped=0.0 2024-09-17 23:34:59,076 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.39 vs. limit=12.0 2024-09-17 23:35:19,522 INFO [train.py:1198] (0/2) Epoch 38, batch 2100, loss[loss=0.2526, ctc_loss=0.1769, cr_loss=0.3784, over 14080.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1455, cr_loss=0.3694, over 4079422.71 frames. 
], batch size: 149, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:35:21,623 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:35:36,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=675951.1666666666, ans=0.2 2024-09-17 23:36:00,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=675979.5, ans=0.09899494936611666 2024-09-17 23:36:08,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=22.5 2024-09-17 23:36:37,871 INFO [train.py:1198] (0/2) Epoch 38, batch 2150, loss[loss=0.2134, ctc_loss=0.1389, cr_loss=0.3727, over 20963.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1456, cr_loss=0.3701, over 4095680.85 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:36:42,142 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.89 vs. limit=8.0 2024-09-17 23:37:33,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.95 vs. limit=22.5 2024-09-17 23:37:33,793 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.250e+02 2.382e+02 2.570e+02 3.088e+02, threshold=4.763e+02, percent-clipped=0.0 2024-09-17 23:37:56,954 INFO [train.py:1198] (0/2) Epoch 38, batch 2200, loss[loss=0.1843, ctc_loss=0.1185, cr_loss=0.3292, over 20380.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1444, cr_loss=0.3681, over 4097505.79 frames. 
], batch size: 45, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:38:15,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=676234.5, ans=0.125 2024-09-17 23:38:17,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=676234.5, ans=0.2 2024-09-17 23:38:25,397 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=22.5 2024-09-17 23:38:26,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=676262.8333333334, ans=0.2 2024-09-17 23:38:27,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=676262.8333333334, ans=0.125 2024-09-17 23:38:31,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=676262.8333333334, ans=0.125 2024-09-17 23:38:47,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=676291.1666666666, ans=0.0 2024-09-17 23:39:12,699 INFO [train.py:1198] (0/2) Epoch 38, batch 2250, loss[loss=0.2335, ctc_loss=0.1556, cr_loss=0.3896, over 20876.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1449, cr_loss=0.3694, over 4098054.31 frames. 
], batch size: 65, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:39:47,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=676404.5, ans=0.07 2024-09-17 23:40:05,198 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.204e+02 2.333e+02 2.495e+02 3.028e+02, threshold=4.665e+02, percent-clipped=0.0 2024-09-17 23:40:07,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=676432.8333333334, ans=0.0 2024-09-17 23:40:28,108 INFO [train.py:1198] (0/2) Epoch 38, batch 2300, loss[loss=0.2458, ctc_loss=0.1651, cr_loss=0.4035, over 19357.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1447, cr_loss=0.3689, over 4103315.07 frames. ], batch size: 90, lr: 2.18e-03, grad_scale: 32.0 2024-09-17 23:41:21,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=676574.5, ans=0.2 2024-09-17 23:41:21,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=676574.5, ans=0.1 2024-09-17 23:41:34,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=676602.8333333334, ans=0.0 2024-09-17 23:41:43,474 INFO [train.py:1198] (0/2) Epoch 38, batch 2350, loss[loss=0.2252, ctc_loss=0.146, cr_loss=0.396, over 21017.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1451, cr_loss=0.3692, over 4096314.51 frames. 
], batch size: 61, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:41:45,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=676631.1666666666, ans=0.0 2024-09-17 23:41:54,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=676631.1666666666, ans=0.125 2024-09-17 23:42:29,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=676687.8333333334, ans=0.035 2024-09-17 23:42:34,389 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=15.0 2024-09-17 23:42:42,533 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.161e+02 2.302e+02 2.540e+02 3.831e+02, threshold=4.603e+02, percent-clipped=0.0 2024-09-17 23:42:47,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=676716.1666666666, ans=0.1 2024-09-17 23:42:56,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=676744.5, ans=10.0 2024-09-17 23:43:04,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=15.0 2024-09-17 23:43:05,098 INFO [train.py:1198] (0/2) Epoch 38, batch 2400, loss[loss=0.2549, ctc_loss=0.1685, cr_loss=0.4323, over 20965.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1463, cr_loss=0.3707, over 4085202.53 frames. 
], batch size: 64, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:43:16,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=676772.8333333334, ans=0.0 2024-09-17 23:43:23,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=676801.1666666666, ans=0.0 2024-09-17 23:43:44,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=676829.5, ans=0.1 2024-09-17 23:43:47,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=676829.5, ans=0.125 2024-09-17 23:43:59,997 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0 2024-09-17 23:44:20,322 INFO [train.py:1198] (0/2) Epoch 38, batch 2450, loss[loss=0.2063, ctc_loss=0.1348, cr_loss=0.3577, over 20962.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1469, cr_loss=0.3719, over 4092059.93 frames. ], batch size: 55, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:44:25,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0 2024-09-17 23:45:14,056 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.823e+02 2.182e+02 2.372e+02 2.578e+02 3.656e+02, threshold=4.743e+02, percent-clipped=0.0 2024-09-17 23:45:36,772 INFO [train.py:1198] (0/2) Epoch 38, batch 2500, loss[loss=0.2214, ctc_loss=0.1462, cr_loss=0.3758, over 20786.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1471, cr_loss=0.3724, over 4101576.49 frames. 
], batch size: 56, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:45:50,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=677084.5, ans=0.0 2024-09-17 23:46:42,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0 2024-09-17 23:46:46,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=677169.5, ans=0.125 2024-09-17 23:46:46,701 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2024-09-17 23:46:46,821 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0 2024-09-17 23:46:51,882 INFO [train.py:1198] (0/2) Epoch 38, batch 2550, loss[loss=0.2455, ctc_loss=0.1671, cr_loss=0.3922, over 20975.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1458, cr_loss=0.3702, over 4102900.05 frames. ], batch size: 64, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:46:54,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.15 vs. 
limit=15.0 2024-09-17 23:47:15,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=677226.1666666666, ans=0.125 2024-09-17 23:47:18,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=677226.1666666666, ans=0.2 2024-09-17 23:47:34,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=677254.5, ans=0.2 2024-09-17 23:47:44,926 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.160e+02 2.286e+02 2.515e+02 3.049e+02, threshold=4.571e+02, percent-clipped=0.0 2024-09-17 23:47:54,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=677311.1666666666, ans=0.0 2024-09-17 23:48:10,397 INFO [train.py:1198] (0/2) Epoch 38, batch 2600, loss[loss=0.2393, ctc_loss=0.1628, cr_loss=0.3829, over 19524.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1454, cr_loss=0.3698, over 4104181.07 frames. ], batch size: 90, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:48:48,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=677396.1666666666, ans=0.2 2024-09-17 23:48:54,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677396.1666666666, ans=0.1 2024-09-17 23:49:05,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=677424.5, ans=0.125 2024-09-17 23:49:28,942 INFO [train.py:1198] (0/2) Epoch 38, batch 2650, loss[loss=0.2377, ctc_loss=0.1563, cr_loss=0.4072, over 20977.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1453, cr_loss=0.369, over 4085709.53 frames. 
], batch size: 58, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:49:32,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=677481.1666666666, ans=0.07 2024-09-17 23:49:42,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=677509.5, ans=0.2 2024-09-17 23:50:09,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=677537.8333333334, ans=0.0 2024-09-17 23:50:20,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=677566.1666666666, ans=0.0 2024-09-17 23:50:21,634 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.240e+02 2.383e+02 2.555e+02 4.331e+02, threshold=4.765e+02, percent-clipped=0.0 2024-09-17 23:50:26,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=677566.1666666666, ans=0.125 2024-09-17 23:50:40,446 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:50:44,623 INFO [train.py:1198] (0/2) Epoch 38, batch 2700, loss[loss=0.1744, ctc_loss=0.1135, cr_loss=0.3041, over 20991.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1455, cr_loss=0.3694, over 4082489.65 frames. 
], batch size: 51, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:51:19,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=677679.5, ans=0.025 2024-09-17 23:51:20,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=677679.5, ans=0.125 2024-09-17 23:51:20,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=677679.5, ans=15.0 2024-09-17 23:51:23,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=677679.5, ans=0.0 2024-09-17 23:52:00,367 INFO [train.py:1198] (0/2) Epoch 38, batch 2750, loss[loss=0.2419, ctc_loss=0.1639, cr_loss=0.3901, over 19946.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.3706, over 4083547.28 frames. ], batch size: 80, lr: 2.17e-03, grad_scale: 64.0 2024-09-17 23:52:02,566 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.10 vs. 
limit=15.0 2024-09-17 23:52:11,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=677764.5, ans=0.1 2024-09-17 23:52:12,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=677764.5, ans=0.2 2024-09-17 23:52:33,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=677821.1666666666, ans=0.04949747468305833 2024-09-17 23:52:33,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=677821.1666666666, ans=0.125 2024-09-17 23:52:52,708 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.201e+02 2.301e+02 2.469e+02 3.405e+02, threshold=4.602e+02, percent-clipped=0.0 2024-09-17 23:53:15,306 INFO [train.py:1198] (0/2) Epoch 38, batch 2800, loss[loss=0.2173, ctc_loss=0.1411, cr_loss=0.381, over 20975.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1459, cr_loss=0.3706, over 4079847.33 frames. ], batch size: 58, lr: 2.17e-03, grad_scale: 64.0 2024-09-17 23:54:22,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=678019.5, ans=0.125 2024-09-17 23:54:28,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=678019.5, ans=0.5 2024-09-17 23:54:29,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=678019.5, ans=0.125 2024-09-17 23:54:36,765 INFO [train.py:1198] (0/2) Epoch 38, batch 2850, loss[loss=0.2149, ctc_loss=0.1444, cr_loss=0.3527, over 20795.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3708, over 4093098.48 frames. 
], batch size: 53, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:54:37,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.21 vs. limit=15.0 2024-09-17 23:54:46,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=678047.8333333334, ans=0.0 2024-09-17 23:54:52,659 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2024-09-17 23:55:07,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678104.5, ans=0.1 2024-09-17 23:55:31,274 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.976e+02 2.169e+02 2.263e+02 2.441e+02 3.215e+02, threshold=4.527e+02, percent-clipped=0.0 2024-09-17 23:55:52,872 INFO [train.py:1198] (0/2) Epoch 38, batch 2900, loss[loss=0.226, ctc_loss=0.1512, cr_loss=0.3741, over 21018.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1449, cr_loss=0.3689, over 4101753.11 frames. ], batch size: 61, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:56:01,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.90 vs. limit=15.0 2024-09-17 23:56:08,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=678217.8333333334, ans=0.0 2024-09-17 23:56:56,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678302.8333333334, ans=0.1 2024-09-17 23:57:08,362 INFO [train.py:1198] (0/2) Epoch 38, batch 2950, loss[loss=0.2231, ctc_loss=0.1498, cr_loss=0.3667, over 20852.00 frames. 
], tot_loss[loss=0.2202, ctc_loss=0.1461, cr_loss=0.3702, over 4079519.56 frames. ], batch size: 59, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:57:28,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=678359.5, ans=0.0 2024-09-17 23:58:03,572 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.235e+02 2.352e+02 2.536e+02 8.217e+02, threshold=4.704e+02, percent-clipped=1.0 2024-09-17 23:58:23,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=678472.8333333334, ans=0.07 2024-09-17 23:58:25,015 INFO [train.py:1198] (0/2) Epoch 38, batch 3000, loss[loss=0.228, ctc_loss=0.1525, cr_loss=0.3778, over 20270.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.146, cr_loss=0.37, over 4071267.71 frames. ], batch size: 74, lr: 2.17e-03, grad_scale: 32.0 2024-09-17 23:58:25,017 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 23:58:44,725 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1988, 5.1024, 4.8487, 5.1779], device='cuda:0') 2024-09-17 23:58:48,108 INFO [train.py:1230] (0/2) Epoch 38, validation: loss=0.04012, ctc_loss=0.04012, cr_loss=1.433e-14, over 944034.00 frames. 2024-09-17 23:58:48,109 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-17 23:58:53,231 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-09-17 23:59:00,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=678472.8333333334, ans=0.125 2024-09-17 23:59:22,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.99 vs. 
limit=15.0 2024-09-18 00:00:10,399 INFO [train.py:1198] (0/2) Epoch 38, batch 3050, loss[loss=0.2323, ctc_loss=0.1524, cr_loss=0.3998, over 20800.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.145, cr_loss=0.368, over 4069555.33 frames. ], batch size: 53, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:00:36,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=678642.8333333334, ans=0.04949747468305833 2024-09-18 00:00:45,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=678671.1666666666, ans=0.125 2024-09-18 00:01:02,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=678699.5, ans=0.2 2024-09-18 00:01:04,878 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.170e+02 2.309e+02 2.471e+02 3.592e+02, threshold=4.618e+02, percent-clipped=0.0 2024-09-18 00:01:18,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=678727.8333333334, ans=0.0 2024-09-18 00:01:26,250 INFO [train.py:1198] (0/2) Epoch 38, batch 3100, loss[loss=0.2569, ctc_loss=0.1744, cr_loss=0.4121, over 18191.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1449, cr_loss=0.3674, over 4076232.73 frames. ], batch size: 108, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:01:32,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.49 vs. 
limit=15.0 2024-09-18 00:02:07,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=678812.8333333334, ans=0.125 2024-09-18 00:02:13,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=678841.1666666666, ans=0.0 2024-09-18 00:02:27,808 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.14 vs. limit=6.0 2024-09-18 00:02:42,095 INFO [train.py:1198] (0/2) Epoch 38, batch 3150, loss[loss=0.2172, ctc_loss=0.1418, cr_loss=0.377, over 20786.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1455, cr_loss=0.3685, over 4080248.11 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:02:56,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=678926.1666666666, ans=0.2 2024-09-18 00:03:20,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678954.5, ans=0.1 2024-09-18 00:03:33,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=678982.8333333334, ans=0.0 2024-09-18 00:03:37,837 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.168e+02 2.310e+02 2.490e+02 3.894e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-18 00:03:57,607 INFO [train.py:1198] (0/2) Epoch 38, batch 3200, loss[loss=0.1922, ctc_loss=0.1242, cr_loss=0.3399, over 20944.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1452, cr_loss=0.3687, over 4087720.85 frames. 
], batch size: 50, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:04:16,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=679067.8333333334, ans=0.125 2024-09-18 00:04:17,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=679067.8333333334, ans=0.125 2024-09-18 00:05:16,837 INFO [train.py:1198] (0/2) Epoch 38, batch 3250, loss[loss=0.2539, ctc_loss=0.1715, cr_loss=0.4121, over 19652.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1444, cr_loss=0.3679, over 4096107.08 frames. ], batch size: 90, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:05:26,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=679181.1666666666, ans=0.07 2024-09-18 00:05:39,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=679209.5, ans=0.125 2024-09-18 00:05:48,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.85 vs. limit=22.5 2024-09-18 00:06:15,263 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.976e+02 2.235e+02 2.338e+02 2.498e+02 4.233e+02, threshold=4.676e+02, percent-clipped=0.0 2024-09-18 00:06:20,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-09-18 00:06:35,036 INFO [train.py:1198] (0/2) Epoch 38, batch 3300, loss[loss=0.2248, ctc_loss=0.1488, cr_loss=0.3799, over 20996.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1443, cr_loss=0.3675, over 4084901.98 frames. 
], batch size: 61, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:06:44,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=679322.8333333334, ans=0.1 2024-09-18 00:07:40,900 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=22.5 2024-09-18 00:07:50,577 INFO [train.py:1198] (0/2) Epoch 38, batch 3350, loss[loss=0.2347, ctc_loss=0.1576, cr_loss=0.3856, over 20725.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1445, cr_loss=0.3678, over 4080401.19 frames. ], batch size: 71, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:08:16,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=679492.8333333334, ans=0.1 2024-09-18 00:08:28,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.77 vs. limit=15.0 2024-09-18 00:08:37,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=679549.5, ans=0.125 2024-09-18 00:08:41,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.70 vs. 
limit=15.0 2024-09-18 00:08:46,570 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.192e+02 2.281e+02 2.427e+02 3.859e+02, threshold=4.562e+02, percent-clipped=0.0 2024-09-18 00:08:53,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=679577.8333333334, ans=0.125 2024-09-18 00:09:03,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=679577.8333333334, ans=0.05 2024-09-18 00:09:06,282 INFO [train.py:1198] (0/2) Epoch 38, batch 3400, loss[loss=0.1792, ctc_loss=0.1162, cr_loss=0.3152, over 20382.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1456, cr_loss=0.3693, over 4074925.26 frames. ], batch size: 45, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:09:14,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=679606.1666666666, ans=0.0 2024-09-18 00:09:42,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=679662.8333333334, ans=0.125 2024-09-18 00:09:53,996 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. 
limit=10.0 2024-09-18 00:09:58,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=679691.1666666666, ans=0.0 2024-09-18 00:09:59,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=679691.1666666666, ans=0.2 2024-09-18 00:10:13,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=679719.5, ans=0.2 2024-09-18 00:10:22,115 INFO [train.py:1198] (0/2) Epoch 38, batch 3450, loss[loss=0.1947, ctc_loss=0.1273, cr_loss=0.337, over 20789.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1455, cr_loss=0.3693, over 4080795.27 frames. ], batch size: 53, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:10:22,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=679747.8333333334, ans=0.0 2024-09-18 00:10:48,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=679776.1666666666, ans=0.125 2024-09-18 00:11:25,027 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.189e+02 2.319e+02 2.495e+02 3.342e+02, threshold=4.638e+02, percent-clipped=0.0 2024-09-18 00:11:43,241 INFO [train.py:1198] (0/2) Epoch 38, batch 3500, loss[loss=0.1812, ctc_loss=0.1196, cr_loss=0.3081, over 20959.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1451, cr_loss=0.3689, over 4092610.84 frames. 
], batch size: 50, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:12:38,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=679974.5, ans=0.1 2024-09-18 00:12:41,296 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-240000.pt 2024-09-18 00:12:50,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=680002.8333333334, ans=0.125 2024-09-18 00:12:59,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=680031.1666666666, ans=0.025 2024-09-18 00:13:00,313 INFO [train.py:1198] (0/2) Epoch 38, batch 3550, loss[loss=0.1821, ctc_loss=0.1174, cr_loss=0.3234, over 20944.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1453, cr_loss=0.3691, over 4087642.71 frames. ], batch size: 48, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:13:09,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=680031.1666666666, ans=0.125 2024-09-18 00:13:48,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=680116.1666666666, ans=0.125 2024-09-18 00:13:49,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=680116.1666666666, ans=0.125 2024-09-18 00:13:57,294 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.246e+02 2.376e+02 2.589e+02 3.239e+02, threshold=4.753e+02, percent-clipped=0.0 2024-09-18 00:14:15,177 INFO [train.py:1198] (0/2) Epoch 38, batch 3600, loss[loss=0.2456, ctc_loss=0.1615, cr_loss=0.4201, over 19961.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1468, cr_loss=0.3721, over 4090526.98 frames. 
], batch size: 80, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:14:15,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=680172.8333333334, ans=0.0 2024-09-18 00:14:42,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=680201.1666666666, ans=0.0 2024-09-18 00:14:42,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=680201.1666666666, ans=0.125 2024-09-18 00:15:25,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680286.1666666666, ans=0.1 2024-09-18 00:15:30,329 INFO [train.py:1198] (0/2) Epoch 38, batch 3650, loss[loss=0.1908, ctc_loss=0.1227, cr_loss=0.3404, over 20946.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1466, cr_loss=0.3719, over 4102948.70 frames. ], batch size: 50, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:15:43,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=680314.5, ans=0.0 2024-09-18 00:16:28,388 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.174e+02 2.298e+02 2.482e+02 3.802e+02, threshold=4.597e+02, percent-clipped=0.0 2024-09-18 00:16:49,219 INFO [train.py:1198] (0/2) Epoch 38, batch 3700, loss[loss=0.2157, ctc_loss=0.1423, cr_loss=0.3671, over 20808.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1465, cr_loss=0.3727, over 4111906.01 frames. ], batch size: 53, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:17:02,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.83 vs. limit=15.0 2024-09-18 00:17:43,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.49 vs. 
limit=15.0 2024-09-18 00:18:07,908 INFO [train.py:1198] (0/2) Epoch 38, batch 3750, loss[loss=0.2509, ctc_loss=0.1661, cr_loss=0.4237, over 20685.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1464, cr_loss=0.3731, over 4112474.07 frames. ], batch size: 71, lr: 2.17e-03, grad_scale: 32.0 2024-09-18 00:18:39,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=680654.5, ans=0.5 2024-09-18 00:18:42,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=680654.5, ans=0.0 2024-09-18 00:19:02,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=680682.8333333334, ans=0.2 2024-09-18 00:19:02,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0 2024-09-18 00:19:04,947 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.188e+02 2.340e+02 2.478e+02 3.466e+02, threshold=4.681e+02, percent-clipped=0.0 2024-09-18 00:19:06,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=680711.1666666666, ans=0.1 2024-09-18 00:19:22,900 INFO [train.py:1198] (0/2) Epoch 38, batch 3800, loss[loss=0.2042, ctc_loss=0.1368, cr_loss=0.3372, over 20988.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1447, cr_loss=0.3691, over 4117138.37 frames. ], batch size: 52, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:20:23,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=680852.8333333334, ans=0.125 2024-09-18 00:20:39,057 INFO [train.py:1198] (0/2) Epoch 38, batch 3850, loss[loss=0.2236, ctc_loss=0.1486, cr_loss=0.3746, over 21089.00 frames. 
], tot_loss[loss=0.2181, ctc_loss=0.1444, cr_loss=0.3686, over 4115142.27 frames. ], batch size: 53, lr: 2.17e-03, grad_scale: 8.0 2024-09-18 00:20:45,635 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:21:39,730 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.177e+02 2.335e+02 2.487e+02 1.316e+03, threshold=4.669e+02, percent-clipped=2.0 2024-09-18 00:21:46,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=680994.5, ans=0.0 2024-09-18 00:21:54,733 INFO [train.py:1198] (0/2) Epoch 38, batch 3900, loss[loss=0.2631, ctc_loss=0.1818, cr_loss=0.4068, over 18239.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1444, cr_loss=0.3678, over 4111405.69 frames. ], batch size: 108, lr: 2.17e-03, grad_scale: 8.0 2024-09-18 00:22:05,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2024-09-18 00:22:43,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=681107.8333333334, ans=0.0 2024-09-18 00:22:55,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.90 vs. limit=15.0 2024-09-18 00:23:07,649 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=12.0 2024-09-18 00:23:10,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=681136.1666666666, ans=0.125 2024-09-18 00:23:15,988 INFO [train.py:1198] (0/2) Epoch 38, batch 3950, loss[loss=0.2188, ctc_loss=0.1449, cr_loss=0.3692, over 21035.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1446, cr_loss=0.368, over 4110586.52 frames. 
], batch size: 63, lr: 2.17e-03, grad_scale: 8.0 2024-09-18 00:23:26,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=681164.5, ans=0.125 2024-09-18 00:23:34,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=681192.8333333334, ans=0.125 2024-09-18 00:23:36,834 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.23 vs. limit=15.0 2024-09-18 00:24:06,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=681249.5, ans=0.125 2024-09-18 00:24:16,206 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.172e+02 2.301e+02 2.467e+02 5.100e+02, threshold=4.602e+02, percent-clipped=1.0 2024-09-18 00:24:22,786 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-18 00:24:29,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=681306.1666666666, ans=0.0 2024-09-18 00:24:31,238 INFO [train.py:1198] (0/2) Epoch 38, batch 4000, loss[loss=0.1903, ctc_loss=0.1253, cr_loss=0.3252, over 20984.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.145, cr_loss=0.3695, over 4112696.80 frames. ], batch size: 51, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:24:33,468 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.31 vs. 
limit=10.0 2024-09-18 00:24:42,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=681306.1666666666, ans=0.0 2024-09-18 00:24:43,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=681306.1666666666, ans=0.0 2024-09-18 00:24:43,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=681306.1666666666, ans=0.0 2024-09-18 00:25:06,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681362.8333333334, ans=0.1 2024-09-18 00:25:10,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=681362.8333333334, ans=0.0 2024-09-18 00:25:14,095 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=22.5 2024-09-18 00:25:27,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681391.1666666666, ans=0.1 2024-09-18 00:25:46,910 INFO [train.py:1198] (0/2) Epoch 38, batch 4050, loss[loss=0.2041, ctc_loss=0.1329, cr_loss=0.3563, over 21050.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1457, cr_loss=0.3712, over 4108536.20 frames. 
], batch size: 53, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:26:02,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=681476.1666666666, ans=0.1 2024-09-18 00:26:13,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=681476.1666666666, ans=0.0 2024-09-18 00:26:47,733 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.171e+02 2.348e+02 2.511e+02 3.000e+02, threshold=4.696e+02, percent-clipped=0.0 2024-09-18 00:27:02,958 INFO [train.py:1198] (0/2) Epoch 38, batch 4100, loss[loss=0.1905, ctc_loss=0.125, cr_loss=0.3272, over 19976.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1456, cr_loss=0.3706, over 4103494.12 frames. ], batch size: 44, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:27:24,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681617.8333333334, ans=0.1 2024-09-18 00:27:42,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=681646.1666666666, ans=0.125 2024-09-18 00:27:44,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=681646.1666666666, ans=0.0 2024-09-18 00:27:51,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.48 vs. limit=15.0 2024-09-18 00:28:21,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=681731.1666666666, ans=0.125 2024-09-18 00:28:22,268 INFO [train.py:1198] (0/2) Epoch 38, batch 4150, loss[loss=0.1985, ctc_loss=0.131, cr_loss=0.3371, over 20959.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1443, cr_loss=0.3683, over 4102129.23 frames. 
], batch size: 51, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:29:09,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=681816.1666666666, ans=0.125 2024-09-18 00:29:25,293 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=22.5 2024-09-18 00:29:25,983 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.185e+02 2.324e+02 2.533e+02 7.205e+02, threshold=4.647e+02, percent-clipped=1.0 2024-09-18 00:29:40,901 INFO [train.py:1198] (0/2) Epoch 38, batch 4200, loss[loss=0.2098, ctc_loss=0.138, cr_loss=0.3588, over 20856.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3705, over 4087313.88 frames. ], batch size: 57, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:29:54,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.83 vs. limit=22.5 2024-09-18 00:30:55,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=682014.5, ans=0.2 2024-09-18 00:30:56,683 INFO [train.py:1198] (0/2) Epoch 38, batch 4250, loss[loss=0.2122, ctc_loss=0.1386, cr_loss=0.3683, over 21057.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1455, cr_loss=0.3706, over 4086772.11 frames. 
], batch size: 53, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:31:32,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682071.1666666666, ans=0.1 2024-09-18 00:31:58,138 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.937e+02 2.234e+02 2.385e+02 2.551e+02 6.288e+02, threshold=4.770e+02, percent-clipped=1.0 2024-09-18 00:32:04,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=682127.8333333334, ans=0.125 2024-09-18 00:32:13,282 INFO [train.py:1198] (0/2) Epoch 38, batch 4300, loss[loss=0.2589, ctc_loss=0.1768, cr_loss=0.4103, over 18381.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1453, cr_loss=0.3699, over 4087814.07 frames. ], batch size: 108, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:32:13,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=682156.1666666666, ans=0.1 2024-09-18 00:32:33,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=682184.5, ans=0.0 2024-09-18 00:32:38,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=22.5 2024-09-18 00:33:28,515 INFO [train.py:1198] (0/2) Epoch 38, batch 4350, loss[loss=0.2367, ctc_loss=0.1592, cr_loss=0.3874, over 21024.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1463, cr_loss=0.3718, over 4094160.22 frames. ], batch size: 63, lr: 2.17e-03, grad_scale: 16.0 2024-09-18 00:33:32,285 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.41 vs. 
limit=12.0
2024-09-18 00:33:44,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=682326.1666666666, ans=0.125
2024-09-18 00:33:57,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=682354.5, ans=0.2
2024-09-18 00:34:19,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=682382.8333333334, ans=0.125
2024-09-18 00:34:34,458 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.234e+02 2.368e+02 2.527e+02 2.981e+02, threshold=4.735e+02, percent-clipped=0.0
2024-09-18 00:34:48,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=682439.5, ans=0.0
2024-09-18 00:34:48,498 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.81 vs. limit=15.0
2024-09-18 00:34:49,410 INFO [train.py:1198] (0/2) Epoch 38, batch 4400, loss[loss=0.221, ctc_loss=0.1464, cr_loss=0.3733, over 20692.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1462, cr_loss=0.3713, over 4088331.40 frames. ], batch size: 71, lr: 2.17e-03, grad_scale: 32.0
2024-09-18 00:34:55,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682439.5, ans=0.1
2024-09-18 00:35:12,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=682467.8333333334, ans=0.025
2024-09-18 00:35:20,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=682496.1666666666, ans=0.0
2024-09-18 00:35:31,469 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0
2024-09-18 00:36:05,399 INFO [train.py:1198] (0/2) Epoch 38, batch 4450, loss[loss=0.2547, ctc_loss=0.1735, cr_loss=0.4059, over 20635.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1457, cr_loss=0.3704, over 4096116.32 frames. ], batch size: 71, lr: 2.17e-03, grad_scale: 32.0
2024-09-18 00:36:10,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=682581.1666666666, ans=0.125
2024-09-18 00:36:22,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=682609.5, ans=0.025
2024-09-18 00:36:28,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=682609.5, ans=0.0
2024-09-18 00:36:57,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=22.5
2024-09-18 00:37:02,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=682666.1666666666, ans=0.125
2024-09-18 00:37:05,721 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.178e+02 2.267e+02 2.475e+02 3.401e+02, threshold=4.534e+02, percent-clipped=0.0
2024-09-18 00:37:07,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=682694.5, ans=0.025
2024-09-18 00:37:21,039 INFO [train.py:1198] (0/2) Epoch 38, batch 4500, loss[loss=0.211, ctc_loss=0.1381, cr_loss=0.3643, over 20831.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1462, cr_loss=0.371, over 4097286.22 frames. ], batch size: 59, lr: 2.17e-03, grad_scale: 32.0
2024-09-18 00:37:39,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=682751.1666666666, ans=0.5
2024-09-18 00:37:39,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=682751.1666666666, ans=15.0
2024-09-18 00:37:46,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=682751.1666666666, ans=0.125
2024-09-18 00:38:36,241 INFO [train.py:1198] (0/2) Epoch 38, batch 4550, loss[loss=0.2373, ctc_loss=0.1587, cr_loss=0.3931, over 20799.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1464, cr_loss=0.3714, over 4101072.68 frames. ], batch size: 65, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:38:50,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=682892.8333333334, ans=15.0
2024-09-18 00:38:52,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0
2024-09-18 00:38:59,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=682892.8333333334, ans=0.125
2024-09-18 00:39:02,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=22.5
2024-09-18 00:39:08,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=682921.1666666666, ans=0.125
2024-09-18 00:39:09,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=682921.1666666666, ans=0.125
2024-09-18 00:39:36,919 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.204e+02 2.316e+02 2.469e+02 9.471e+02, threshold=4.633e+02, percent-clipped=1.0
2024-09-18 00:39:54,845 INFO [train.py:1198] (0/2) Epoch 38, batch 4600, loss[loss=0.2336, ctc_loss=0.1548, cr_loss=0.3941, over 21018.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1464, cr_loss=0.3717, over 4101384.08 frames. ], batch size: 63, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:39:55,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=683006.1666666666, ans=0.2
2024-09-18 00:40:38,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=683062.8333333334, ans=0.0
2024-09-18 00:41:00,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=683119.5, ans=0.0
2024-09-18 00:41:00,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=683119.5, ans=0.125
2024-09-18 00:41:13,661 INFO [train.py:1198] (0/2) Epoch 38, batch 4650, loss[loss=0.2078, ctc_loss=0.1363, cr_loss=0.3577, over 20979.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1458, cr_loss=0.3709, over 4103383.32 frames. ], batch size: 55, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:41:15,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=683147.8333333334, ans=0.2
2024-09-18 00:41:17,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=683147.8333333334, ans=15.0
2024-09-18 00:42:02,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683232.8333333334, ans=0.1
2024-09-18 00:42:08,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=683232.8333333334, ans=0.0
2024-09-18 00:42:10,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=683232.8333333334, ans=0.125
2024-09-18 00:42:14,435 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.184e+02 2.332e+02 2.461e+02 3.395e+02, threshold=4.665e+02, percent-clipped=0.0
2024-09-18 00:42:14,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=683261.1666666666, ans=0.125
2024-09-18 00:42:29,492 INFO [train.py:1198] (0/2) Epoch 38, batch 4700, loss[loss=0.2429, ctc_loss=0.161, cr_loss=0.4097, over 20348.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1464, cr_loss=0.3714, over 4099430.38 frames. ], batch size: 74, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:42:34,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=683289.5, ans=0.125
2024-09-18 00:42:37,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=683289.5, ans=0.125
2024-09-18 00:43:24,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=683374.5, ans=0.2
2024-09-18 00:43:26,258 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0
2024-09-18 00:43:45,157 INFO [train.py:1198] (0/2) Epoch 38, batch 4750, loss[loss=0.2299, ctc_loss=0.1533, cr_loss=0.3828, over 20851.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1459, cr_loss=0.3704, over 4101122.42 frames. ], batch size: 65, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:43:56,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=683431.1666666666, ans=0.02
2024-09-18 00:44:09,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=683459.5, ans=10.0
2024-09-18 00:44:23,795 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=22.5
2024-09-18 00:44:45,916 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.203e+02 2.333e+02 2.487e+02 5.202e+02, threshold=4.666e+02, percent-clipped=1.0
2024-09-18 00:44:56,973 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0
2024-09-18 00:45:00,940 INFO [train.py:1198] (0/2) Epoch 38, batch 4800, loss[loss=0.2702, ctc_loss=0.1855, cr_loss=0.4236, over 18097.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1463, cr_loss=0.3714, over 4098144.19 frames. ], batch size: 108, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:45:53,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=683657.8333333334, ans=0.125
2024-09-18 00:46:12,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=683686.1666666666, ans=0.0
2024-09-18 00:46:15,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.54 vs. limit=15.0
2024-09-18 00:46:22,374 INFO [train.py:1198] (0/2) Epoch 38, batch 4850, loss[loss=0.2353, ctc_loss=0.1542, cr_loss=0.4054, over 21088.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3708, over 4104512.22 frames. ], batch size: 59, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:46:27,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=683714.5, ans=0.125
2024-09-18 00:46:35,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=683714.5, ans=0.02
2024-09-18 00:46:49,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=22.5
2024-09-18 00:46:53,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=683771.1666666666, ans=0.125
2024-09-18 00:46:58,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.34 vs. limit=15.0
2024-09-18 00:47:00,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=683771.1666666666, ans=0.0
2024-09-18 00:47:18,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=683799.5, ans=0.2
2024-09-18 00:47:22,855 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.185e+02 2.288e+02 2.442e+02 3.685e+02, threshold=4.575e+02, percent-clipped=0.0
2024-09-18 00:47:38,104 INFO [train.py:1198] (0/2) Epoch 38, batch 4900, loss[loss=0.2097, ctc_loss=0.138, cr_loss=0.3587, over 20870.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1461, cr_loss=0.3712, over 4103779.53 frames. ], batch size: 57, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:47:49,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=683856.1666666666, ans=0.125
2024-09-18 00:48:32,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=683941.1666666666, ans=0.0
2024-09-18 00:48:36,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=683941.1666666666, ans=0.125
2024-09-18 00:48:53,083 INFO [train.py:1198] (0/2) Epoch 38, batch 4950, loss[loss=0.2508, ctc_loss=0.1742, cr_loss=0.3829, over 14297.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.3713, over 4106100.56 frames. ], batch size: 150, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:49:14,279 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 00:49:15,704 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 00:49:17,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=684026.1666666666, ans=0.1
2024-09-18 00:49:17,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=684026.1666666666, ans=0.0
2024-09-18 00:49:21,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=684054.5, ans=0.125
2024-09-18 00:49:52,537 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.221e+02 2.337e+02 2.503e+02 4.200e+02, threshold=4.673e+02, percent-clipped=0.0
2024-09-18 00:50:07,457 INFO [train.py:1198] (0/2) Epoch 38, batch 5000, loss[loss=0.1925, ctc_loss=0.1242, cr_loss=0.3416, over 20961.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3707, over 4115776.07 frames. ], batch size: 48, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:50:09,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=684139.5, ans=0.125
2024-09-18 00:50:19,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=684139.5, ans=0.125
2024-09-18 00:50:21,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=684167.8333333334, ans=0.125
2024-09-18 00:50:27,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=684167.8333333334, ans=0.125
2024-09-18 00:50:46,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=684196.1666666666, ans=0.025
2024-09-18 00:50:56,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=684224.5, ans=0.125
2024-09-18 00:51:01,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=684224.5, ans=0.0
2024-09-18 00:51:04,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=684224.5, ans=0.125
2024-09-18 00:51:07,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=684252.8333333334, ans=0.0
2024-09-18 00:51:21,993 INFO [train.py:1198] (0/2) Epoch 38, batch 5050, loss[loss=0.2071, ctc_loss=0.1375, cr_loss=0.3482, over 20888.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1453, cr_loss=0.3702, over 4119156.80 frames. ], batch size: 57, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:51:23,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=684281.1666666666, ans=0.5
2024-09-18 00:51:42,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=684309.5, ans=0.5
2024-09-18 00:51:59,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=684337.8333333334, ans=0.125
2024-09-18 00:52:16,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=684366.1666666666, ans=0.125
2024-09-18 00:52:20,571 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.246e+02 2.374e+02 2.516e+02 6.354e+02, threshold=4.749e+02, percent-clipped=1.0
2024-09-18 00:52:28,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=684394.5, ans=0.125
2024-09-18 00:52:35,418 INFO [train.py:1198] (0/2) Epoch 38, batch 5100, loss[loss=0.2232, ctc_loss=0.1485, cr_loss=0.3737, over 20085.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1457, cr_loss=0.3706, over 4104922.65 frames. ], batch size: 80, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:52:37,195 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 00:52:44,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=684422.8333333334, ans=0.0
2024-09-18 00:52:58,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=684451.1666666666, ans=0.0
2024-09-18 00:53:35,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=22.5
2024-09-18 00:53:47,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=684536.1666666666, ans=0.2
2024-09-18 00:53:52,937 INFO [train.py:1198] (0/2) Epoch 38, batch 5150, loss[loss=0.2294, ctc_loss=0.1498, cr_loss=0.3981, over 21001.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1449, cr_loss=0.3693, over 4106691.51 frames. ], batch size: 61, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:54:06,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=684592.8333333334, ans=0.025
2024-09-18 00:54:08,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=684592.8333333334, ans=0.0
2024-09-18 00:54:16,958 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0
2024-09-18 00:54:31,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684621.1666666666, ans=0.1
2024-09-18 00:54:54,780 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.184e+02 2.308e+02 2.448e+02 4.655e+02, threshold=4.616e+02, percent-clipped=0.0
2024-09-18 00:55:09,714 INFO [train.py:1198] (0/2) Epoch 38, batch 5200, loss[loss=0.2496, ctc_loss=0.1653, cr_loss=0.4218, over 20939.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1445, cr_loss=0.3688, over 4118040.27 frames. ], batch size: 60, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:55:46,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=684762.8333333334, ans=0.125
2024-09-18 00:55:47,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=12.0
2024-09-18 00:56:02,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0
2024-09-18 00:56:12,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0
2024-09-18 00:56:24,436 INFO [train.py:1198] (0/2) Epoch 38, batch 5250, loss[loss=0.2385, ctc_loss=0.1596, cr_loss=0.3943, over 20794.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1446, cr_loss=0.369, over 4115570.63 frames. ], batch size: 56, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:56:32,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=684847.8333333334, ans=0.2
2024-09-18 00:56:32,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=22.5
2024-09-18 00:56:48,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=684876.1666666666, ans=0.0
2024-09-18 00:57:22,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=684961.1666666666, ans=0.125
2024-09-18 00:57:23,804 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.973e+02 2.195e+02 2.290e+02 2.458e+02 4.073e+02, threshold=4.580e+02, percent-clipped=0.0
2024-09-18 00:57:39,084 INFO [train.py:1198] (0/2) Epoch 38, batch 5300, loss[loss=0.201, ctc_loss=0.131, cr_loss=0.3502, over 20984.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.144, cr_loss=0.368, over 4117253.65 frames. ], batch size: 48, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 00:58:14,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=685046.1666666666, ans=0.125
2024-09-18 00:58:22,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=685074.5, ans=0.125
2024-09-18 00:58:47,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=685102.8333333334, ans=0.1
2024-09-18 00:58:53,344 INFO [train.py:1198] (0/2) Epoch 38, batch 5350, loss[loss=0.2241, ctc_loss=0.1525, cr_loss=0.358, over 19210.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1438, cr_loss=0.3673, over 4108539.88 frames. ], batch size: 90, lr: 2.16e-03, grad_scale: 16.0
2024-09-18 00:58:55,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=685131.1666666666, ans=0.125
2024-09-18 00:58:59,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=685131.1666666666, ans=0.125
2024-09-18 00:59:41,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=685216.1666666666, ans=0.1
2024-09-18 00:59:54,581 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.204e+02 2.357e+02 2.541e+02 3.874e+02, threshold=4.713e+02, percent-clipped=0.0
2024-09-18 00:59:56,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=685244.5, ans=0.0
2024-09-18 01:00:07,921 INFO [train.py:1198] (0/2) Epoch 38, batch 5400, loss[loss=0.2229, ctc_loss=0.1485, cr_loss=0.3721, over 20982.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1444, cr_loss=0.3676, over 4084298.75 frames. ], batch size: 58, lr: 2.16e-03, grad_scale: 16.0
2024-09-18 01:00:14,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=685272.8333333334, ans=0.0
2024-09-18 01:00:31,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=685301.1666666666, ans=0.05
2024-09-18 01:01:03,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0
2024-09-18 01:01:04,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=685357.8333333334, ans=0.125
2024-09-18 01:01:22,629 INFO [train.py:1198] (0/2) Epoch 38, batch 5450, loss[loss=0.1918, ctc_loss=0.125, cr_loss=0.3338, over 19878.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1446, cr_loss=0.3681, over 4080103.51 frames. ], batch size: 44, lr: 2.16e-03, grad_scale: 16.0
2024-09-18 01:01:31,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=685414.5, ans=0.125
2024-09-18 01:01:54,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685471.1666666666, ans=0.1
2024-09-18 01:02:23,889 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.229e+02 2.376e+02 2.564e+02 3.729e+02, threshold=4.752e+02, percent-clipped=0.0
2024-09-18 01:02:30,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=685527.8333333334, ans=0.125
2024-09-18 01:02:37,684 INFO [train.py:1198] (0/2) Epoch 38, batch 5500, loss[loss=0.2553, ctc_loss=0.1725, cr_loss=0.4141, over 20841.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1442, cr_loss=0.3677, over 4082666.82 frames. ], batch size: 65, lr: 2.16e-03, grad_scale: 16.0
2024-09-18 01:02:39,789 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0
2024-09-18 01:02:41,145 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:02:42,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=685556.1666666666, ans=0.125
2024-09-18 01:02:45,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=685556.1666666666, ans=0.2
2024-09-18 01:02:45,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=685556.1666666666, ans=0.125
2024-09-18 01:03:18,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0
2024-09-18 01:03:32,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685641.1666666666, ans=0.1
2024-09-18 01:03:52,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.04 vs. limit=10.0
2024-09-18 01:03:57,543 INFO [train.py:1198] (0/2) Epoch 38, batch 5550, loss[loss=0.2397, ctc_loss=0.1574, cr_loss=0.4114, over 21015.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1437, cr_loss=0.3667, over 4082801.16 frames. ], batch size: 63, lr: 2.16e-03, grad_scale: 16.0
2024-09-18 01:04:05,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=685697.8333333334, ans=0.125
2024-09-18 01:04:17,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=685726.1666666666, ans=0.125
2024-09-18 01:04:18,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=685726.1666666666, ans=0.0
2024-09-18 01:04:23,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0
2024-09-18 01:04:50,675 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0
2024-09-18 01:04:58,865 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.165e+02 2.298e+02 2.438e+02 4.355e+02, threshold=4.597e+02, percent-clipped=0.0
2024-09-18 01:05:12,220 INFO [train.py:1198] (0/2) Epoch 38, batch 5600, loss[loss=0.2407, ctc_loss=0.1623, cr_loss=0.3919, over 19459.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1432, cr_loss=0.366, over 4091870.62 frames. ], batch size: 90, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 01:05:55,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=685924.5, ans=0.2
2024-09-18 01:06:01,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=685924.5, ans=0.125
2024-09-18 01:06:26,291 INFO [train.py:1198] (0/2) Epoch 38, batch 5650, loss[loss=0.2248, ctc_loss=0.1522, cr_loss=0.3625, over 20883.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1439, cr_loss=0.3676, over 4092250.51 frames. ], batch size: 57, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 01:06:50,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=686009.5, ans=0.125
2024-09-18 01:07:02,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=686037.8333333334, ans=0.2
2024-09-18 01:07:27,098 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.236e+02 2.355e+02 2.585e+02 3.333e+02, threshold=4.710e+02, percent-clipped=0.0
2024-09-18 01:07:31,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=686094.5, ans=0.0
2024-09-18 01:07:34,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=686094.5, ans=0.0
2024-09-18 01:07:40,406 INFO [train.py:1198] (0/2) Epoch 38, batch 5700, loss[loss=0.2311, ctc_loss=0.1522, cr_loss=0.3946, over 20689.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1445, cr_loss=0.369, over 4098052.64 frames. ], batch size: 68, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 01:08:16,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.50 vs. limit=10.0
2024-09-18 01:08:27,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=686207.8333333334, ans=0.0
2024-09-18 01:08:40,790 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=22.5
2024-09-18 01:08:49,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=686236.1666666666, ans=0.125
2024-09-18 01:08:55,284 INFO [train.py:1198] (0/2) Epoch 38, batch 5750, loss[loss=0.2315, ctc_loss=0.1517, cr_loss=0.3989, over 20854.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.145, cr_loss=0.3702, over 4105933.92 frames. ], batch size: 65, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 01:09:56,652 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.234e+02 2.378e+02 2.540e+02 4.922e+02, threshold=4.756e+02, percent-clipped=1.0
2024-09-18 01:09:58,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=686377.8333333334, ans=0.125
2024-09-18 01:10:10,229 INFO [train.py:1198] (0/2) Epoch 38, batch 5800, loss[loss=0.1964, ctc_loss=0.1275, cr_loss=0.3443, over 20990.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1447, cr_loss=0.3704, over 4118297.28 frames. ], batch size: 52, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 01:10:12,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=686406.1666666666, ans=0.0
2024-09-18 01:10:16,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=686406.1666666666, ans=0.025
2024-09-18 01:10:59,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=686491.1666666666, ans=0.2
2024-09-18 01:11:01,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=686491.1666666666, ans=0.125
2024-09-18 01:11:07,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=686491.1666666666, ans=0.2
2024-09-18 01:11:27,208 INFO [train.py:1198] (0/2) Epoch 38, batch 5850, loss[loss=0.247, ctc_loss=0.1658, cr_loss=0.4063, over 19337.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1451, cr_loss=0.3709, over 4107790.48 frames. ], batch size: 90, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 01:11:30,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=686547.8333333334, ans=0.2
2024-09-18 01:11:33,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=686547.8333333334, ans=0.09899494936611666
2024-09-18 01:12:02,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=686604.5, ans=0.125
2024-09-18 01:12:21,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.71 vs. limit=10.0
2024-09-18 01:12:28,575 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.263e+02 2.383e+02 2.562e+02 4.441e+02, threshold=4.766e+02, percent-clipped=0.0
2024-09-18 01:12:44,255 INFO [train.py:1198] (0/2) Epoch 38, batch 5900, loss[loss=0.1732, ctc_loss=0.1128, cr_loss=0.3024, over 20985.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1446, cr_loss=0.3693, over 4098374.64 frames. ], batch size: 48, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 01:12:48,468 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=8.0
2024-09-18 01:13:18,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=686746.1666666666, ans=0.125
2024-09-18 01:13:20,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=686746.1666666666, ans=0.125
2024-09-18 01:13:45,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=686802.8333333334, ans=0.125
2024-09-18 01:13:48,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=686802.8333333334, ans=0.0
2024-09-18 01:13:57,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=686831.1666666666, ans=0.125
2024-09-18 01:13:58,798 INFO [train.py:1198] (0/2) Epoch 38, batch 5950, loss[loss=0.1982, ctc_loss=0.1305, cr_loss=0.3387, over 20878.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1446, cr_loss=0.3695, over 4104915.68 frames. ], batch size: 54, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 01:14:12,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=686859.5, ans=0.2
2024-09-18 01:14:12,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0
2024-09-18 01:14:16,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=686859.5, ans=0.04949747468305833
2024-09-18 01:14:25,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=686859.5, ans=0.2
2024-09-18 01:14:55,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=686916.1666666666, ans=0.125
2024-09-18 01:14:59,849 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.184e+02 2.312e+02 2.587e+02 4.338e+02, threshold=4.624e+02, percent-clipped=0.0
2024-09-18 01:15:03,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=686944.5, ans=0.0
2024-09-18 01:15:06,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=686944.5, ans=0.2
2024-09-18 01:15:07,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=686944.5, ans=0.04949747468305833
2024-09-18 01:15:13,343 INFO [train.py:1198] (0/2) Epoch 38, batch 6000, loss[loss=0.2395, ctc_loss=0.1602, cr_loss=0.3966, over 20827.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1453, cr_loss=0.37, over 4087982.44 frames. ], batch size: 59, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 01:15:13,344 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 01:15:34,936 INFO [train.py:1230] (0/2) Epoch 38, validation: loss=0.03984, ctc_loss=0.03984, cr_loss=1.403e-14, over 944034.00 frames.
2024-09-18 01:15:34,936 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-18 01:15:49,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=687001.1666666666, ans=0.0
2024-09-18 01:15:52,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=687001.1666666666, ans=0.0
2024-09-18 01:16:34,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=687086.1666666666, ans=0.125
2024-09-18 01:16:48,771 INFO [train.py:1198] (0/2) Epoch 38, batch 6050, loss[loss=0.2386, ctc_loss=0.1585, cr_loss=0.4004, over 20936.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1454, cr_loss=0.3705, over 4088929.00 frames. ], batch size: 60, lr: 2.16e-03, grad_scale: 32.0
2024-09-18 01:17:06,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=687142.8333333334, ans=0.1
2024-09-18 01:17:50,521 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.121e+02 2.271e+02 2.489e+02 3.721e+02, threshold=4.541e+02, percent-clipped=0.0
2024-09-18 01:18:04,030 INFO [train.py:1198] (0/2) Epoch 38, batch 6100, loss[loss=0.2126, ctc_loss=0.1382, cr_loss=0.3717, over 20659.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1454, cr_loss=0.3699, over 4093470.86 frames.
], batch size: 66, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:18:07,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=687256.1666666666, ans=0.125 2024-09-18 01:18:07,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=687256.1666666666, ans=0.125 2024-09-18 01:18:13,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=687256.1666666666, ans=0.125 2024-09-18 01:18:13,479 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0 2024-09-18 01:18:30,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=687284.5, ans=0.0 2024-09-18 01:18:50,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=687341.1666666666, ans=0.2 2024-09-18 01:19:10,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=687369.5, ans=0.125 2024-09-18 01:19:17,687 INFO [train.py:1198] (0/2) Epoch 38, batch 6150, loss[loss=0.2238, ctc_loss=0.1494, cr_loss=0.3723, over 21042.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1462, cr_loss=0.3705, over 4073515.76 frames. ], batch size: 56, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:19:18,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=12.0 2024-09-18 01:19:42,316 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.99 vs. 
limit=15.0 2024-09-18 01:20:00,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=687454.5, ans=0.0 2024-09-18 01:20:20,026 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.208e+02 2.352e+02 2.559e+02 3.096e+02, threshold=4.704e+02, percent-clipped=0.0 2024-09-18 01:20:24,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=687511.1666666666, ans=0.125 2024-09-18 01:20:33,176 INFO [train.py:1198] (0/2) Epoch 38, batch 6200, loss[loss=0.2272, ctc_loss=0.1529, cr_loss=0.371, over 20714.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1448, cr_loss=0.3687, over 4078481.06 frames. ], batch size: 71, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:20:46,751 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2024-09-18 01:21:11,726 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2024-09-18 01:21:46,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=687681.1666666666, ans=0.0 2024-09-18 01:21:48,063 INFO [train.py:1198] (0/2) Epoch 38, batch 6250, loss[loss=0.1869, ctc_loss=0.1232, cr_loss=0.3187, over 20972.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1458, cr_loss=0.3694, over 4039825.54 frames. ], batch size: 52, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:22:46,707 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.267e+02 2.458e+02 2.648e+02 4.941e+02, threshold=4.916e+02, percent-clipped=1.0 2024-09-18 01:22:59,757 INFO [train.py:1198] (0/2) Epoch 38, batch 6300, loss[loss=0.2711, ctc_loss=0.1852, cr_loss=0.4294, over 18141.00 frames. 
], tot_loss[loss=0.2223, ctc_loss=0.148, cr_loss=0.3713, over 3954554.50 frames. ], batch size: 108, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:23:08,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.76 vs. limit=15.0 2024-09-18 01:23:20,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=687851.1666666666, ans=0.125 2024-09-18 01:23:30,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=687879.5, ans=0.125 2024-09-18 01:23:47,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=687907.8333333334, ans=0.125 2024-09-18 01:24:02,191 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2024-09-18 01:24:03,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=687936.1666666666, ans=0.125 2024-09-18 01:24:10,114 INFO [train.py:1198] (0/2) Epoch 38, batch 6350, loss[loss=0.2347, ctc_loss=0.1599, cr_loss=0.3736, over 14908.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1539, cr_loss=0.3766, over 3771050.11 frames. 
], batch size: 149, lr: 2.16e-03, grad_scale: 32.0 2024-09-18 01:24:10,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=687964.5, ans=0.0 2024-09-18 01:24:16,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=687964.5, ans=0.2 2024-09-18 01:24:23,579 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:24:32,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=687992.8333333334, ans=0.2 2024-09-18 01:24:36,996 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=12.0 2024-09-18 01:24:55,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=688049.5, ans=0.125 2024-09-18 01:25:04,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=688049.5, ans=0.0 2024-09-18 01:25:09,554 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-38.pt 2024-09-18 01:25:57,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.19 vs. limit=15.0 2024-09-18 01:25:58,166 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.185e+02 2.573e+02 2.741e+02 2.937e+02 3.484e+02, threshold=5.482e+02, percent-clipped=0.0 2024-09-18 01:25:58,185 INFO [train.py:1198] (0/2) Epoch 39, batch 0, loss[loss=0.1987, ctc_loss=0.1324, cr_loss=0.3316, over 20966.00 frames. ], tot_loss[loss=0.1987, ctc_loss=0.1324, cr_loss=0.3316, over 20966.00 frames. 
], batch size: 55, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:25:58,186 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 01:26:12,516 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.2314, 4.8275, 5.0950, 4.6133], device='cuda:0') 2024-09-18 01:26:16,550 INFO [train.py:1230] (0/2) Epoch 39, validation: loss=0.03987, ctc_loss=0.03987, cr_loss=1.418e-14, over 944034.00 frames. 2024-09-18 01:26:16,551 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-18 01:26:35,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=688109.0, ans=0.125 2024-09-18 01:26:36,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=688109.0, ans=0.125 2024-09-18 01:27:05,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=688165.6666666666, ans=0.2 2024-09-18 01:27:11,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=688165.6666666666, ans=0.0 2024-09-18 01:27:35,308 INFO [train.py:1198] (0/2) Epoch 39, batch 50, loss[loss=0.2297, ctc_loss=0.1554, cr_loss=0.3716, over 19558.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1473, cr_loss=0.3713, over 927324.06 frames. ], batch size: 90, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:27:47,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=688222.3333333334, ans=0.125 2024-09-18 01:27:49,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.81 vs. 
limit=22.5 2024-09-18 01:28:51,203 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.207e+02 2.366e+02 2.486e+02 4.313e+02, threshold=4.732e+02, percent-clipped=0.0 2024-09-18 01:28:51,225 INFO [train.py:1198] (0/2) Epoch 39, batch 100, loss[loss=0.1971, ctc_loss=0.1269, cr_loss=0.3512, over 20774.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1466, cr_loss=0.3705, over 1621242.92 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:29:33,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=688420.6666666666, ans=0.0 2024-09-18 01:29:53,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.26 vs. limit=10.0 2024-09-18 01:30:00,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=688477.3333333334, ans=0.025 2024-09-18 01:30:06,556 INFO [train.py:1198] (0/2) Epoch 39, batch 150, loss[loss=0.2201, ctc_loss=0.1451, cr_loss=0.3749, over 20781.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1477, cr_loss=0.3738, over 2178970.46 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:30:09,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=688505.6666666666, ans=0.0 2024-09-18 01:30:14,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=688505.6666666666, ans=0.0 2024-09-18 01:30:17,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=688505.6666666666, ans=0.0 2024-09-18 01:30:23,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. 
limit=15.0 2024-09-18 01:31:01,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=688590.6666666666, ans=0.125 2024-09-18 01:31:24,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=688647.3333333334, ans=0.125 2024-09-18 01:31:25,303 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.205e+02 2.315e+02 2.455e+02 3.675e+02, threshold=4.631e+02, percent-clipped=0.0 2024-09-18 01:31:25,321 INFO [train.py:1198] (0/2) Epoch 39, batch 200, loss[loss=0.2275, ctc_loss=0.1511, cr_loss=0.3821, over 21000.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1476, cr_loss=0.372, over 2583301.45 frames. ], batch size: 61, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:31:45,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=688675.6666666666, ans=0.125 2024-09-18 01:32:10,084 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.26 vs. limit=6.0 2024-09-18 01:32:20,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.09 vs. limit=10.0 2024-09-18 01:32:27,700 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:32:37,902 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2024-09-18 01:32:44,923 INFO [train.py:1198] (0/2) Epoch 39, batch 250, loss[loss=0.2217, ctc_loss=0.1456, cr_loss=0.3807, over 21087.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1467, cr_loss=0.3714, over 2910894.64 frames. 
], batch size: 59, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:33:07,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=688817.3333333334, ans=0.125 2024-09-18 01:33:17,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=688845.6666666666, ans=0.0 2024-09-18 01:33:58,486 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=12.0 2024-09-18 01:34:00,652 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.215e+02 2.353e+02 2.497e+02 4.786e+02, threshold=4.706e+02, percent-clipped=1.0 2024-09-18 01:34:00,670 INFO [train.py:1198] (0/2) Epoch 39, batch 300, loss[loss=0.1999, ctc_loss=0.1312, cr_loss=0.3437, over 21048.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1462, cr_loss=0.3707, over 3164571.26 frames. ], batch size: 53, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:34:46,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=689015.6666666666, ans=0.125 2024-09-18 01:34:50,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689015.6666666666, ans=0.1 2024-09-18 01:35:13,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=689044.0, ans=0.0 2024-09-18 01:35:13,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=689044.0, ans=0.125 2024-09-18 01:35:14,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=689072.3333333334, ans=0.125 2024-09-18 01:35:16,119 INFO [train.py:1198] (0/2) Epoch 39, batch 350, 
loss[loss=0.1777, ctc_loss=0.1165, cr_loss=0.3064, over 20244.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1457, cr_loss=0.3692, over 3369175.04 frames. ], batch size: 45, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:35:16,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=689072.3333333334, ans=0.125 2024-09-18 01:35:22,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=689072.3333333334, ans=0.05 2024-09-18 01:35:27,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=15.0 2024-09-18 01:35:29,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=689100.6666666666, ans=0.125 2024-09-18 01:35:30,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=689100.6666666666, ans=22.5 2024-09-18 01:35:33,262 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:35:45,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=689129.0, ans=0.125 2024-09-18 01:36:13,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=689157.3333333334, ans=0.025 2024-09-18 01:36:26,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=689185.6666666666, ans=0.125 2024-09-18 01:36:35,551 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.171e+02 2.344e+02 2.456e+02 3.665e+02, threshold=4.687e+02, percent-clipped=0.0 2024-09-18 01:36:35,581 INFO [train.py:1198] (0/2) 
Epoch 39, batch 400, loss[loss=0.2203, ctc_loss=0.1448, cr_loss=0.3775, over 20969.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1452, cr_loss=0.3689, over 3538227.98 frames. ], batch size: 58, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:37:07,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=689270.6666666666, ans=0.025 2024-09-18 01:37:35,020 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-09-18 01:37:39,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2024-09-18 01:37:51,289 INFO [train.py:1198] (0/2) Epoch 39, batch 450, loss[loss=0.2291, ctc_loss=0.1512, cr_loss=0.3895, over 20822.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1467, cr_loss=0.3713, over 3647641.90 frames. 
], batch size: 59, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:37:51,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=689355.6666666666, ans=0.125 2024-09-18 01:38:03,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=689355.6666666666, ans=0.0 2024-09-18 01:38:34,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=689412.3333333334, ans=0.125 2024-09-18 01:38:54,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=689469.0, ans=0.0 2024-09-18 01:38:57,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689469.0, ans=0.1 2024-09-18 01:39:10,209 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.249e+02 2.367e+02 2.613e+02 3.957e+02, threshold=4.734e+02, percent-clipped=0.0 2024-09-18 01:39:10,228 INFO [train.py:1198] (0/2) Epoch 39, batch 500, loss[loss=0.1966, ctc_loss=0.1296, cr_loss=0.3349, over 20885.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1471, cr_loss=0.372, over 3735085.30 frames. 
], batch size: 54, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:39:41,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689554.0, ans=0.1 2024-09-18 01:40:09,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=689610.6666666666, ans=0.2 2024-09-18 01:40:23,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=689610.6666666666, ans=0.025 2024-09-18 01:40:25,970 INFO [train.py:1198] (0/2) Epoch 39, batch 550, loss[loss=0.2478, ctc_loss=0.1658, cr_loss=0.4099, over 20829.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1466, cr_loss=0.3712, over 3815454.82 frames. ], batch size: 65, lr: 2.13e-03, grad_scale: 32.0 2024-09-18 01:40:29,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=689639.0, ans=0.0 2024-09-18 01:40:35,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=689639.0, ans=0.125 2024-09-18 01:40:37,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=689639.0, ans=0.0 2024-09-18 01:41:42,195 INFO [train.py:1198] (0/2) Epoch 39, batch 600, loss[loss=0.2423, ctc_loss=0.1601, cr_loss=0.4107, over 19463.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1457, cr_loss=0.37, over 3891609.50 frames. 
], batch size: 90, lr: 2.13e-03, grad_scale: 16.0 2024-09-18 01:41:43,760 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.192e+02 2.325e+02 2.469e+02 3.248e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-18 01:41:53,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=689780.6666666666, ans=0.125 2024-09-18 01:43:01,324 INFO [train.py:1198] (0/2) Epoch 39, batch 650, loss[loss=0.1838, ctc_loss=0.1207, cr_loss=0.3159, over 20971.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.145, cr_loss=0.3696, over 3942794.09 frames. ], batch size: 49, lr: 2.13e-03, grad_scale: 16.0 2024-09-18 01:43:11,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=689922.3333333334, ans=0.1 2024-09-18 01:43:25,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=689950.6666666666, ans=0.125 2024-09-18 01:43:34,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0 2024-09-18 01:43:50,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2024-09-18 01:44:14,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=690035.6666666666, ans=0.125 2024-09-18 01:44:15,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=690035.6666666666, ans=0.035 2024-09-18 01:44:20,012 INFO [train.py:1198] (0/2) Epoch 39, batch 700, loss[loss=0.2228, ctc_loss=0.1461, cr_loss=0.3832, over 21057.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1448, cr_loss=0.3692, over 3985620.77 frames. 
], batch size: 56, lr: 2.13e-03, grad_scale: 16.0 2024-09-18 01:44:21,461 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.187e+02 2.336e+02 2.483e+02 5.228e+02, threshold=4.671e+02, percent-clipped=1.0 2024-09-18 01:44:37,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=690092.3333333334, ans=0.0 2024-09-18 01:44:50,588 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=22.5 2024-09-18 01:44:53,450 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:44:54,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=690120.6666666666, ans=0.125 2024-09-18 01:45:17,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=690149.0, ans=0.125 2024-09-18 01:45:23,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=690177.3333333334, ans=0.125 2024-09-18 01:45:35,702 INFO [train.py:1198] (0/2) Epoch 39, batch 750, loss[loss=0.2271, ctc_loss=0.1538, cr_loss=0.3666, over 21035.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1463, cr_loss=0.3714, over 4010109.56 frames. ], batch size: 62, lr: 2.13e-03, grad_scale: 16.0 2024-09-18 01:45:36,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.05 vs. 
limit=22.5 2024-09-18 01:45:43,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=690205.6666666666, ans=0.2 2024-09-18 01:46:00,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=690234.0, ans=0.125 2024-09-18 01:46:32,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=690290.6666666666, ans=0.125 2024-09-18 01:46:39,239 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2024-09-18 01:46:44,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=690319.0, ans=0.025 2024-09-18 01:46:52,230 INFO [train.py:1198] (0/2) Epoch 39, batch 800, loss[loss=0.2249, ctc_loss=0.1516, cr_loss=0.3663, over 19651.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1463, cr_loss=0.3715, over 4039510.08 frames. 
], batch size: 90, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:46:53,743 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.253e+02 2.414e+02 2.538e+02 3.314e+02, threshold=4.829e+02, percent-clipped=0.0 2024-09-18 01:47:00,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=690347.3333333334, ans=0.0 2024-09-18 01:47:04,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=690347.3333333334, ans=0.125 2024-09-18 01:47:07,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=690375.6666666666, ans=0.1 2024-09-18 01:47:50,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=690432.3333333334, ans=0.2 2024-09-18 01:48:03,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=690460.6666666666, ans=0.0 2024-09-18 01:48:10,969 INFO [train.py:1198] (0/2) Epoch 39, batch 850, loss[loss=0.2566, ctc_loss=0.1726, cr_loss=0.42, over 19923.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1465, cr_loss=0.3715, over 4050034.43 frames. ], batch size: 80, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:48:12,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=690489.0, ans=0.2 2024-09-18 01:48:17,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=690489.0, ans=10.0 2024-09-18 01:49:27,448 INFO [train.py:1198] (0/2) Epoch 39, batch 900, loss[loss=0.2337, ctc_loss=0.1552, cr_loss=0.3927, over 20999.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1467, cr_loss=0.3721, over 4062179.28 frames. 
], batch size: 67, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:49:28,305 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0 2024-09-18 01:49:28,935 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.209e+02 2.350e+02 2.525e+02 4.801e+02, threshold=4.701e+02, percent-clipped=0.0 2024-09-18 01:49:43,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=690659.0, ans=0.0 2024-09-18 01:49:49,321 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.63 vs. limit=12.0 2024-09-18 01:50:11,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=690687.3333333334, ans=0.125 2024-09-18 01:50:22,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=690715.6666666666, ans=0.2 2024-09-18 01:50:31,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=690744.0, ans=0.1 2024-09-18 01:50:46,501 INFO [train.py:1198] (0/2) Epoch 39, batch 950, loss[loss=0.1912, ctc_loss=0.125, cr_loss=0.3314, over 21009.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1455, cr_loss=0.3696, over 4070240.66 frames. ], batch size: 52, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:51:04,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=690800.6666666666, ans=0.025 2024-09-18 01:52:01,737 INFO [train.py:1198] (0/2) Epoch 39, batch 1000, loss[loss=0.2159, ctc_loss=0.1437, cr_loss=0.361, over 20972.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1452, cr_loss=0.3689, over 4091402.31 frames. 
], batch size: 55, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 01:52:02,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=690914.0, ans=0.125 2024-09-18 01:52:02,533 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2024-09-18 01:52:04,766 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.207e+02 2.328e+02 2.485e+02 3.427e+02, threshold=4.657e+02, percent-clipped=0.0 2024-09-18 01:53:04,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2024-09-18 01:53:17,069 INFO [train.py:1198] (0/2) Epoch 39, batch 1050, loss[loss=0.1916, ctc_loss=0.1253, cr_loss=0.3314, over 20301.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1447, cr_loss=0.3676, over 4086959.30 frames. ], batch size: 45, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 01:54:02,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=691112.3333333334, ans=0.125 2024-09-18 01:54:35,706 INFO [train.py:1198] (0/2) Epoch 39, batch 1100, loss[loss=0.237, ctc_loss=0.158, cr_loss=0.3947, over 20977.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1461, cr_loss=0.37, over 4089234.95 frames. 
], batch size: 64, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 01:54:38,799 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.228e+02 2.356e+02 2.530e+02 2.946e+02, threshold=4.711e+02, percent-clipped=0.0 2024-09-18 01:54:39,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=691197.3333333334, ans=0.2 2024-09-18 01:54:44,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=691197.3333333334, ans=0.025 2024-09-18 01:54:59,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2024-09-18 01:55:11,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=691254.0, ans=0.04949747468305833 2024-09-18 01:55:15,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=691254.0, ans=0.125 2024-09-18 01:55:50,551 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-244000.pt 2024-09-18 01:55:56,162 INFO [train.py:1198] (0/2) Epoch 39, batch 1150, loss[loss=0.2125, ctc_loss=0.1393, cr_loss=0.3659, over 21078.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1452, cr_loss=0.3684, over 4095930.14 frames. 
], batch size: 56, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 01:56:04,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=691339.0, ans=0.2 2024-09-18 01:56:13,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=691367.3333333334, ans=0.2 2024-09-18 01:56:16,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0 2024-09-18 01:56:46,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=691424.0, ans=0.125 2024-09-18 01:56:52,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=691424.0, ans=0.125 2024-09-18 01:56:59,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=691452.3333333334, ans=0.125 2024-09-18 01:57:12,973 INFO [train.py:1198] (0/2) Epoch 39, batch 1200, loss[loss=0.2256, ctc_loss=0.149, cr_loss=0.3827, over 21078.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1448, cr_loss=0.3682, over 4112629.76 frames. ], batch size: 59, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:57:15,976 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.212e+02 2.345e+02 2.470e+02 3.222e+02, threshold=4.690e+02, percent-clipped=0.0 2024-09-18 01:57:19,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=691480.6666666666, ans=0.125 2024-09-18 01:57:43,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=691537.3333333334, ans=0.125 2024-09-18 01:58:29,012 INFO [train.py:1198] (0/2) Epoch 39, batch 1250, loss[loss=0.194, ctc_loss=0.1268, cr_loss=0.336, over 20973.00 frames. 
], tot_loss[loss=0.2188, ctc_loss=0.145, cr_loss=0.3691, over 4099979.04 frames. ], batch size: 51, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:58:41,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=691622.3333333334, ans=0.2 2024-09-18 01:59:45,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=691735.6666666666, ans=0.2 2024-09-18 01:59:46,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2024-09-18 01:59:48,444 INFO [train.py:1198] (0/2) Epoch 39, batch 1300, loss[loss=0.2253, ctc_loss=0.1481, cr_loss=0.386, over 20252.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1448, cr_loss=0.3691, over 4108811.84 frames. ], batch size: 74, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 01:59:48,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=691764.0, ans=0.0 2024-09-18 01:59:51,555 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.209e+02 2.365e+02 2.635e+02 3.413e+02, threshold=4.730e+02, percent-clipped=0.0 2024-09-18 02:00:13,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=691792.3333333334, ans=0.025 2024-09-18 02:00:17,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=691820.6666666666, ans=0.1 2024-09-18 02:00:19,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=691820.6666666666, ans=0.125 2024-09-18 02:00:28,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=691820.6666666666, ans=0.1 2024-09-18 
02:00:49,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=691877.3333333334, ans=0.0 2024-09-18 02:00:53,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=691877.3333333334, ans=0.015 2024-09-18 02:00:58,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=691877.3333333334, ans=0.125 2024-09-18 02:01:03,954 INFO [train.py:1198] (0/2) Epoch 39, batch 1350, loss[loss=0.2281, ctc_loss=0.1519, cr_loss=0.381, over 20659.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1447, cr_loss=0.3696, over 4105751.05 frames. ], batch size: 71, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:01:09,242 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.62 vs. limit=15.0 2024-09-18 02:01:18,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=691934.0, ans=0.125 2024-09-18 02:01:18,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=691934.0, ans=0.0 2024-09-18 02:01:21,831 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=8.0 2024-09-18 02:02:06,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=692019.0, ans=0.0 2024-09-18 02:02:09,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0 2024-09-18 02:02:22,572 INFO [train.py:1198] (0/2) Epoch 39, batch 1400, loss[loss=0.2248, ctc_loss=0.1506, cr_loss=0.3712, over 21028.00 frames. 
], tot_loss[loss=0.2192, ctc_loss=0.1451, cr_loss=0.3703, over 4098256.51 frames. ], batch size: 63, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:02:25,573 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.227e+02 2.358e+02 2.516e+02 4.226e+02, threshold=4.715e+02, percent-clipped=0.0 2024-09-18 02:03:29,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=692160.6666666666, ans=0.05 2024-09-18 02:03:38,790 INFO [train.py:1198] (0/2) Epoch 39, batch 1450, loss[loss=0.2399, ctc_loss=0.1581, cr_loss=0.4086, over 20943.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.37, over 4090294.66 frames. ], batch size: 60, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:04:20,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=692245.6666666666, ans=0.0 2024-09-18 02:04:21,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=692245.6666666666, ans=0.0 2024-09-18 02:04:31,980 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-09-18 02:04:53,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=692330.6666666666, ans=0.1 2024-09-18 02:04:54,830 INFO [train.py:1198] (0/2) Epoch 39, batch 1500, loss[loss=0.2201, ctc_loss=0.1453, cr_loss=0.3738, over 20986.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1454, cr_loss=0.3699, over 4104949.62 frames. 
], batch size: 58, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:04:57,710 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.181e+02 2.284e+02 2.433e+02 3.092e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-18 02:05:47,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=692415.6666666666, ans=10.0 2024-09-18 02:05:53,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=692415.6666666666, ans=0.0 2024-09-18 02:06:13,547 INFO [train.py:1198] (0/2) Epoch 39, batch 1550, loss[loss=0.245, ctc_loss=0.1697, cr_loss=0.3767, over 18272.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.3709, over 4088262.86 frames. ], batch size: 108, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:06:15,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=692472.3333333334, ans=0.2 2024-09-18 02:06:33,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.37 vs. limit=22.5 2024-09-18 02:06:49,048 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-09-18 02:07:07,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=692557.3333333334, ans=0.0 2024-09-18 02:07:10,835 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.59 vs. limit=15.0 2024-09-18 02:07:33,090 INFO [train.py:1198] (0/2) Epoch 39, batch 1600, loss[loss=0.2473, ctc_loss=0.1696, cr_loss=0.3885, over 18028.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1454, cr_loss=0.3704, over 4101046.32 frames. 
], batch size: 108, lr: 2.12e-03, grad_scale: 32.0 2024-09-18 02:07:36,035 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.173e+02 2.336e+02 2.487e+02 3.153e+02, threshold=4.671e+02, percent-clipped=0.0 2024-09-18 02:07:40,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=692614.0, ans=0.125 2024-09-18 02:07:46,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=692642.3333333334, ans=0.125 2024-09-18 02:08:35,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2024-09-18 02:08:40,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=692727.3333333334, ans=0.2 2024-09-18 02:08:46,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=692727.3333333334, ans=0.025 2024-09-18 02:08:48,615 INFO [train.py:1198] (0/2) Epoch 39, batch 1650, loss[loss=0.1919, ctc_loss=0.1251, cr_loss=0.3341, over 21049.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1449, cr_loss=0.3697, over 4098618.31 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:08:51,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=692755.6666666666, ans=0.2 2024-09-18 02:09:15,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=692784.0, ans=0.1 2024-09-18 02:10:04,193 INFO [train.py:1198] (0/2) Epoch 39, batch 1700, loss[loss=0.2166, ctc_loss=0.1431, cr_loss=0.3678, over 20906.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1446, cr_loss=0.3697, over 4103023.46 frames. 
], batch size: 54, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:10:08,612 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.165e+02 2.297e+02 2.416e+02 6.730e+02, threshold=4.595e+02, percent-clipped=1.0 2024-09-18 02:10:10,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=692897.3333333334, ans=0.125 2024-09-18 02:10:37,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=692954.0, ans=0.125 2024-09-18 02:10:50,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=692982.3333333334, ans=0.125 2024-09-18 02:11:03,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=692982.3333333334, ans=0.5 2024-09-18 02:11:09,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693010.6666666666, ans=0.1 2024-09-18 02:11:22,981 INFO [train.py:1198] (0/2) Epoch 39, batch 1750, loss[loss=0.2181, ctc_loss=0.1424, cr_loss=0.3783, over 21065.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.145, cr_loss=0.3698, over 4084913.89 frames. ], batch size: 56, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:11:29,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=693039.0, ans=0.125 2024-09-18 02:12:09,346 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.17 vs. 
limit=15.0 2024-09-18 02:12:14,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=693124.0, ans=0.125 2024-09-18 02:12:15,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-09-18 02:12:37,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=693180.6666666666, ans=0.0 2024-09-18 02:12:38,394 INFO [train.py:1198] (0/2) Epoch 39, batch 1800, loss[loss=0.2367, ctc_loss=0.1585, cr_loss=0.3911, over 20835.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3702, over 4089663.18 frames. ], batch size: 65, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:12:42,803 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.216e+02 2.325e+02 2.535e+02 3.394e+02, threshold=4.649e+02, percent-clipped=0.0 2024-09-18 02:12:44,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693180.6666666666, ans=0.1 2024-09-18 02:13:29,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=693265.6666666666, ans=0.2 2024-09-18 02:13:32,692 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:13:32,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=693265.6666666666, ans=0.0 2024-09-18 02:13:56,561 INFO [train.py:1198] (0/2) Epoch 39, batch 1850, loss[loss=0.2151, ctc_loss=0.1415, cr_loss=0.3676, over 20770.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1453, cr_loss=0.3698, over 4100261.76 frames. 
], batch size: 56, lr: 2.12e-03, grad_scale: 8.0 2024-09-18 02:14:01,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=693322.3333333334, ans=0.125 2024-09-18 02:14:46,831 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=22.5 2024-09-18 02:14:52,765 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=12.0 2024-09-18 02:15:13,214 INFO [train.py:1198] (0/2) Epoch 39, batch 1900, loss[loss=0.2547, ctc_loss=0.1687, cr_loss=0.43, over 20716.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1458, cr_loss=0.3707, over 4095305.32 frames. ], batch size: 71, lr: 2.12e-03, grad_scale: 8.0 2024-09-18 02:15:19,267 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.166e+02 2.310e+02 2.460e+02 3.340e+02, threshold=4.621e+02, percent-clipped=0.0 2024-09-18 02:15:41,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.50 vs. limit=10.0 2024-09-18 02:15:51,553 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2024-09-18 02:16:28,581 INFO [train.py:1198] (0/2) Epoch 39, batch 1950, loss[loss=0.214, ctc_loss=0.1425, cr_loss=0.3573, over 20129.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1453, cr_loss=0.3692, over 4085252.00 frames. ], batch size: 80, lr: 2.12e-03, grad_scale: 8.0 2024-09-18 02:17:43,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.10 vs. 
limit=10.0 2024-09-18 02:17:47,114 INFO [train.py:1198] (0/2) Epoch 39, batch 2000, loss[loss=0.2468, ctc_loss=0.1656, cr_loss=0.4056, over 20693.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1457, cr_loss=0.3703, over 4091070.84 frames. ], batch size: 71, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:17:53,403 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.194e+02 2.329e+02 2.491e+02 3.480e+02, threshold=4.659e+02, percent-clipped=0.0 2024-09-18 02:17:58,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=693747.3333333334, ans=0.0 2024-09-18 02:18:01,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=693775.6666666666, ans=0.125 2024-09-18 02:18:01,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=693775.6666666666, ans=0.125 2024-09-18 02:18:35,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693832.3333333334, ans=0.1 2024-09-18 02:19:06,066 INFO [train.py:1198] (0/2) Epoch 39, batch 2050, loss[loss=0.2118, ctc_loss=0.1366, cr_loss=0.3762, over 20963.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.371, over 4089810.01 frames. ], batch size: 55, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:19:25,455 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.33 vs. limit=15.0 2024-09-18 02:19:40,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2024-09-18 02:19:43,057 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.25 vs. 
limit=15.0 2024-09-18 02:19:47,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=15.0 2024-09-18 02:19:51,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693974.0, ans=0.1 2024-09-18 02:20:11,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=694002.3333333334, ans=0.125 2024-09-18 02:20:21,878 INFO [train.py:1198] (0/2) Epoch 39, batch 2100, loss[loss=0.2552, ctc_loss=0.172, cr_loss=0.4156, over 19977.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1457, cr_loss=0.3708, over 4088463.78 frames. ], batch size: 80, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:20:27,986 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.207e+02 2.301e+02 2.550e+02 4.509e+02, threshold=4.602e+02, percent-clipped=0.0 2024-09-18 02:20:49,551 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:20:55,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=694087.3333333334, ans=0.125 2024-09-18 02:20:56,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.63 vs. limit=10.0 2024-09-18 02:21:21,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=694144.0, ans=0.0 2024-09-18 02:21:29,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=694144.0, ans=0.125 2024-09-18 02:21:38,009 INFO [train.py:1198] (0/2) Epoch 39, batch 2150, loss[loss=0.1828, ctc_loss=0.119, cr_loss=0.3191, over 20971.00 frames. 
], tot_loss[loss=0.219, ctc_loss=0.145, cr_loss=0.3697, over 4091141.46 frames. ], batch size: 48, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:21:57,028 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=15.0 2024-09-18 02:22:14,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=694229.0, ans=0.125 2024-09-18 02:22:16,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694229.0, ans=0.1 2024-09-18 02:22:57,061 INFO [train.py:1198] (0/2) Epoch 39, batch 2200, loss[loss=0.2397, ctc_loss=0.1611, cr_loss=0.3928, over 20545.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1446, cr_loss=0.3688, over 4103639.98 frames. ], batch size: 75, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:23:03,268 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.219e+02 2.374e+02 2.554e+02 3.781e+02, threshold=4.748e+02, percent-clipped=0.0 2024-09-18 02:23:29,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=694370.6666666666, ans=0.025 2024-09-18 02:23:31,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.50 vs. limit=15.0 2024-09-18 02:23:37,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. 
limit=6.0 2024-09-18 02:23:59,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=694427.3333333334, ans=0.125 2024-09-18 02:24:05,232 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:24:06,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=694427.3333333334, ans=0.125 2024-09-18 02:24:09,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=694427.3333333334, ans=0.0 2024-09-18 02:24:11,588 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0 2024-09-18 02:24:12,236 INFO [train.py:1198] (0/2) Epoch 39, batch 2250, loss[loss=0.2664, ctc_loss=0.185, cr_loss=0.4069, over 14104.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1452, cr_loss=0.3691, over 4088218.88 frames. 
], batch size: 149, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:24:33,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=694484.0, ans=0.0 2024-09-18 02:24:51,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=694512.3333333334, ans=0.1 2024-09-18 02:24:53,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=694512.3333333334, ans=0.0 2024-09-18 02:25:13,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=694569.0, ans=0.1 2024-09-18 02:25:23,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694569.0, ans=0.1 2024-09-18 02:25:30,111 INFO [train.py:1198] (0/2) Epoch 39, batch 2300, loss[loss=0.2668, ctc_loss=0.1848, cr_loss=0.4099, over 18166.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1455, cr_loss=0.3697, over 4088335.71 frames. 
], batch size: 108, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:25:31,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=694597.3333333334, ans=0.125 2024-09-18 02:25:36,044 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.214e+02 2.383e+02 2.580e+02 4.808e+02, threshold=4.766e+02, percent-clipped=1.0 2024-09-18 02:25:36,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=694597.3333333334, ans=0.0 2024-09-18 02:25:48,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694625.6666666666, ans=0.1 2024-09-18 02:26:30,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=694710.6666666666, ans=0.2 2024-09-18 02:26:31,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=694710.6666666666, ans=0.0 2024-09-18 02:26:46,178 INFO [train.py:1198] (0/2) Epoch 39, batch 2350, loss[loss=0.2031, ctc_loss=0.1336, cr_loss=0.3477, over 20970.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1458, cr_loss=0.3703, over 4083746.55 frames. ], batch size: 48, lr: 2.12e-03, grad_scale: 16.0 2024-09-18 02:27:15,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694795.6666666666, ans=0.1 2024-09-18 02:27:28,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=694795.6666666666, ans=0.125 2024-09-18 02:27:43,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.02 vs. 
limit=22.5
2024-09-18 02:27:48,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=694852.3333333334, ans=0.07
2024-09-18 02:27:49,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=694852.3333333334, ans=0.125
2024-09-18 02:28:04,660 INFO [train.py:1198] (0/2) Epoch 39, batch 2400, loss[loss=0.223, ctc_loss=0.1472, cr_loss=0.3793, over 20673.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1461, cr_loss=0.3712, over 4084935.54 frames. ], batch size: 66, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:28:10,673 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.205e+02 2.353e+02 2.565e+02 3.948e+02, threshold=4.707e+02, percent-clipped=0.0
2024-09-18 02:28:30,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=694909.0, ans=0.025
2024-09-18 02:28:53,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=694965.6666666666, ans=0.0
2024-09-18 02:29:05,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0
2024-09-18 02:29:16,268 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 02:29:20,629 INFO [train.py:1198] (0/2) Epoch 39, batch 2450, loss[loss=0.202, ctc_loss=0.1314, cr_loss=0.3531, over 20952.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1458, cr_loss=0.3712, over 4091187.32 frames. ], batch size: 48, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:29:22,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=695022.3333333334, ans=0.0
2024-09-18 02:29:28,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=695022.3333333334, ans=0.1
2024-09-18 02:29:53,554 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0
2024-09-18 02:30:00,821 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.93 vs. limit=15.0
2024-09-18 02:30:28,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=695135.6666666666, ans=0.125
2024-09-18 02:30:39,084 INFO [train.py:1198] (0/2) Epoch 39, batch 2500, loss[loss=0.1886, ctc_loss=0.1214, cr_loss=0.3356, over 20951.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1467, cr_loss=0.3733, over 4096354.30 frames. ], batch size: 48, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:30:43,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=695164.0, ans=0.125
2024-09-18 02:30:45,162 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.185e+02 2.340e+02 2.463e+02 3.257e+02, threshold=4.679e+02, percent-clipped=0.0
2024-09-18 02:30:52,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=695192.3333333334, ans=0.2
2024-09-18 02:31:00,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=695192.3333333334, ans=0.125
2024-09-18 02:31:30,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=695249.0, ans=0.125
2024-09-18 02:31:39,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=695277.3333333334, ans=0.0
2024-09-18 02:31:45,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0
2024-09-18 02:31:54,333 INFO [train.py:1198] (0/2) Epoch 39, batch 2550, loss[loss=0.2195, ctc_loss=0.1454, cr_loss=0.3706, over 21030.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1464, cr_loss=0.3729, over 4104502.55 frames. ], batch size: 62, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:31:57,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=695305.6666666666, ans=0.125
2024-09-18 02:32:09,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=695334.0, ans=0.125
2024-09-18 02:32:25,678 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.46 vs. limit=15.0
2024-09-18 02:32:29,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=695362.3333333334, ans=0.2
2024-09-18 02:33:09,979 INFO [train.py:1198] (0/2) Epoch 39, batch 2600, loss[loss=0.1991, ctc_loss=0.1337, cr_loss=0.3273, over 20986.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1457, cr_loss=0.372, over 4111248.61 frames. ], batch size: 49, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:33:13,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=695447.3333333334, ans=22.5
2024-09-18 02:33:16,174 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.210e+02 2.325e+02 2.513e+02 3.040e+02, threshold=4.651e+02, percent-clipped=0.0
2024-09-18 02:33:39,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695504.0, ans=0.1
2024-09-18 02:33:48,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=695504.0, ans=0.125
2024-09-18 02:33:48,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=695504.0, ans=0.125
2024-09-18 02:34:17,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695560.6666666666, ans=0.1
2024-09-18 02:34:20,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=15.0
2024-09-18 02:34:28,563 INFO [train.py:1198] (0/2) Epoch 39, batch 2650, loss[loss=0.1925, ctc_loss=0.1246, cr_loss=0.3396, over 21041.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1449, cr_loss=0.3708, over 4119553.89 frames. ], batch size: 62, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:34:33,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=695589.0, ans=0.0
2024-09-18 02:34:48,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=695617.3333333334, ans=0.125
2024-09-18 02:34:56,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=695617.3333333334, ans=0.2
2024-09-18 02:35:06,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=695645.6666666666, ans=0.125
2024-09-18 02:35:12,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=695674.0, ans=0.0
2024-09-18 02:35:22,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=695674.0, ans=0.0
2024-09-18 02:35:44,708 INFO [train.py:1198] (0/2) Epoch 39, batch 2700, loss[loss=0.2323, ctc_loss=0.1567, cr_loss=0.3782, over 20655.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1448, cr_loss=0.3698, over 4097870.07 frames. ], batch size: 71, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:35:50,743 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.202e+02 2.348e+02 2.533e+02 4.629e+02, threshold=4.697e+02, percent-clipped=0.0
2024-09-18 02:36:01,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=695759.0, ans=0.125
2024-09-18 02:36:19,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=695787.3333333334, ans=0.125
2024-09-18 02:36:22,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=695787.3333333334, ans=0.125
2024-09-18 02:36:56,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=695844.0, ans=0.0
2024-09-18 02:37:03,330 INFO [train.py:1198] (0/2) Epoch 39, batch 2750, loss[loss=0.2671, ctc_loss=0.1835, cr_loss=0.4177, over 14259.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.145, cr_loss=0.3696, over 4090256.12 frames. ], batch size: 149, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:37:03,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=695872.3333333334, ans=0.125
2024-09-18 02:37:19,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=695900.6666666666, ans=0.0
2024-09-18 02:37:41,501 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 02:38:19,069 INFO [train.py:1198] (0/2) Epoch 39, batch 2800, loss[loss=0.2099, ctc_loss=0.1365, cr_loss=0.3673, over 20956.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3705, over 4085197.47 frames. ], batch size: 64, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:38:25,167 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.185e+02 2.290e+02 2.480e+02 5.351e+02, threshold=4.579e+02, percent-clipped=1.0
2024-09-18 02:38:27,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.30 vs. limit=15.0
2024-09-18 02:39:03,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=696099.0, ans=0.125
2024-09-18 02:39:04,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=696099.0, ans=0.125
2024-09-18 02:39:12,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=696099.0, ans=0.0
2024-09-18 02:39:16,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=696099.0, ans=0.0
2024-09-18 02:39:38,066 INFO [train.py:1198] (0/2) Epoch 39, batch 2850, loss[loss=0.2312, ctc_loss=0.1516, cr_loss=0.3982, over 20958.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1454, cr_loss=0.3702, over 4078866.92 frames. ], batch size: 64, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:39:38,539 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 02:39:54,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=696184.0, ans=0.125
2024-09-18 02:39:59,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=696184.0, ans=0.0
2024-09-18 02:40:19,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=696212.3333333334, ans=0.125
2024-09-18 02:40:46,569 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.55 vs. limit=10.0
2024-09-18 02:40:53,638 INFO [train.py:1198] (0/2) Epoch 39, batch 2900, loss[loss=0.2298, ctc_loss=0.1516, cr_loss=0.3909, over 20952.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1447, cr_loss=0.3686, over 4077663.62 frames. ], batch size: 60, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:40:59,471 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.170e+02 2.308e+02 2.451e+02 3.176e+02, threshold=4.617e+02, percent-clipped=0.0
2024-09-18 02:41:21,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.42 vs. limit=15.0
2024-09-18 02:41:22,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=696354.0, ans=0.0
2024-09-18 02:41:27,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=696354.0, ans=0.05
2024-09-18 02:41:39,457 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0
2024-09-18 02:42:12,291 INFO [train.py:1198] (0/2) Epoch 39, batch 2950, loss[loss=0.2024, ctc_loss=0.1331, cr_loss=0.3464, over 20365.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1447, cr_loss=0.3685, over 4092038.97 frames. ], batch size: 74, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:42:35,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=696467.3333333334, ans=0.1
2024-09-18 02:42:53,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=696495.6666666666, ans=0.0
2024-09-18 02:42:53,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=696495.6666666666, ans=0.0
2024-09-18 02:42:59,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=696524.0, ans=0.125
2024-09-18 02:43:00,507 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.18 vs. limit=10.0
2024-09-18 02:43:20,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=696552.3333333334, ans=0.125
2024-09-18 02:43:25,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=696552.3333333334, ans=0.125
2024-09-18 02:43:27,899 INFO [train.py:1198] (0/2) Epoch 39, batch 3000, loss[loss=0.1845, ctc_loss=0.1189, cr_loss=0.3279, over 20945.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1447, cr_loss=0.3685, over 4097316.80 frames. ], batch size: 50, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:43:27,900 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 02:43:51,730 INFO [train.py:1230] (0/2) Epoch 39, validation: loss=0.03975, ctc_loss=0.03975, cr_loss=1.411e-14, over 944034.00 frames.
2024-09-18 02:43:51,731 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-18 02:43:57,932 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.205e+02 2.340e+02 2.590e+02 3.976e+02, threshold=4.680e+02, percent-clipped=0.0
2024-09-18 02:44:02,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=696580.6666666666, ans=0.1
2024-09-18 02:44:34,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=696637.3333333334, ans=0.125
2024-09-18 02:45:10,534 INFO [train.py:1198] (0/2) Epoch 39, batch 3050, loss[loss=0.2183, ctc_loss=0.1441, cr_loss=0.3708, over 20883.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1439, cr_loss=0.3667, over 4096346.68 frames. ], batch size: 54, lr: 2.12e-03, grad_scale: 32.0
2024-09-18 02:45:31,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=696750.6666666666, ans=0.125
2024-09-18 02:45:41,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=696779.0, ans=0.125
2024-09-18 02:45:43,051 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=15.0
2024-09-18 02:45:44,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=696779.0, ans=0.125
2024-09-18 02:45:56,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=696807.3333333334, ans=0.125
2024-09-18 02:46:26,417 INFO [train.py:1198] (0/2) Epoch 39, batch 3100, loss[loss=0.2106, ctc_loss=0.1393, cr_loss=0.3566, over 20940.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1443, cr_loss=0.3681, over 4090579.41 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 02:46:34,073 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.214e+02 2.345e+02 2.505e+02 3.409e+02, threshold=4.689e+02, percent-clipped=0.0
2024-09-18 02:46:43,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=696892.3333333334, ans=0.1
2024-09-18 02:46:45,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=696892.3333333334, ans=0.125
2024-09-18 02:46:47,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=696892.3333333334, ans=15.0
2024-09-18 02:47:44,737 INFO [train.py:1198] (0/2) Epoch 39, batch 3150, loss[loss=0.2511, ctc_loss=0.1694, cr_loss=0.4082, over 20669.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1447, cr_loss=0.3691, over 4094892.46 frames. ], batch size: 68, lr: 2.11e-03, grad_scale: 16.0
2024-09-18 02:48:22,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=697062.3333333334, ans=0.1
2024-09-18 02:48:54,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=697119.0, ans=0.125
2024-09-18 02:49:00,483 INFO [train.py:1198] (0/2) Epoch 39, batch 3200, loss[loss=0.2033, ctc_loss=0.135, cr_loss=0.3416, over 20846.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1443, cr_loss=0.3686, over 4099252.22 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 02:49:08,095 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.154e+02 2.285e+02 2.445e+02 2.956e+02, threshold=4.570e+02, percent-clipped=0.0
2024-09-18 02:49:55,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=697232.3333333334, ans=0.025
2024-09-18 02:50:08,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=697260.6666666666, ans=0.1
2024-09-18 02:50:16,750 INFO [train.py:1198] (0/2) Epoch 39, batch 3250, loss[loss=0.1826, ctc_loss=0.1182, cr_loss=0.3219, over 20988.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1449, cr_loss=0.3692, over 4079096.07 frames. ], batch size: 48, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 02:50:18,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=697289.0, ans=0.5
2024-09-18 02:50:39,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=697317.3333333334, ans=0.1
2024-09-18 02:51:14,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=697374.0, ans=0.035
2024-09-18 02:51:36,041 INFO [train.py:1198] (0/2) Epoch 39, batch 3300, loss[loss=0.2218, ctc_loss=0.1457, cr_loss=0.3803, over 20897.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1447, cr_loss=0.3686, over 4087682.70 frames. ], batch size: 54, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 02:51:43,598 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.211e+02 2.364e+02 2.512e+02 3.508e+02, threshold=4.729e+02, percent-clipped=0.0
2024-09-18 02:52:23,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=697515.6666666666, ans=0.025
2024-09-18 02:52:32,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=697515.6666666666, ans=0.02
2024-09-18 02:52:49,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=12.0
2024-09-18 02:52:51,698 INFO [train.py:1198] (0/2) Epoch 39, batch 3350, loss[loss=0.1796, ctc_loss=0.1164, cr_loss=0.3159, over 20977.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1448, cr_loss=0.3689, over 4087263.85 frames. ], batch size: 52, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 02:53:21,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=697600.6666666666, ans=0.125
2024-09-18 02:53:52,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=697657.3333333334, ans=15.0
2024-09-18 02:54:10,161 INFO [train.py:1198] (0/2) Epoch 39, batch 3400, loss[loss=0.2755, ctc_loss=0.1878, cr_loss=0.4386, over 20698.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1464, cr_loss=0.3712, over 4077657.58 frames. ], batch size: 68, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 02:54:17,719 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.216e+02 2.332e+02 2.482e+02 7.936e+02, threshold=4.663e+02, percent-clipped=1.0
2024-09-18 02:54:39,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=697770.6666666666, ans=0.1
2024-09-18 02:55:00,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=697799.0, ans=0.125
2024-09-18 02:55:26,229 INFO [train.py:1198] (0/2) Epoch 39, batch 3450, loss[loss=0.1902, ctc_loss=0.1218, cr_loss=0.3422, over 20983.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1453, cr_loss=0.3698, over 4099132.61 frames. ], batch size: 52, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 02:55:43,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=697884.0, ans=0.125
2024-09-18 02:55:47,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=697884.0, ans=0.2
2024-09-18 02:56:01,373 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 02:56:14,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=697940.6666666666, ans=0.125
2024-09-18 02:56:16,752 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=22.5
2024-09-18 02:56:27,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=697969.0, ans=0.125
2024-09-18 02:56:44,833 INFO [train.py:1198] (0/2) Epoch 39, batch 3500, loss[loss=0.2339, ctc_loss=0.1542, cr_loss=0.3988, over 20974.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1457, cr_loss=0.371, over 4100868.11 frames. ], batch size: 58, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 02:56:46,958 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0
2024-09-18 02:56:52,171 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.181e+02 2.338e+02 2.494e+02 3.391e+02, threshold=4.676e+02, percent-clipped=0.0
2024-09-18 02:56:54,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=22.5
2024-09-18 02:57:26,172 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0
2024-09-18 02:57:28,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=698082.3333333334, ans=0.2
2024-09-18 02:57:33,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=698082.3333333334, ans=0.125
2024-09-18 02:58:00,200 INFO [train.py:1198] (0/2) Epoch 39, batch 3550, loss[loss=0.1834, ctc_loss=0.1178, cr_loss=0.3283, over 19857.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.3712, over 4099496.01 frames. ], batch size: 44, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 02:58:23,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=698167.3333333334, ans=0.04949747468305833
2024-09-18 02:58:28,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=698167.3333333334, ans=0.025
2024-09-18 02:59:03,099 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=22.5
2024-09-18 02:59:19,567 INFO [train.py:1198] (0/2) Epoch 39, batch 3600, loss[loss=0.177, ctc_loss=0.116, cr_loss=0.3048, over 21052.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1457, cr_loss=0.3709, over 4096529.21 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 02:59:27,284 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.233e+02 2.349e+02 2.443e+02 2.960e+02, threshold=4.697e+02, percent-clipped=0.0
2024-09-18 02:59:36,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=698309.0, ans=0.2
2024-09-18 02:59:45,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=698309.0, ans=0.125
2024-09-18 03:00:14,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=698365.6666666666, ans=0.025
2024-09-18 03:00:17,900 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=22.5
2024-09-18 03:00:35,297 INFO [train.py:1198] (0/2) Epoch 39, batch 3650, loss[loss=0.2443, ctc_loss=0.1629, cr_loss=0.407, over 20307.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1456, cr_loss=0.3712, over 4098169.95 frames. ], batch size: 74, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:01:00,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=12.0
2024-09-18 03:01:46,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0
2024-09-18 03:01:50,957 INFO [train.py:1198] (0/2) Epoch 39, batch 3700, loss[loss=0.2237, ctc_loss=0.1483, cr_loss=0.377, over 20672.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1461, cr_loss=0.3718, over 4102818.65 frames. ], batch size: 66, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:01:58,424 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.234e+02 2.357e+02 2.503e+02 2.991e+02, threshold=4.714e+02, percent-clipped=0.0
2024-09-18 03:02:00,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.23 vs. limit=6.0
2024-09-18 03:02:01,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=698564.0, ans=0.0
2024-09-18 03:02:48,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.61 vs. limit=12.0
2024-09-18 03:02:56,621 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 03:03:02,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=698677.3333333334, ans=0.5
2024-09-18 03:03:08,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698705.6666666666, ans=0.1
2024-09-18 03:03:09,958 INFO [train.py:1198] (0/2) Epoch 39, batch 3750, loss[loss=0.196, ctc_loss=0.1282, cr_loss=0.3388, over 20970.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1458, cr_loss=0.3716, over 4110381.48 frames. ], batch size: 48, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:03:32,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698734.0, ans=0.1
2024-09-18 03:03:37,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0
2024-09-18 03:04:16,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698819.0, ans=0.1
2024-09-18 03:04:25,468 INFO [train.py:1198] (0/2) Epoch 39, batch 3800, loss[loss=0.225, ctc_loss=0.1513, cr_loss=0.3685, over 21030.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1465, cr_loss=0.3723, over 4100961.25 frames. ], batch size: 63, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:04:35,767 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.202e+02 2.364e+02 2.543e+02 6.705e+02, threshold=4.729e+02, percent-clipped=2.0
2024-09-18 03:04:45,444 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0
2024-09-18 03:05:20,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=698932.3333333334, ans=0.0
2024-09-18 03:05:26,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=698932.3333333334, ans=0.125
2024-09-18 03:05:44,287 INFO [train.py:1198] (0/2) Epoch 39, batch 3850, loss[loss=0.221, ctc_loss=0.1446, cr_loss=0.3817, over 20859.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1456, cr_loss=0.3703, over 4092359.03 frames. ], batch size: 65, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:05:46,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=698989.0, ans=0.0
2024-09-18 03:06:17,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=699045.6666666666, ans=0.025
2024-09-18 03:06:19,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=699045.6666666666, ans=0.125
2024-09-18 03:06:27,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=699045.6666666666, ans=0.0
2024-09-18 03:06:35,171 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=15.0
2024-09-18 03:06:42,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=699074.0, ans=0.2
2024-09-18 03:06:45,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=699102.3333333334, ans=0.125
2024-09-18 03:06:59,787 INFO [train.py:1198] (0/2) Epoch 39, batch 3900, loss[loss=0.1689, ctc_loss=0.1078, cr_loss=0.3053, over 20945.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1464, cr_loss=0.3716, over 4086755.40 frames. ], batch size: 49, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:07:07,104 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.205e+02 2.341e+02 2.496e+02 6.315e+02, threshold=4.683e+02, percent-clipped=1.0
2024-09-18 03:07:39,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=699187.3333333334, ans=0.05
2024-09-18 03:07:46,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=699215.6666666666, ans=0.0
2024-09-18 03:07:54,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0
2024-09-18 03:08:03,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=699244.0, ans=0.125
2024-09-18 03:08:18,347 INFO [train.py:1198] (0/2) Epoch 39, batch 3950, loss[loss=0.2742, ctc_loss=0.1881, cr_loss=0.4309, over 18266.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1466, cr_loss=0.3721, over 4086779.40 frames. ], batch size: 108, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:08:48,000 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0
2024-09-18 03:09:01,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=22.5
2024-09-18 03:09:05,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=699357.3333333334, ans=0.2
2024-09-18 03:09:30,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0
2024-09-18 03:09:34,442 INFO [train.py:1198] (0/2) Epoch 39, batch 4000, loss[loss=0.2228, ctc_loss=0.1484, cr_loss=0.372, over 21010.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1464, cr_loss=0.3721, over 4094439.08 frames. ], batch size: 63, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:09:42,130 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.248e+02 2.374e+02 2.559e+02 5.193e+02, threshold=4.749e+02, percent-clipped=1.0
2024-09-18 03:10:29,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=699499.0, ans=0.125
2024-09-18 03:10:40,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=699527.3333333334, ans=0.125
2024-09-18 03:10:42,178 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.50 vs. limit=15.0
2024-09-18 03:10:53,661 INFO [train.py:1198] (0/2) Epoch 39, batch 4050, loss[loss=0.2476, ctc_loss=0.1629, cr_loss=0.4235, over 20627.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1468, cr_loss=0.3733, over 4094263.61 frames. ], batch size: 66, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:12:08,935 INFO [train.py:1198] (0/2) Epoch 39, batch 4100, loss[loss=0.228, ctc_loss=0.152, cr_loss=0.3796, over 20819.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1458, cr_loss=0.3714, over 4105908.53 frames. ], batch size: 65, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:12:16,421 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.262e+02 2.394e+02 2.590e+02 4.150e+02, threshold=4.789e+02, percent-clipped=0.0
2024-09-18 03:12:44,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=699754.0, ans=0.025
2024-09-18 03:12:57,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=699782.3333333334, ans=0.125
2024-09-18 03:12:57,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=699782.3333333334, ans=0.025
2024-09-18 03:12:59,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=699782.3333333334, ans=0.125
2024-09-18 03:13:04,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=699782.3333333334, ans=0.125
2024-09-18 03:13:07,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=699782.3333333334, ans=0.0
2024-09-18 03:13:07,377 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=22.5
2024-09-18 03:13:13,507 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0
2024-09-18 03:13:25,258 INFO [train.py:1198] (0/2) Epoch 39, batch 4150, loss[loss=0.1778, ctc_loss=0.1155, cr_loss=0.3115, over 20982.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1462, cr_loss=0.3727, over 4103458.30 frames. ], batch size: 48, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:13:35,142 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=15.0
2024-09-18 03:14:44,751 INFO [train.py:1198] (0/2) Epoch 39, batch 4200, loss[loss=0.1951, ctc_loss=0.1255, cr_loss=0.3476, over 20800.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1463, cr_loss=0.3733, over 4106208.88 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:14:52,233 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.240e+02 2.333e+02 2.497e+02 6.454e+02, threshold=4.666e+02, percent-clipped=1.0
2024-09-18 03:15:14,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=700037.3333333334, ans=0.125
2024-09-18 03:15:27,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=700037.3333333334, ans=0.025
2024-09-18 03:15:42,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0
2024-09-18 03:15:48,020 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=22.5
2024-09-18 03:16:03,697 INFO [train.py:1198] (0/2) Epoch 39, batch 4250, loss[loss=0.2498, ctc_loss=0.1726, cr_loss=0.3859, over 19426.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1466, cr_loss=0.3735, over 4103519.15 frames.
], batch size: 90, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:16:28,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=700150.6666666666, ans=0.125 2024-09-18 03:16:28,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=700150.6666666666, ans=0.125 2024-09-18 03:16:49,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=700207.3333333334, ans=0.1 2024-09-18 03:16:52,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=700207.3333333334, ans=0.125 2024-09-18 03:17:01,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=700207.3333333334, ans=0.2 2024-09-18 03:17:19,406 INFO [train.py:1198] (0/2) Epoch 39, batch 4300, loss[loss=0.1963, ctc_loss=0.1303, cr_loss=0.3297, over 20787.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1459, cr_loss=0.3714, over 4094786.39 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 16.0 2024-09-18 03:17:28,626 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.207e+02 2.362e+02 2.524e+02 3.240e+02, threshold=4.724e+02, percent-clipped=0.0 2024-09-18 03:17:53,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.81 vs. 
limit=15.0 2024-09-18 03:18:02,086 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:18:17,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=700349.0, ans=0.125 2024-09-18 03:18:33,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=700405.6666666666, ans=0.125 2024-09-18 03:18:35,003 INFO [train.py:1198] (0/2) Epoch 39, batch 4350, loss[loss=0.2235, ctc_loss=0.1463, cr_loss=0.3857, over 19510.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1471, cr_loss=0.374, over 4097916.03 frames. ], batch size: 43, lr: 2.11e-03, grad_scale: 16.0 2024-09-18 03:18:50,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=700434.0, ans=0.2 2024-09-18 03:18:54,171 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.74 vs. limit=10.0 2024-09-18 03:18:58,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=700434.0, ans=0.125 2024-09-18 03:18:58,767 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0 2024-09-18 03:19:10,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=700462.3333333334, ans=0.0 2024-09-18 03:19:53,959 INFO [train.py:1198] (0/2) Epoch 39, batch 4400, loss[loss=0.2196, ctc_loss=0.1458, cr_loss=0.3694, over 20796.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1464, cr_loss=0.3724, over 4100522.33 frames. 
], batch size: 56, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:19:58,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=700547.3333333334, ans=0.1 2024-09-18 03:20:03,074 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.195e+02 2.398e+02 2.587e+02 4.163e+02, threshold=4.796e+02, percent-clipped=0.0 2024-09-18 03:20:06,707 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=22.5 2024-09-18 03:20:18,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=700575.6666666666, ans=0.125 2024-09-18 03:20:38,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.32 vs. limit=12.0 2024-09-18 03:20:50,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=700632.3333333334, ans=0.125 2024-09-18 03:20:58,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=700660.6666666666, ans=0.0 2024-09-18 03:21:10,114 INFO [train.py:1198] (0/2) Epoch 39, batch 4450, loss[loss=0.2121, ctc_loss=0.1404, cr_loss=0.3586, over 21003.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.146, cr_loss=0.3719, over 4094557.37 frames. 
], batch size: 63, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:21:28,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=700717.3333333334, ans=0.0 2024-09-18 03:21:30,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=700717.3333333334, ans=0.125 2024-09-18 03:22:01,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=22.5 2024-09-18 03:22:06,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=700774.0, ans=0.2 2024-09-18 03:22:23,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=700802.3333333334, ans=0.125 2024-09-18 03:22:29,024 INFO [train.py:1198] (0/2) Epoch 39, batch 4500, loss[loss=0.1996, ctc_loss=0.1318, cr_loss=0.339, over 20779.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1465, cr_loss=0.3725, over 4090644.71 frames. ], batch size: 53, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:22:38,144 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.206e+02 2.349e+02 2.484e+02 3.264e+02, threshold=4.698e+02, percent-clipped=0.0 2024-09-18 03:22:42,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0 2024-09-18 03:23:38,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.06 vs. 
limit=15.0 2024-09-18 03:23:42,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=700944.0, ans=0.0 2024-09-18 03:23:45,221 INFO [train.py:1198] (0/2) Epoch 39, batch 4550, loss[loss=0.2254, ctc_loss=0.1483, cr_loss=0.3856, over 20831.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1462, cr_loss=0.3718, over 4085946.52 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:24:02,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=701000.6666666666, ans=0.2 2024-09-18 03:24:26,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=701029.0, ans=0.0 2024-09-18 03:24:37,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.46 vs. limit=15.0 2024-09-18 03:24:52,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=701085.6666666666, ans=0.0 2024-09-18 03:25:03,851 INFO [train.py:1198] (0/2) Epoch 39, batch 4600, loss[loss=0.2473, ctc_loss=0.1671, cr_loss=0.401, over 18421.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1465, cr_loss=0.3721, over 4075550.63 frames. ], batch size: 108, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:25:07,881 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.16 vs. 
limit=15.0 2024-09-18 03:25:12,744 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.230e+02 2.365e+02 2.513e+02 5.047e+02, threshold=4.730e+02, percent-clipped=1.0 2024-09-18 03:25:42,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=701170.6666666666, ans=0.95 2024-09-18 03:25:44,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2024-09-18 03:26:02,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=701199.0, ans=0.05 2024-09-18 03:26:07,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2024-09-18 03:26:20,506 INFO [train.py:1198] (0/2) Epoch 39, batch 4650, loss[loss=0.2115, ctc_loss=0.1379, cr_loss=0.3683, over 20767.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1463, cr_loss=0.3715, over 4084156.16 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:26:29,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=701255.6666666666, ans=0.125 2024-09-18 03:27:07,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=701340.6666666666, ans=0.0 2024-09-18 03:27:39,056 INFO [train.py:1198] (0/2) Epoch 39, batch 4700, loss[loss=0.2217, ctc_loss=0.1477, cr_loss=0.3702, over 21044.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1466, cr_loss=0.3721, over 4084794.53 frames. 
], batch size: 62, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:27:48,322 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.206e+02 2.297e+02 2.510e+02 5.197e+02, threshold=4.593e+02, percent-clipped=1.0 2024-09-18 03:28:08,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=701454.0, ans=0.125 2024-09-18 03:28:27,435 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0 2024-09-18 03:28:43,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.75 vs. limit=22.5 2024-09-18 03:28:46,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=701510.6666666666, ans=0.2 2024-09-18 03:28:49,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=701510.6666666666, ans=0.1 2024-09-18 03:28:55,395 INFO [train.py:1198] (0/2) Epoch 39, batch 4750, loss[loss=0.217, ctc_loss=0.1435, cr_loss=0.3676, over 20870.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3709, over 4079352.31 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:28:58,036 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.87 vs. 
limit=10.0 2024-09-18 03:29:23,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=701567.3333333334, ans=0.125 2024-09-18 03:29:36,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=701595.6666666666, ans=0.125 2024-09-18 03:30:11,538 INFO [train.py:1198] (0/2) Epoch 39, batch 4800, loss[loss=0.2269, ctc_loss=0.1512, cr_loss=0.3786, over 21043.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1458, cr_loss=0.3709, over 4084560.51 frames. ], batch size: 62, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:30:20,558 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.185e+02 2.330e+02 2.510e+02 3.666e+02, threshold=4.660e+02, percent-clipped=0.0 2024-09-18 03:30:42,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=701709.0, ans=0.0 2024-09-18 03:30:42,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=701709.0, ans=0.0 2024-09-18 03:30:42,818 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=15.0 2024-09-18 03:30:43,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=701737.3333333334, ans=0.125 2024-09-18 03:30:58,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=701765.6666666666, ans=0.1 2024-09-18 03:30:59,164 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.62 vs. 
limit=15.0 2024-09-18 03:31:30,349 INFO [train.py:1198] (0/2) Epoch 39, batch 4850, loss[loss=0.2172, ctc_loss=0.1418, cr_loss=0.3772, over 21065.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1465, cr_loss=0.3718, over 4080184.34 frames. ], batch size: 56, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:31:45,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=701850.6666666666, ans=0.015 2024-09-18 03:31:49,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=701850.6666666666, ans=0.1 2024-09-18 03:32:14,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=701907.3333333334, ans=0.125 2024-09-18 03:32:40,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=701935.6666666666, ans=0.0 2024-09-18 03:32:46,083 INFO [train.py:1198] (0/2) Epoch 39, batch 4900, loss[loss=0.1777, ctc_loss=0.1161, cr_loss=0.3078, over 20247.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1459, cr_loss=0.3704, over 4072033.91 frames. 
], batch size: 45, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:32:50,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=701964.0, ans=0.125 2024-09-18 03:32:55,011 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.209e+02 2.353e+02 2.553e+02 4.303e+02, threshold=4.706e+02, percent-clipped=0.0 2024-09-18 03:32:58,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=701964.0, ans=0.0 2024-09-18 03:33:02,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=701992.3333333334, ans=0.0 2024-09-18 03:33:05,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=701992.3333333334, ans=0.125 2024-09-18 03:33:55,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=702077.3333333334, ans=0.0 2024-09-18 03:34:03,783 INFO [train.py:1198] (0/2) Epoch 39, batch 4950, loss[loss=0.1902, ctc_loss=0.1233, cr_loss=0.3344, over 20838.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1452, cr_loss=0.3699, over 4074472.05 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:34:14,964 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.45 vs. 
limit=22.5 2024-09-18 03:34:16,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=702105.6666666666, ans=0.1 2024-09-18 03:35:00,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=702190.6666666666, ans=0.2 2024-09-18 03:35:09,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=702219.0, ans=0.125 2024-09-18 03:35:17,877 INFO [train.py:1198] (0/2) Epoch 39, batch 5000, loss[loss=0.2932, ctc_loss=0.2077, cr_loss=0.4274, over 14421.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1462, cr_loss=0.3713, over 4066943.79 frames. ], batch size: 149, lr: 2.11e-03, grad_scale: 16.0 2024-09-18 03:35:17,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=702247.3333333334, ans=0.015 2024-09-18 03:35:28,255 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.209e+02 2.333e+02 2.459e+02 3.408e+02, threshold=4.667e+02, percent-clipped=0.0 2024-09-18 03:35:34,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=702275.6666666666, ans=0.1 2024-09-18 03:35:48,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=702304.0, ans=0.0 2024-09-18 03:36:04,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=702332.3333333334, ans=0.025 2024-09-18 03:36:09,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=702332.3333333334, ans=0.125 2024-09-18 03:36:13,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, 
batch_count=702332.3333333334, ans=0.1 2024-09-18 03:36:16,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=702360.6666666666, ans=0.0 2024-09-18 03:36:23,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-09-18 03:36:32,969 INFO [train.py:1198] (0/2) Epoch 39, batch 5050, loss[loss=0.1973, ctc_loss=0.1284, cr_loss=0.3447, over 20964.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3711, over 4080580.26 frames. ], batch size: 50, lr: 2.11e-03, grad_scale: 16.0 2024-09-18 03:36:56,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=702417.3333333334, ans=0.125 2024-09-18 03:37:06,144 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=22.5 2024-09-18 03:37:10,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=12.0 2024-09-18 03:37:21,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=702474.0, ans=0.0 2024-09-18 03:37:39,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=702502.3333333334, ans=0.0 2024-09-18 03:37:40,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2024-09-18 03:37:46,919 INFO [train.py:1198] (0/2) Epoch 39, batch 5100, loss[loss=0.202, ctc_loss=0.1316, cr_loss=0.3519, over 20887.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.146, cr_loss=0.3719, over 4077923.33 frames. 
], batch size: 57, lr: 2.11e-03, grad_scale: 16.0 2024-09-18 03:37:57,194 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.244e+02 2.345e+02 2.463e+02 3.614e+02, threshold=4.690e+02, percent-clipped=0.0 2024-09-18 03:38:30,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-18 03:38:59,959 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-248000.pt 2024-09-18 03:39:05,346 INFO [train.py:1198] (0/2) Epoch 39, batch 5150, loss[loss=0.2147, ctc_loss=0.1425, cr_loss=0.3607, over 20976.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1454, cr_loss=0.3704, over 4076433.04 frames. ], batch size: 52, lr: 2.11e-03, grad_scale: 16.0 2024-09-18 03:39:20,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=702700.6666666666, ans=0.125 2024-09-18 03:39:24,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-09-18 03:40:01,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=702757.3333333334, ans=0.125 2024-09-18 03:40:19,914 INFO [train.py:1198] (0/2) Epoch 39, batch 5200, loss[loss=0.2392, ctc_loss=0.1624, cr_loss=0.3836, over 19428.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3705, over 4077746.63 frames. 
], batch size: 90, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:40:21,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=702814.0, ans=0.0 2024-09-18 03:40:24,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=702814.0, ans=0.125 2024-09-18 03:40:30,148 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.182e+02 2.339e+02 2.493e+02 7.446e+02, threshold=4.678e+02, percent-clipped=1.0 2024-09-18 03:40:50,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=702870.6666666666, ans=0.0 2024-09-18 03:41:00,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=702870.6666666666, ans=0.0 2024-09-18 03:41:14,632 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=22.5 2024-09-18 03:41:30,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=702927.3333333334, ans=0.0 2024-09-18 03:41:34,877 INFO [train.py:1198] (0/2) Epoch 39, batch 5250, loss[loss=0.2335, ctc_loss=0.1558, cr_loss=0.3886, over 21081.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1458, cr_loss=0.3718, over 4091210.11 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:41:37,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.95 vs. 
limit=22.5 2024-09-18 03:41:39,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=702955.6666666666, ans=0.1 2024-09-18 03:41:48,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-18 03:42:09,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=703012.3333333334, ans=0.1 2024-09-18 03:42:49,577 INFO [train.py:1198] (0/2) Epoch 39, batch 5300, loss[loss=0.2172, ctc_loss=0.1446, cr_loss=0.3634, over 20838.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1467, cr_loss=0.3733, over 4092647.50 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 32.0 2024-09-18 03:42:59,857 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.164e+02 2.284e+02 2.448e+02 4.598e+02, threshold=4.567e+02, percent-clipped=0.0 2024-09-18 03:43:25,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=703154.0, ans=0.125 2024-09-18 03:44:05,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=703239.0, ans=0.125 2024-09-18 03:44:07,168 INFO [train.py:1198] (0/2) Epoch 39, batch 5350, loss[loss=0.2019, ctc_loss=0.1313, cr_loss=0.3529, over 20972.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1469, cr_loss=0.3731, over 4085891.95 frames. 
], batch size: 51, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:44:14,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=703239.0, ans=0.125
2024-09-18 03:44:14,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=703239.0, ans=0.125
2024-09-18 03:44:36,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=703295.6666666666, ans=0.0
2024-09-18 03:44:44,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=703295.6666666666, ans=0.125
2024-09-18 03:45:09,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=15.0
2024-09-18 03:45:09,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=703352.3333333334, ans=0.125
2024-09-18 03:45:10,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=15.0
2024-09-18 03:45:21,228 INFO [train.py:1198] (0/2) Epoch 39, batch 5400, loss[loss=0.222, ctc_loss=0.1496, cr_loss=0.3618, over 21063.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1468, cr_loss=0.3731, over 4099196.64 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 32.0
2024-09-18 03:45:23,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0
2024-09-18 03:45:25,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=703380.6666666666, ans=0.0
2024-09-18 03:45:31,323 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.226e+02 2.356e+02 2.492e+02 2.774e+02, threshold=4.711e+02, percent-clipped=0.0
2024-09-18 03:45:44,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=703409.0, ans=0.125
2024-09-18 03:46:34,983 INFO [train.py:1198] (0/2) Epoch 39, batch 5450, loss[loss=0.2237, ctc_loss=0.1446, cr_loss=0.3955, over 20961.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1465, cr_loss=0.3732, over 4100434.31 frames. ], batch size: 58, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 03:46:39,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=703522.3333333334, ans=0.125
2024-09-18 03:46:49,202 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.33 vs. limit=15.0
2024-09-18 03:46:52,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=703550.6666666666, ans=0.0
2024-09-18 03:47:40,800 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 03:47:49,287 INFO [train.py:1198] (0/2) Epoch 39, batch 5500, loss[loss=0.1986, ctc_loss=0.1305, cr_loss=0.3407, over 20879.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1469, cr_loss=0.3732, over 4092450.64 frames. ], batch size: 57, lr: 2.10e-03, grad_scale: 16.0
2024-09-18 03:48:01,463 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.242e+02 2.361e+02 2.514e+02 4.210e+02, threshold=4.721e+02, percent-clipped=0.0
2024-09-18 03:48:32,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=703720.6666666666, ans=0.125
2024-09-18 03:49:05,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=703805.6666666666, ans=0.025
2024-09-18 03:49:06,397 INFO [train.py:1198] (0/2) Epoch 39, batch 5550, loss[loss=0.269, ctc_loss=0.1834, cr_loss=0.428, over 18445.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1467, cr_loss=0.3731, over 4091048.92 frames. ], batch size: 108, lr: 2.10e-03, grad_scale: 16.0
2024-09-18 03:49:19,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=703805.6666666666, ans=0.2
2024-09-18 03:49:25,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.39 vs. limit=15.0
2024-09-18 03:49:48,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=703862.3333333334, ans=0.125
2024-09-18 03:50:21,065 INFO [train.py:1198] (0/2) Epoch 39, batch 5600, loss[loss=0.227, ctc_loss=0.1497, cr_loss=0.3861, over 20796.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1457, cr_loss=0.3717, over 4100636.74 frames. ], batch size: 53, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 03:50:32,859 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.226e+02 2.335e+02 2.461e+02 3.574e+02, threshold=4.670e+02, percent-clipped=0.0
2024-09-18 03:50:37,794 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 03:50:45,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=703975.6666666666, ans=0.2
2024-09-18 03:50:51,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=704004.0, ans=0.0
2024-09-18 03:51:14,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=704032.3333333334, ans=0.0
2024-09-18 03:51:29,972 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 03:51:35,371 INFO [train.py:1198] (0/2) Epoch 39, batch 5650, loss[loss=0.1761, ctc_loss=0.1152, cr_loss=0.3045, over 20058.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1463, cr_loss=0.3727, over 4099428.86 frames. ], batch size: 44, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 03:51:38,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=704089.0, ans=0.2
2024-09-18 03:51:52,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=704117.3333333334, ans=0.2
2024-09-18 03:52:30,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=704174.0, ans=0.125
2024-09-18 03:52:52,474 INFO [train.py:1198] (0/2) Epoch 39, batch 5700, loss[loss=0.2159, ctc_loss=0.1434, cr_loss=0.3622, over 21002.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1459, cr_loss=0.3714, over 4096279.02 frames. ], batch size: 61, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 03:52:52,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=704230.6666666666, ans=0.0
2024-09-18 03:53:04,291 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.246e+02 2.344e+02 2.590e+02 3.010e+02, threshold=4.689e+02, percent-clipped=0.0
2024-09-18 03:53:14,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=704259.0, ans=0.2
2024-09-18 03:53:35,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=704315.6666666666, ans=0.125
2024-09-18 03:54:05,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=704372.3333333334, ans=0.125
2024-09-18 03:54:06,447 INFO [train.py:1198] (0/2) Epoch 39, batch 5750, loss[loss=0.2375, ctc_loss=0.1628, cr_loss=0.3736, over 20671.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.147, cr_loss=0.3728, over 4082576.95 frames. ], batch size: 68, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 03:54:11,806 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.25 vs. limit=12.0
2024-09-18 03:54:14,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704372.3333333334, ans=0.1
2024-09-18 03:54:21,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=704400.6666666666, ans=0.125
2024-09-18 03:54:23,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0
2024-09-18 03:54:45,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=704429.0, ans=0.125
2024-09-18 03:54:58,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=22.5
2024-09-18 03:55:01,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.64 vs. limit=22.5
2024-09-18 03:55:04,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.77 vs. limit=10.0
2024-09-18 03:55:12,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704485.6666666666, ans=0.1
2024-09-18 03:55:21,044 INFO [train.py:1198] (0/2) Epoch 39, batch 5800, loss[loss=0.2161, ctc_loss=0.1432, cr_loss=0.3641, over 19529.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1474, cr_loss=0.3735, over 4076145.85 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 03:55:32,899 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.213e+02 2.387e+02 2.552e+02 3.474e+02, threshold=4.774e+02, percent-clipped=0.0
2024-09-18 03:56:03,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=12.0
2024-09-18 03:56:24,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=704627.3333333334, ans=0.125
2024-09-18 03:56:35,804 INFO [train.py:1198] (0/2) Epoch 39, batch 5850, loss[loss=0.195, ctc_loss=0.1266, cr_loss=0.3423, over 20964.00 frames. ], tot_loss[loss=0.2218, ctc_loss=0.1472, cr_loss=0.373, over 4085027.17 frames. ], batch size: 51, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 03:56:47,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=704655.6666666666, ans=0.0
2024-09-18 03:56:48,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704655.6666666666, ans=0.1
2024-09-18 03:57:12,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=704712.3333333334, ans=0.025
2024-09-18 03:57:26,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=704740.6666666666, ans=0.125
2024-09-18 03:57:34,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=704740.6666666666, ans=0.125
2024-09-18 03:57:41,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=704769.0, ans=0.025
2024-09-18 03:57:45,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=704769.0, ans=0.125
2024-09-18 03:57:52,784 INFO [train.py:1198] (0/2) Epoch 39, batch 5900, loss[loss=0.2222, ctc_loss=0.1457, cr_loss=0.3825, over 21015.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1467, cr_loss=0.3714, over 4086460.98 frames. ], batch size: 61, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 03:57:54,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=22.5
2024-09-18 03:58:04,552 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.199e+02 2.339e+02 2.534e+02 6.800e+02, threshold=4.678e+02, percent-clipped=1.0
2024-09-18 03:58:10,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=704825.6666666666, ans=0.0
2024-09-18 03:58:13,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704825.6666666666, ans=0.1
2024-09-18 03:58:21,961 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0
2024-09-18 03:58:48,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=704882.3333333334, ans=0.125
2024-09-18 03:59:01,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=704910.6666666666, ans=0.0
2024-09-18 03:59:07,443 INFO [train.py:1198] (0/2) Epoch 39, batch 5950, loss[loss=0.2404, ctc_loss=0.158, cr_loss=0.4121, over 20972.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1474, cr_loss=0.3725, over 4080301.42 frames. ], batch size: 58, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 03:59:55,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=705024.0, ans=0.0
2024-09-18 03:59:56,311 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=15.0
2024-09-18 04:00:06,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=705052.3333333334, ans=0.2
2024-09-18 04:00:22,327 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=22.5
2024-09-18 04:00:24,412 INFO [train.py:1198] (0/2) Epoch 39, batch 6000, loss[loss=0.2035, ctc_loss=0.1328, cr_loss=0.3534, over 20963.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1481, cr_loss=0.3739, over 4081481.86 frames. ], batch size: 49, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 04:00:24,413 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 04:00:46,787 INFO [train.py:1230] (0/2) Epoch 39, validation: loss=0.03967, ctc_loss=0.03967, cr_loss=1.424e-14, over 944034.00 frames.
2024-09-18 04:00:46,787 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-18 04:00:58,688 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.194e+02 2.300e+02 2.472e+02 3.542e+02, threshold=4.601e+02, percent-clipped=0.0
2024-09-18 04:02:00,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=705222.3333333334, ans=0.125
2024-09-18 04:02:02,073 INFO [train.py:1198] (0/2) Epoch 39, batch 6050, loss[loss=0.2264, ctc_loss=0.1496, cr_loss=0.3836, over 20829.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1475, cr_loss=0.3728, over 4079129.52 frames. ], batch size: 65, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 04:02:23,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=705250.6666666666, ans=0.125
2024-09-18 04:02:51,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=705307.3333333334, ans=0.0
2024-09-18 04:03:16,773 INFO [train.py:1198] (0/2) Epoch 39, batch 6100, loss[loss=0.2151, ctc_loss=0.1413, cr_loss=0.3689, over 20986.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1465, cr_loss=0.3707, over 4080453.15 frames. ], batch size: 55, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 04:03:18,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=705364.0, ans=15.0
2024-09-18 04:03:28,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.204e+02 2.305e+02 2.440e+02 4.328e+02, threshold=4.610e+02, percent-clipped=0.0
2024-09-18 04:03:51,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0
2024-09-18 04:04:04,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=705449.0, ans=0.2
2024-09-18 04:04:23,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=705477.3333333334, ans=0.0
2024-09-18 04:04:30,160 INFO [train.py:1198] (0/2) Epoch 39, batch 6150, loss[loss=0.2312, ctc_loss=0.1528, cr_loss=0.3921, over 21037.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1459, cr_loss=0.3695, over 4076687.64 frames. ], batch size: 63, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 04:04:31,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705505.6666666666, ans=0.1
2024-09-18 04:05:01,729 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=22.5
2024-09-18 04:05:04,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=705562.3333333334, ans=0.0
2024-09-18 04:05:17,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=705590.6666666666, ans=0.1
2024-09-18 04:05:21,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.33 vs. limit=15.0
2024-09-18 04:05:22,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=705590.6666666666, ans=0.125
2024-09-18 04:05:31,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0
2024-09-18 04:05:44,844 INFO [train.py:1198] (0/2) Epoch 39, batch 6200, loss[loss=0.1864, ctc_loss=0.1208, cr_loss=0.328, over 19793.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1461, cr_loss=0.3701, over 4055350.47 frames. ], batch size: 44, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 04:05:47,144 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.08 vs. limit=10.0
2024-09-18 04:05:56,387 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.246e+02 2.417e+02 2.528e+02 7.333e+02, threshold=4.835e+02, percent-clipped=2.0
2024-09-18 04:06:36,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=705732.3333333334, ans=0.125
2024-09-18 04:06:36,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=705732.3333333334, ans=0.0
2024-09-18 04:06:54,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705760.6666666666, ans=0.1
2024-09-18 04:06:58,458 INFO [train.py:1198] (0/2) Epoch 39, batch 6250, loss[loss=0.1813, ctc_loss=0.1172, cr_loss=0.3203, over 20938.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1457, cr_loss=0.3694, over 4038486.51 frames. ], batch size: 48, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 04:06:59,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0
2024-09-18 04:07:44,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=705874.0, ans=0.015
2024-09-18 04:07:44,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=705874.0, ans=0.125
2024-09-18 04:08:04,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=705902.3333333334, ans=0.125
2024-09-18 04:08:10,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=705902.3333333334, ans=6.0
2024-09-18 04:08:12,576 INFO [train.py:1198] (0/2) Epoch 39, batch 6300, loss[loss=0.1911, ctc_loss=0.1234, cr_loss=0.3389, over 20990.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1439, cr_loss=0.3661, over 4027394.52 frames. ], batch size: 52, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 04:08:18,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=705930.6666666666, ans=0.125
2024-09-18 04:08:24,058 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.183e+02 2.396e+02 2.589e+02 3.527e+02, threshold=4.792e+02, percent-clipped=0.0
2024-09-18 04:08:37,415 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.66 vs. limit=22.5
2024-09-18 04:09:08,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=706015.6666666666, ans=0.5
2024-09-18 04:09:08,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.61 vs. limit=6.0
2024-09-18 04:09:14,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=706044.0, ans=0.125
2024-09-18 04:09:24,981 INFO [train.py:1198] (0/2) Epoch 39, batch 6350, loss[loss=0.2525, ctc_loss=0.1741, cr_loss=0.3924, over 14313.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1471, cr_loss=0.3673, over 3852599.39 frames. ], batch size: 149, lr: 2.10e-03, grad_scale: 32.0
2024-09-18 04:09:31,021 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 04:09:40,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=706100.6666666666, ans=0.125
2024-09-18 04:09:59,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=706129.0, ans=0.125
2024-09-18 04:10:23,938 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-39.pt
2024-09-18 04:11:13,987 INFO [train.py:1198] (0/2) Epoch 40, batch 0, loss[loss=0.2467, ctc_loss=0.1633, cr_loss=0.4166, over 20654.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1633, cr_loss=0.4166, over 20654.00 frames. ], batch size: 68, lr: 2.07e-03, grad_scale: 32.0
2024-09-18 04:11:13,988 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 04:11:32,436 INFO [train.py:1230] (0/2) Epoch 40, validation: loss=0.03941, ctc_loss=0.03941, cr_loss=1.447e-14, over 944034.00 frames.
2024-09-18 04:11:32,436 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-18 04:11:58,035 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.043e+02 2.572e+02 2.820e+02 3.038e+02 4.034e+02, threshold=5.640e+02, percent-clipped=0.0
2024-09-18 04:12:04,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=706245.1666666666, ans=0.0
2024-09-18 04:12:19,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=706273.5, ans=0.125
2024-09-18 04:12:31,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=706301.8333333334, ans=0.125
2024-09-18 04:12:34,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=706301.8333333334, ans=0.0
2024-09-18 04:12:48,444 INFO [train.py:1198] (0/2) Epoch 40, batch 50, loss[loss=0.2097, ctc_loss=0.1377, cr_loss=0.36, over 20875.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.145, cr_loss=0.3723, over 930563.07 frames. ], batch size: 54, lr: 2.07e-03, grad_scale: 32.0
2024-09-18 04:12:56,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=706330.1666666666, ans=0.125
2024-09-18 04:13:05,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=706358.5, ans=0.125
2024-09-18 04:13:25,102 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=15.0
2024-09-18 04:13:29,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=706386.8333333334, ans=0.0
2024-09-18 04:14:03,779 INFO [train.py:1198] (0/2) Epoch 40, batch 100, loss[loss=0.1893, ctc_loss=0.1238, cr_loss=0.3277, over 20952.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1439, cr_loss=0.3683, over 1637478.89 frames. ], batch size: 51, lr: 2.07e-03, grad_scale: 32.0
2024-09-18 04:14:17,928 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.77 vs. limit=22.5
2024-09-18 04:14:29,376 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.168e+02 2.313e+02 2.529e+02 4.230e+02, threshold=4.625e+02, percent-clipped=0.0
2024-09-18 04:14:43,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=706528.5, ans=0.125
2024-09-18 04:15:06,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706585.1666666666, ans=0.1
2024-09-18 04:15:14,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=706585.1666666666, ans=0.025
2024-09-18 04:15:17,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=706613.5, ans=0.0
2024-09-18 04:15:19,129 INFO [train.py:1198] (0/2) Epoch 40, batch 150, loss[loss=0.2571, ctc_loss=0.1734, cr_loss=0.4184, over 20665.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1458, cr_loss=0.3722, over 2181405.35 frames. ], batch size: 68, lr: 2.07e-03, grad_scale: 32.0
2024-09-18 04:15:19,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=706613.5, ans=0.125
2024-09-18 04:15:27,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706613.5, ans=0.1
2024-09-18 04:15:33,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=706641.8333333334, ans=0.025
2024-09-18 04:16:03,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=706698.5, ans=0.2
2024-09-18 04:16:09,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=706698.5, ans=0.0
2024-09-18 04:16:17,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=706698.5, ans=0.0
2024-09-18 04:16:18,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0
2024-09-18 04:16:35,073 INFO [train.py:1198] (0/2) Epoch 40, batch 200, loss[loss=0.2256, ctc_loss=0.1519, cr_loss=0.3687, over 20371.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1454, cr_loss=0.3714, over 2603866.08 frames. ], batch size: 74, lr: 2.07e-03, grad_scale: 32.0
2024-09-18 04:16:41,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=706755.1666666666, ans=0.125
2024-09-18 04:17:06,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706783.5, ans=0.1
2024-09-18 04:17:07,689 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.212e+02 2.348e+02 2.520e+02 6.792e+02, threshold=4.697e+02, percent-clipped=1.0
2024-09-18 04:17:11,566 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.89 vs. limit=15.0
2024-09-18 04:17:22,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=706811.8333333334, ans=0.125
2024-09-18 04:17:36,720 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0
2024-09-18 04:17:51,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=706868.5, ans=0.125
2024-09-18 04:17:57,079 INFO [train.py:1198] (0/2) Epoch 40, batch 250, loss[loss=0.2345, ctc_loss=0.1563, cr_loss=0.3911, over 20932.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1441, cr_loss=0.3684, over 2935745.98 frames. ], batch size: 60, lr: 2.07e-03, grad_scale: 16.0
2024-09-18 04:17:59,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.10 vs. limit=10.0
2024-09-18 04:18:19,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706925.1666666666, ans=0.1
2024-09-18 04:18:21,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=706925.1666666666, ans=0.0
2024-09-18 04:18:41,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0
2024-09-18 04:18:48,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=706981.8333333334, ans=0.2
2024-09-18 04:19:09,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=707010.1666666666, ans=0.015
2024-09-18 04:19:12,700 INFO [train.py:1198] (0/2) Epoch 40, batch 300, loss[loss=0.1957, ctc_loss=0.1253, cr_loss=0.3521, over 20973.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1451, cr_loss=0.3698, over 3181488.30 frames. ], batch size: 55, lr: 2.07e-03, grad_scale: 16.0
2024-09-18 04:19:15,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=707038.5, ans=0.125
2024-09-18 04:19:23,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=707038.5, ans=0.2
2024-09-18 04:19:24,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=707038.5, ans=0.2
2024-09-18 04:19:39,693 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.195e+02 2.334e+02 2.503e+02 3.189e+02, threshold=4.667e+02, percent-clipped=0.0
2024-09-18 04:20:27,797 INFO [train.py:1198] (0/2) Epoch 40, batch 350, loss[loss=0.2058, ctc_loss=0.1331, cr_loss=0.3635, over 20984.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1453, cr_loss=0.3709, over 3383533.23 frames. ], batch size: 52, lr: 2.07e-03, grad_scale: 16.0
2024-09-18 04:20:28,701 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0
2024-09-18 04:20:59,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=707236.8333333334, ans=0.09899494936611666
2024-09-18 04:21:42,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=707321.8333333334, ans=0.0
2024-09-18 04:21:43,720 INFO [train.py:1198] (0/2) Epoch 40, batch 400, loss[loss=0.2302, ctc_loss=0.1496, cr_loss=0.403, over 21030.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1457, cr_loss=0.3709, over 3542832.66 frames. ], batch size: 62, lr: 2.07e-03, grad_scale: 32.0
2024-09-18 04:22:02,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=707350.1666666666, ans=0.125
2024-09-18 04:22:11,169 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.212e+02 2.334e+02 2.468e+02 5.090e+02, threshold=4.668e+02, percent-clipped=1.0
2024-09-18 04:22:11,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=707350.1666666666, ans=0.0
2024-09-18 04:22:13,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=707378.5, ans=0.0
2024-09-18 04:22:17,630 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 04:23:05,536 INFO [train.py:1198] (0/2) Epoch 40, batch 450, loss[loss=0.2169, ctc_loss=0.143, cr_loss=0.3694, over 20665.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1463, cr_loss=0.3724, over 3667927.93 frames. ], batch size: 66, lr: 2.07e-03, grad_scale: 32.0
2024-09-18 04:23:08,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=707463.5, ans=0.125
2024-09-18 04:23:40,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=707520.1666666666, ans=0.125
2024-09-18 04:23:44,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=707520.1666666666, ans=10.0
2024-09-18 04:24:10,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=707576.8333333334, ans=0.0
2024-09-18 04:24:20,694 INFO [train.py:1198] (0/2) Epoch 40, batch 500, loss[loss=0.2191, ctc_loss=0.1485, cr_loss=0.3533, over 21031.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1465, cr_loss=0.3723, over 3757670.44 frames. ], batch size: 62, lr: 2.07e-03, grad_scale: 32.0
2024-09-18 04:24:43,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=707633.5, ans=0.125
2024-09-18 04:24:46,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=707633.5, ans=0.0
2024-09-18 04:24:47,898 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.249e+02 2.379e+02 2.495e+02 3.005e+02, threshold=4.758e+02, percent-clipped=0.0
2024-09-18 04:24:57,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=707661.8333333334, ans=0.125
2024-09-18 04:25:10,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=707690.1666666666, ans=0.125
2024-09-18 04:25:15,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=707690.1666666666, ans=0.2
2024-09-18 04:25:24,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=707718.5, ans=0.0
2024-09-18 04:25:35,980 INFO [train.py:1198] (0/2) Epoch 40, batch 550, loss[loss=0.2051, ctc_loss=0.1342, cr_loss=0.3541, over 21085.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3708, over 3825139.43 frames. ], batch size: 53, lr: 2.07e-03, grad_scale: 32.0
2024-09-18 04:25:36,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=707746.8333333334, ans=12.0
2024-09-18 04:26:26,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=707831.8333333334, ans=0.125
2024-09-18 04:26:39,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=707860.1666666666, ans=0.125
2024-09-18 04:26:42,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=707860.1666666666, ans=22.5
2024-09-18 04:26:51,346 INFO [train.py:1198] (0/2) Epoch 40, batch 600, loss[loss=0.2379, ctc_loss=0.1618, cr_loss=0.3804, over 18359.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3705, over 3896482.27 frames. ], batch size: 108, lr: 2.07e-03, grad_scale: 32.0
2024-09-18 04:27:17,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=707916.8333333334, ans=0.125
2024-09-18 04:27:18,455 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.210e+02 2.343e+02 2.499e+02 4.528e+02, threshold=4.686e+02, percent-clipped=0.0
2024-09-18 04:28:03,874 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0
2024-09-18 04:28:04,249 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.04 vs. limit=10.0
2024-09-18 04:28:09,069 INFO [train.py:1198] (0/2) Epoch 40, batch 650, loss[loss=0.2148, ctc_loss=0.1415, cr_loss=0.3669, over 20768.00 frames.
], tot_loss[loss=0.2192, ctc_loss=0.1451, cr_loss=0.3704, over 3951197.01 frames. ], batch size: 56, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:28:15,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=708030.1666666666, ans=0.2 2024-09-18 04:28:19,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=708030.1666666666, ans=0.125 2024-09-18 04:28:57,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=708115.1666666666, ans=0.125 2024-09-18 04:29:09,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=708115.1666666666, ans=0.0 2024-09-18 04:29:27,615 INFO [train.py:1198] (0/2) Epoch 40, batch 700, loss[loss=0.2354, ctc_loss=0.1535, cr_loss=0.4095, over 18339.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1444, cr_loss=0.3693, over 3989823.34 frames. ], batch size: 108, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:29:55,119 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.228e+02 2.397e+02 2.537e+02 4.140e+02, threshold=4.794e+02, percent-clipped=0.0 2024-09-18 04:29:56,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=708228.5, ans=0.125 2024-09-18 04:30:17,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=708256.8333333334, ans=0.0 2024-09-18 04:30:18,461 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=22.5 2024-09-18 04:30:33,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.09 vs. 
limit=15.0 2024-09-18 04:30:39,173 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=12.0 2024-09-18 04:30:43,123 INFO [train.py:1198] (0/2) Epoch 40, batch 750, loss[loss=0.2123, ctc_loss=0.1378, cr_loss=0.3723, over 20957.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1455, cr_loss=0.3712, over 4015395.44 frames. ], batch size: 50, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:30:43,425 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:31:00,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=22.5 2024-09-18 04:31:11,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=708341.8333333334, ans=0.0 2024-09-18 04:31:46,180 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0 2024-09-18 04:31:59,325 INFO [train.py:1198] (0/2) Epoch 40, batch 800, loss[loss=0.2355, ctc_loss=0.1587, cr_loss=0.3841, over 20063.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1459, cr_loss=0.372, over 4034159.83 frames. 
], batch size: 80, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:32:13,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=708483.5, ans=10.0 2024-09-18 04:32:25,971 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.189e+02 2.320e+02 2.492e+02 3.200e+02, threshold=4.639e+02, percent-clipped=0.0 2024-09-18 04:32:29,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=708511.8333333334, ans=0.0 2024-09-18 04:33:13,981 INFO [train.py:1198] (0/2) Epoch 40, batch 850, loss[loss=0.1745, ctc_loss=0.1113, cr_loss=0.3156, over 20973.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1464, cr_loss=0.3722, over 4031774.90 frames. ], batch size: 48, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:33:24,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=708596.8333333334, ans=0.0 2024-09-18 04:33:26,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=708596.8333333334, ans=0.2 2024-09-18 04:33:54,431 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0 2024-09-18 04:34:00,393 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.65 vs. limit=10.0 2024-09-18 04:34:35,864 INFO [train.py:1198] (0/2) Epoch 40, batch 900, loss[loss=0.209, ctc_loss=0.138, cr_loss=0.3549, over 21029.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1457, cr_loss=0.3716, over 4054560.06 frames. 
], batch size: 63, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:34:48,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=708738.5, ans=0.125 2024-09-18 04:34:50,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=708766.8333333334, ans=0.125 2024-09-18 04:34:51,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=708766.8333333334, ans=0.0 2024-09-18 04:34:59,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=708766.8333333334, ans=0.025 2024-09-18 04:35:03,386 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.244e+02 2.376e+02 2.600e+02 3.926e+02, threshold=4.752e+02, percent-clipped=0.0 2024-09-18 04:35:08,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=708795.1666666666, ans=0.125 2024-09-18 04:35:10,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=708795.1666666666, ans=0.1 2024-09-18 04:35:22,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=8.0 2024-09-18 04:35:33,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0 2024-09-18 04:35:51,992 INFO [train.py:1198] (0/2) Epoch 40, batch 950, loss[loss=0.2199, ctc_loss=0.1464, cr_loss=0.3675, over 21040.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1455, cr_loss=0.3706, over 4066214.25 frames. 
], batch size: 56, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:36:08,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=708908.5, ans=0.0 2024-09-18 04:36:36,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=22.5 2024-09-18 04:36:49,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=708965.1666666666, ans=0.0 2024-09-18 04:36:55,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=708993.5, ans=0.125 2024-09-18 04:36:57,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708993.5, ans=0.1 2024-09-18 04:37:07,761 INFO [train.py:1198] (0/2) Epoch 40, batch 1000, loss[loss=0.2153, ctc_loss=0.1414, cr_loss=0.3691, over 20793.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3705, over 4065388.73 frames. ], batch size: 53, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:37:29,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=709050.1666666666, ans=0.125 2024-09-18 04:37:34,680 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.169e+02 2.364e+02 2.541e+02 3.218e+02, threshold=4.727e+02, percent-clipped=0.0 2024-09-18 04:38:14,604 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2024-09-18 04:38:23,230 INFO [train.py:1198] (0/2) Epoch 40, batch 1050, loss[loss=0.1963, ctc_loss=0.1284, cr_loss=0.34, over 20980.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1453, cr_loss=0.3708, over 4075554.63 frames. 
], batch size: 51, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:38:37,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=709191.8333333334, ans=0.0 2024-09-18 04:38:40,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=709191.8333333334, ans=0.125 2024-09-18 04:39:01,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=709220.1666666666, ans=0.125 2024-09-18 04:39:03,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=22.5 2024-09-18 04:39:10,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709248.5, ans=0.1 2024-09-18 04:39:38,835 INFO [train.py:1198] (0/2) Epoch 40, batch 1100, loss[loss=0.2031, ctc_loss=0.1306, cr_loss=0.3628, over 20962.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1446, cr_loss=0.3706, over 4094534.51 frames. ], batch size: 58, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:40:12,191 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.194e+02 2.309e+02 2.464e+02 3.691e+02, threshold=4.618e+02, percent-clipped=0.0 2024-09-18 04:40:23,513 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. 
limit=15.0 2024-09-18 04:40:34,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709390.1666666666, ans=0.1 2024-09-18 04:40:46,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=709418.5, ans=0.125 2024-09-18 04:41:00,939 INFO [train.py:1198] (0/2) Epoch 40, batch 1150, loss[loss=0.2092, ctc_loss=0.1366, cr_loss=0.3626, over 20769.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1438, cr_loss=0.3684, over 4090630.68 frames. ], batch size: 56, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:41:50,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2024-09-18 04:42:02,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=709560.1666666666, ans=0.2 2024-09-18 04:42:06,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0 2024-09-18 04:42:08,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=709560.1666666666, ans=0.125 2024-09-18 04:42:17,548 INFO [train.py:1198] (0/2) Epoch 40, batch 1200, loss[loss=0.2488, ctc_loss=0.1675, cr_loss=0.4066, over 20965.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3682, over 4090456.46 frames. 
], batch size: 64, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:42:44,627 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.192e+02 2.327e+02 2.524e+02 2.926e+02, threshold=4.654e+02, percent-clipped=0.0 2024-09-18 04:42:45,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=709616.8333333334, ans=0.125 2024-09-18 04:43:03,002 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:43:06,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=709673.5, ans=0.0 2024-09-18 04:43:18,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=709701.8333333334, ans=0.125 2024-09-18 04:43:25,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=12.0 2024-09-18 04:43:31,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=709730.1666666666, ans=0.1 2024-09-18 04:43:33,023 INFO [train.py:1198] (0/2) Epoch 40, batch 1250, loss[loss=0.2195, ctc_loss=0.1468, cr_loss=0.3632, over 20838.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1444, cr_loss=0.3687, over 4081783.85 frames. ], batch size: 59, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:43:48,755 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.58 vs. 
limit=15.0 2024-09-18 04:43:55,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=709758.5, ans=0.035 2024-09-18 04:44:04,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=709786.8333333334, ans=0.125 2024-09-18 04:44:06,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709786.8333333334, ans=0.1 2024-09-18 04:44:15,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709786.8333333334, ans=0.1 2024-09-18 04:44:36,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=709843.5, ans=0.2 2024-09-18 04:44:48,545 INFO [train.py:1198] (0/2) Epoch 40, batch 1300, loss[loss=0.2239, ctc_loss=0.1477, cr_loss=0.3807, over 21048.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1444, cr_loss=0.3684, over 4079837.31 frames. 
], batch size: 63, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:44:51,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=709871.8333333334, ans=0.025 2024-09-18 04:44:54,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709871.8333333334, ans=0.1 2024-09-18 04:45:00,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=709871.8333333334, ans=0.025 2024-09-18 04:45:15,610 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.079e+02 2.223e+02 2.329e+02 2.563e+02 4.312e+02, threshold=4.657e+02, percent-clipped=0.0 2024-09-18 04:45:35,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-09-18 04:45:38,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=12.0 2024-09-18 04:45:55,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=709985.1666666666, ans=0.0 2024-09-18 04:46:10,296 INFO [train.py:1198] (0/2) Epoch 40, batch 1350, loss[loss=0.2167, ctc_loss=0.1421, cr_loss=0.3726, over 20356.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1454, cr_loss=0.3698, over 4077856.82 frames. ], batch size: 74, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:46:48,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710070.1666666666, ans=0.1 2024-09-18 04:47:25,607 INFO [train.py:1198] (0/2) Epoch 40, batch 1400, loss[loss=0.2119, ctc_loss=0.137, cr_loss=0.3745, over 20799.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3706, over 4085273.60 frames. 
], batch size: 53, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:47:52,761 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.249e+02 2.383e+02 2.555e+02 2.990e+02, threshold=4.765e+02, percent-clipped=0.0 2024-09-18 04:47:54,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. limit=6.0 2024-09-18 04:48:17,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=710240.1666666666, ans=0.0 2024-09-18 04:48:26,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=710268.5, ans=0.125 2024-09-18 04:48:35,961 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.53 vs. limit=15.0 2024-09-18 04:48:41,413 INFO [train.py:1198] (0/2) Epoch 40, batch 1450, loss[loss=0.1779, ctc_loss=0.1144, cr_loss=0.3176, over 20772.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.146, cr_loss=0.371, over 4080885.27 frames. 
], batch size: 53, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:48:46,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=710296.8333333334, ans=0.125 2024-09-18 04:49:11,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=710353.5, ans=0.125 2024-09-18 04:49:14,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=710353.5, ans=0.2 2024-09-18 04:49:17,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=710353.5, ans=0.025 2024-09-18 04:49:53,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710410.1666666666, ans=0.1 2024-09-18 04:49:56,509 INFO [train.py:1198] (0/2) Epoch 40, batch 1500, loss[loss=0.2358, ctc_loss=0.155, cr_loss=0.4036, over 21007.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1471, cr_loss=0.3725, over 4072609.95 frames. ], batch size: 63, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:49:56,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710438.5, ans=0.1 2024-09-18 04:50:05,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=710438.5, ans=0.0 2024-09-18 04:50:06,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. 
limit=6.0 2024-09-18 04:50:19,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=710466.8333333334, ans=0.125 2024-09-18 04:50:23,744 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.185e+02 2.334e+02 2.503e+02 4.735e+02, threshold=4.668e+02, percent-clipped=0.0 2024-09-18 04:50:43,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=710523.5, ans=0.125 2024-09-18 04:50:45,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=710523.5, ans=0.0 2024-09-18 04:51:17,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710580.1666666666, ans=0.1 2024-09-18 04:51:17,547 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-09-18 04:51:18,378 INFO [train.py:1198] (0/2) Epoch 40, batch 1550, loss[loss=0.2428, ctc_loss=0.1613, cr_loss=0.4077, over 20329.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1462, cr_loss=0.3712, over 4084049.85 frames. 
], batch size: 74, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:51:33,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=710608.5, ans=0.2 2024-09-18 04:51:44,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710608.5, ans=0.1 2024-09-18 04:52:05,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=710665.1666666666, ans=0.125 2024-09-18 04:52:08,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710665.1666666666, ans=0.1 2024-09-18 04:52:25,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710693.5, ans=0.1 2024-09-18 04:52:25,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710693.5, ans=0.1 2024-09-18 04:52:27,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=710693.5, ans=0.0 2024-09-18 04:52:34,359 INFO [train.py:1198] (0/2) Epoch 40, batch 1600, loss[loss=0.2539, ctc_loss=0.1781, cr_loss=0.3787, over 14253.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1463, cr_loss=0.3713, over 4081259.78 frames. 
], batch size: 150, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:52:54,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=710750.1666666666, ans=0.025 2024-09-18 04:53:00,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=710750.1666666666, ans=0.0 2024-09-18 04:53:01,563 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.237e+02 2.362e+02 2.527e+02 3.476e+02, threshold=4.725e+02, percent-clipped=0.0 2024-09-18 04:53:09,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=710778.5, ans=0.025 2024-09-18 04:53:24,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=710806.8333333334, ans=0.125 2024-09-18 04:53:33,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=710835.1666666666, ans=0.0 2024-09-18 04:53:50,197 INFO [train.py:1198] (0/2) Epoch 40, batch 1650, loss[loss=0.2278, ctc_loss=0.1516, cr_loss=0.3812, over 21026.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1454, cr_loss=0.3696, over 4082845.67 frames. ], batch size: 62, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:54:27,389 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.85 vs. 
limit=15.0 2024-09-18 04:54:55,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=710976.8333333334, ans=0.2 2024-09-18 04:54:58,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=710976.8333333334, ans=0.035 2024-09-18 04:55:06,552 INFO [train.py:1198] (0/2) Epoch 40, batch 1700, loss[loss=0.204, ctc_loss=0.1326, cr_loss=0.3571, over 21050.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1452, cr_loss=0.369, over 4075386.47 frames. ], batch size: 56, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:55:30,675 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:55:33,246 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.182e+02 2.337e+02 2.536e+02 4.066e+02, threshold=4.674e+02, percent-clipped=0.0 2024-09-18 04:55:35,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=711061.8333333334, ans=0.2 2024-09-18 04:55:39,964 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0 2024-09-18 04:56:09,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=711118.5, ans=0.125 2024-09-18 04:56:14,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=711118.5, ans=0.125 2024-09-18 04:56:20,984 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0 2024-09-18 04:56:21,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. 
limit=6.0 2024-09-18 04:56:21,766 INFO [train.py:1198] (0/2) Epoch 40, batch 1750, loss[loss=0.2283, ctc_loss=0.1539, cr_loss=0.3721, over 20928.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1461, cr_loss=0.3708, over 4073677.63 frames. ], batch size: 60, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:56:22,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=711146.8333333334, ans=0.0 2024-09-18 04:56:26,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=711146.8333333334, ans=0.2 2024-09-18 04:56:27,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=22.5 2024-09-18 04:56:28,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=711146.8333333334, ans=0.2 2024-09-18 04:56:35,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=711146.8333333334, ans=0.0 2024-09-18 04:56:46,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=711175.1666666666, ans=0.125 2024-09-18 04:56:52,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=711175.1666666666, ans=10.0 2024-09-18 04:56:56,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=711203.5, ans=0.025 2024-09-18 04:56:59,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=711203.5, ans=0.125 2024-09-18 04:57:43,902 INFO [train.py:1198] (0/2) Epoch 40, batch 1800, loss[loss=0.2572, ctc_loss=0.1795, cr_loss=0.3882, over 14020.00 frames. 
], tot_loss[loss=0.2197, ctc_loss=0.1456, cr_loss=0.3704, over 4076164.87 frames. ], batch size: 149, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:57:54,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=711288.5, ans=0.125 2024-09-18 04:57:57,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=711316.8333333334, ans=0.0 2024-09-18 04:58:11,256 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.231e+02 2.387e+02 2.606e+02 3.849e+02, threshold=4.773e+02, percent-clipped=0.0 2024-09-18 04:58:24,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-18 04:58:48,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=711401.8333333334, ans=0.0 2024-09-18 04:58:59,971 INFO [train.py:1198] (0/2) Epoch 40, batch 1850, loss[loss=0.1745, ctc_loss=0.1158, cr_loss=0.2935, over 20957.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1445, cr_loss=0.3677, over 4082641.50 frames. ], batch size: 50, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 04:59:34,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=711486.8333333334, ans=0.125 2024-09-18 05:00:08,961 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0 2024-09-18 05:00:15,714 INFO [train.py:1198] (0/2) Epoch 40, batch 1900, loss[loss=0.2242, ctc_loss=0.1467, cr_loss=0.3873, over 20294.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1448, cr_loss=0.3686, over 4079218.04 frames. 
], batch size: 74, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 05:00:23,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=711571.8333333334, ans=0.0 2024-09-18 05:00:25,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=711571.8333333334, ans=0.1 2024-09-18 05:00:43,116 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.243e+02 2.329e+02 2.533e+02 7.885e+02, threshold=4.658e+02, percent-clipped=1.0 2024-09-18 05:00:49,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=711628.5, ans=0.125 2024-09-18 05:00:51,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=711628.5, ans=0.0 2024-09-18 05:00:52,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=711628.5, ans=0.125 2024-09-18 05:01:12,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=711656.8333333334, ans=0.2 2024-09-18 05:01:31,416 INFO [train.py:1198] (0/2) Epoch 40, batch 1950, loss[loss=0.2217, ctc_loss=0.1451, cr_loss=0.3829, over 20827.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1445, cr_loss=0.3682, over 4080014.04 frames. 
], batch size: 59, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 05:01:33,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=711713.5, ans=0.04949747468305833 2024-09-18 05:02:29,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=711798.5, ans=0.0 2024-09-18 05:02:41,101 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.97 vs. limit=10.0 2024-09-18 05:02:52,283 INFO [train.py:1198] (0/2) Epoch 40, batch 2000, loss[loss=0.2533, ctc_loss=0.1688, cr_loss=0.4226, over 20626.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1444, cr_loss=0.3684, over 4087621.96 frames. ], batch size: 66, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 05:03:06,572 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.68 vs. 
limit=6.0 2024-09-18 05:03:13,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=711883.5, ans=0.125 2024-09-18 05:03:19,711 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.217e+02 2.386e+02 2.605e+02 3.145e+02, threshold=4.772e+02, percent-clipped=0.0 2024-09-18 05:03:27,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=711911.8333333334, ans=0.0 2024-09-18 05:03:30,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=711911.8333333334, ans=0.125 2024-09-18 05:03:35,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=711911.8333333334, ans=0.125 2024-09-18 05:03:36,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=711940.1666666666, ans=0.0 2024-09-18 05:03:44,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-18 05:04:08,710 INFO [train.py:1198] (0/2) Epoch 40, batch 2050, loss[loss=0.1763, ctc_loss=0.1119, cr_loss=0.322, over 20952.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1441, cr_loss=0.3681, over 4094530.43 frames. 
], batch size: 49, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 05:04:12,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=711996.8333333334, ans=0.025 2024-09-18 05:04:25,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=712025.1666666666, ans=0.1 2024-09-18 05:05:09,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=712110.1666666666, ans=0.125 2024-09-18 05:05:14,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712110.1666666666, ans=0.1 2024-09-18 05:05:17,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=712110.1666666666, ans=0.125 2024-09-18 05:05:24,382 INFO [train.py:1198] (0/2) Epoch 40, batch 2100, loss[loss=0.1808, ctc_loss=0.1175, cr_loss=0.3165, over 20976.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1436, cr_loss=0.3679, over 4095424.87 frames. 
], batch size: 49, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 05:05:29,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=712138.5, ans=0.125 2024-09-18 05:05:33,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=712138.5, ans=0.0 2024-09-18 05:05:52,080 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.197e+02 2.316e+02 2.443e+02 3.588e+02, threshold=4.633e+02, percent-clipped=0.0 2024-09-18 05:06:01,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=712195.1666666666, ans=0.125 2024-09-18 05:06:16,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=712223.5, ans=0.125 2024-09-18 05:06:24,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=712251.8333333334, ans=0.0 2024-09-18 05:06:36,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=712251.8333333334, ans=0.125 2024-09-18 05:06:39,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=712280.1666666666, ans=0.125 2024-09-18 05:06:40,451 INFO [train.py:1198] (0/2) Epoch 40, batch 2150, loss[loss=0.2338, ctc_loss=0.1578, cr_loss=0.3801, over 19428.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1437, cr_loss=0.3676, over 4098770.40 frames. 
], batch size: 90, lr: 2.07e-03, grad_scale: 32.0 2024-09-18 05:06:45,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=712280.1666666666, ans=0.0 2024-09-18 05:06:47,278 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.21 vs. limit=15.0 2024-09-18 05:06:58,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=712308.5, ans=0.2 2024-09-18 05:07:46,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=712393.5, ans=0.05 2024-09-18 05:07:56,589 INFO [train.py:1198] (0/2) Epoch 40, batch 2200, loss[loss=0.1773, ctc_loss=0.1143, cr_loss=0.3152, over 20994.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1437, cr_loss=0.3673, over 4099813.11 frames. ], batch size: 48, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:07:59,077 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-09-18 05:08:00,526 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.00 vs. 
limit=10.0 2024-09-18 05:08:29,673 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.189e+02 2.341e+02 2.464e+02 3.912e+02, threshold=4.682e+02, percent-clipped=0.0 2024-09-18 05:08:52,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=712506.8333333334, ans=0.125 2024-09-18 05:09:00,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=712506.8333333334, ans=0.0 2024-09-18 05:09:01,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=712535.1666666666, ans=0.125 2024-09-18 05:09:06,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=712535.1666666666, ans=0.125 2024-09-18 05:09:18,269 INFO [train.py:1198] (0/2) Epoch 40, batch 2250, loss[loss=0.2184, ctc_loss=0.1438, cr_loss=0.3728, over 21044.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.144, cr_loss=0.368, over 4105642.88 frames. ], batch size: 62, lr: 2.06e-03, grad_scale: 64.0 2024-09-18 05:10:05,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=712648.5, ans=0.0 2024-09-18 05:10:34,061 INFO [train.py:1198] (0/2) Epoch 40, batch 2300, loss[loss=0.2457, ctc_loss=0.1666, cr_loss=0.3954, over 19414.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1431, cr_loss=0.3665, over 4114933.95 frames. 
], batch size: 90, lr: 2.06e-03, grad_scale: 64.0 2024-09-18 05:10:34,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=712705.1666666666, ans=0.125 2024-09-18 05:10:35,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=712705.1666666666, ans=0.125 2024-09-18 05:10:40,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=712705.1666666666, ans=0.125 2024-09-18 05:11:01,034 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.194e+02 2.340e+02 2.474e+02 4.176e+02, threshold=4.680e+02, percent-clipped=0.0 2024-09-18 05:11:22,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=712790.1666666666, ans=0.1 2024-09-18 05:11:28,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=712790.1666666666, ans=0.0 2024-09-18 05:11:36,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=712818.5, ans=0.1 2024-09-18 05:11:38,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-09-18 05:11:49,327 INFO [train.py:1198] (0/2) Epoch 40, batch 2350, loss[loss=0.2258, ctc_loss=0.1501, cr_loss=0.3787, over 21028.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1437, cr_loss=0.3677, over 4101617.54 frames. 
], batch size: 61, lr: 2.06e-03, grad_scale: 64.0 2024-09-18 05:12:03,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712875.1666666666, ans=0.1 2024-09-18 05:12:09,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=712875.1666666666, ans=0.0 2024-09-18 05:12:38,367 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:12:39,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=712931.8333333334, ans=0.025 2024-09-18 05:13:05,327 INFO [train.py:1198] (0/2) Epoch 40, batch 2400, loss[loss=0.2577, ctc_loss=0.1751, cr_loss=0.413, over 19491.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1441, cr_loss=0.3688, over 4097887.65 frames. ], batch size: 90, lr: 2.06e-03, grad_scale: 64.0 2024-09-18 05:13:29,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=713016.8333333334, ans=0.0 2024-09-18 05:13:32,717 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.281e+02 2.452e+02 2.758e+02 4.358e+02, threshold=4.904e+02, percent-clipped=0.0 2024-09-18 05:14:21,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=713101.8333333334, ans=15.0 2024-09-18 05:14:26,527 INFO [train.py:1198] (0/2) Epoch 40, batch 2450, loss[loss=0.2457, ctc_loss=0.1633, cr_loss=0.4122, over 20705.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1437, cr_loss=0.3682, over 4099489.02 frames. 
], batch size: 71, lr: 2.06e-03, grad_scale: 64.0 2024-09-18 05:14:26,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=713130.1666666666, ans=0.125 2024-09-18 05:14:43,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=713158.5, ans=0.125 2024-09-18 05:15:02,953 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:15:32,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=713243.5, ans=0.125 2024-09-18 05:15:42,450 INFO [train.py:1198] (0/2) Epoch 40, batch 2500, loss[loss=0.2489, ctc_loss=0.1669, cr_loss=0.4098, over 20059.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1445, cr_loss=0.3688, over 4086901.10 frames. ], batch size: 80, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:15:53,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=713271.8333333334, ans=0.1 2024-09-18 05:16:02,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=713300.1666666666, ans=0.0 2024-09-18 05:16:10,953 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.211e+02 2.344e+02 2.471e+02 3.301e+02, threshold=4.688e+02, percent-clipped=0.0 2024-09-18 05:16:39,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=713356.8333333334, ans=0.125 2024-09-18 05:16:53,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=713385.1666666666, ans=0.125 2024-09-18 05:16:54,765 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=713385.1666666666, ans=0.1 2024-09-18 05:16:57,596 INFO [train.py:1198] (0/2) Epoch 40, batch 2550, loss[loss=0.1872, ctc_loss=0.1217, cr_loss=0.3275, over 21010.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1446, cr_loss=0.3692, over 4094729.08 frames. ], batch size: 61, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:17:00,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=713413.5, ans=0.125 2024-09-18 05:17:04,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=713413.5, ans=0.125 2024-09-18 05:17:32,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=713470.1666666666, ans=0.0 2024-09-18 05:17:40,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=713470.1666666666, ans=0.125 2024-09-18 05:17:55,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=713498.5, ans=0.0 2024-09-18 05:18:13,578 INFO [train.py:1198] (0/2) Epoch 40, batch 2600, loss[loss=0.2294, ctc_loss=0.1532, cr_loss=0.3811, over 20279.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.144, cr_loss=0.3679, over 4084464.20 frames. 
], batch size: 74, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:18:24,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=713555.1666666666, ans=0.025 2024-09-18 05:18:30,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=713583.5, ans=0.0 2024-09-18 05:18:42,328 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.166e+02 2.288e+02 2.468e+02 5.699e+02, threshold=4.576e+02, percent-clipped=1.0 2024-09-18 05:18:49,217 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.73 vs. limit=6.0 2024-09-18 05:19:35,013 INFO [train.py:1198] (0/2) Epoch 40, batch 2650, loss[loss=0.2155, ctc_loss=0.144, cr_loss=0.3578, over 20976.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1438, cr_loss=0.3676, over 4088723.11 frames. ], batch size: 55, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:20:50,419 INFO [train.py:1198] (0/2) Epoch 40, batch 2700, loss[loss=0.1862, ctc_loss=0.1213, cr_loss=0.3242, over 20774.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1439, cr_loss=0.3672, over 4080542.35 frames. 
], batch size: 53, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:20:53,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=713838.5, ans=0.0 2024-09-18 05:20:59,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=713838.5, ans=0.125 2024-09-18 05:21:07,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=713866.8333333334, ans=0.125 2024-09-18 05:21:19,315 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.197e+02 2.311e+02 2.496e+02 3.574e+02, threshold=4.623e+02, percent-clipped=0.0 2024-09-18 05:22:06,514 INFO [train.py:1198] (0/2) Epoch 40, batch 2750, loss[loss=0.2237, ctc_loss=0.149, cr_loss=0.3737, over 19368.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1434, cr_loss=0.3665, over 4084957.90 frames. ], batch size: 90, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:22:08,447 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:22:15,878 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-252000.pt 2024-09-18 05:22:21,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=714008.5, ans=0.125 2024-09-18 05:22:41,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714036.8333333334, ans=0.1 2024-09-18 05:22:50,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=714036.8333333334, ans=0.125 2024-09-18 05:23:19,587 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=714093.5, ans=0.2 2024-09-18 05:23:23,742 INFO [train.py:1198] (0/2) Epoch 40, batch 2800, loss[loss=0.2122, ctc_loss=0.1417, cr_loss=0.3527, over 20808.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1436, cr_loss=0.3674, over 4089499.02 frames. ], batch size: 59, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:23:52,781 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.199e+02 2.307e+02 2.456e+02 5.088e+02, threshold=4.613e+02, percent-clipped=1.0 2024-09-18 05:24:39,726 INFO [train.py:1198] (0/2) Epoch 40, batch 2850, loss[loss=0.2424, ctc_loss=0.1588, cr_loss=0.418, over 20859.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1434, cr_loss=0.367, over 4097721.22 frames. ], batch size: 65, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:25:15,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=714320.1666666666, ans=0.0 2024-09-18 05:25:40,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=714348.5, ans=0.2 2024-09-18 05:26:01,650 INFO [train.py:1198] (0/2) Epoch 40, batch 2900, loss[loss=0.232, ctc_loss=0.1562, cr_loss=0.3788, over 21035.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1439, cr_loss=0.3677, over 4084195.92 frames. 
], batch size: 62, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:26:30,674 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.193e+02 2.354e+02 2.517e+02 5.695e+02, threshold=4.708e+02, percent-clipped=1.0 2024-09-18 05:26:32,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=714461.8333333334, ans=0.0 2024-09-18 05:27:08,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=714518.5, ans=0.125 2024-09-18 05:27:17,436 INFO [train.py:1198] (0/2) Epoch 40, batch 2950, loss[loss=0.197, ctc_loss=0.1285, cr_loss=0.3425, over 21036.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1436, cr_loss=0.3669, over 4081242.65 frames. ], batch size: 53, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:27:29,365 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0 2024-09-18 05:27:55,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=714603.5, ans=0.125 2024-09-18 05:28:12,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=714631.8333333334, ans=0.0 2024-09-18 05:28:21,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=714660.1666666666, ans=0.2 2024-09-18 05:28:32,978 INFO [train.py:1198] (0/2) Epoch 40, batch 3000, loss[loss=0.2024, ctc_loss=0.1325, cr_loss=0.3492, over 20998.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1447, cr_loss=0.3685, over 4072289.43 frames. 
], batch size: 52, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:28:32,980 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 05:28:52,101 INFO [train.py:1230] (0/2) Epoch 40, validation: loss=0.03997, ctc_loss=0.03997, cr_loss=1.401e-14, over 944034.00 frames. 2024-09-18 05:28:52,102 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-18 05:29:07,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=714716.8333333334, ans=0.025 2024-09-18 05:29:16,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=714716.8333333334, ans=0.125 2024-09-18 05:29:21,131 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.973e+02 2.205e+02 2.349e+02 2.498e+02 3.902e+02, threshold=4.698e+02, percent-clipped=0.0 2024-09-18 05:29:54,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=714801.8333333334, ans=0.125 2024-09-18 05:30:00,685 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:30:07,831 INFO [train.py:1198] (0/2) Epoch 40, batch 3050, loss[loss=0.2276, ctc_loss=0.1513, cr_loss=0.3818, over 20945.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.3689, over 4089417.81 frames. 
], batch size: 64, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:30:21,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=714858.5, ans=0.09899494936611666 2024-09-18 05:30:46,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=714886.8333333334, ans=0.2 2024-09-18 05:31:16,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=714943.5, ans=0.0 2024-09-18 05:31:27,016 INFO [train.py:1198] (0/2) Epoch 40, batch 3100, loss[loss=0.1813, ctc_loss=0.1173, cr_loss=0.3197, over 20959.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1438, cr_loss=0.3679, over 4093339.32 frames. ], batch size: 50, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:31:55,440 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.196e+02 2.363e+02 2.569e+02 5.054e+02, threshold=4.725e+02, percent-clipped=1.0 2024-09-18 05:32:06,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=715028.5, ans=10.0 2024-09-18 05:32:12,787 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-09-18 05:32:32,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715085.1666666666, ans=0.1 2024-09-18 05:32:41,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=715113.5, ans=0.125 2024-09-18 05:32:43,023 INFO [train.py:1198] (0/2) Epoch 40, batch 3150, loss[loss=0.2109, ctc_loss=0.1409, cr_loss=0.3501, over 21019.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1444, cr_loss=0.368, over 4085499.17 frames. 
], batch size: 61, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:32:44,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=715113.5, ans=0.1 2024-09-18 05:32:49,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=715113.5, ans=0.125 2024-09-18 05:33:35,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=715198.5, ans=0.125 2024-09-18 05:33:46,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=715226.8333333334, ans=0.025 2024-09-18 05:33:49,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=715226.8333333334, ans=0.0 2024-09-18 05:33:58,452 INFO [train.py:1198] (0/2) Epoch 40, batch 3200, loss[loss=0.251, ctc_loss=0.1697, cr_loss=0.4065, over 20950.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.145, cr_loss=0.3691, over 4088773.07 frames. ], batch size: 64, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:34:27,350 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.211e+02 2.329e+02 2.499e+02 3.959e+02, threshold=4.658e+02, percent-clipped=0.0 2024-09-18 05:34:30,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.45 vs. 
limit=22.5 2024-09-18 05:34:42,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=715340.1666666666, ans=0.0 2024-09-18 05:34:54,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=715340.1666666666, ans=0.07 2024-09-18 05:35:14,010 INFO [train.py:1198] (0/2) Epoch 40, batch 3250, loss[loss=0.2225, ctc_loss=0.1454, cr_loss=0.3852, over 20870.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1441, cr_loss=0.3683, over 4107403.65 frames. ], batch size: 57, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:35:32,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=715425.1666666666, ans=0.2 2024-09-18 05:35:38,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=715425.1666666666, ans=0.025 2024-09-18 05:35:45,011 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.05 vs. limit=22.5 2024-09-18 05:36:31,187 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.91 vs. limit=10.0 2024-09-18 05:36:35,994 INFO [train.py:1198] (0/2) Epoch 40, batch 3300, loss[loss=0.2469, ctc_loss=0.1655, cr_loss=0.407, over 20834.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1457, cr_loss=0.3712, over 4099050.56 frames. 
], batch size: 65, lr: 2.06e-03, grad_scale: 16.0 2024-09-18 05:36:46,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=715538.5, ans=0.025 2024-09-18 05:37:06,377 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.240e+02 2.360e+02 2.548e+02 4.061e+02, threshold=4.720e+02, percent-clipped=0.0 2024-09-18 05:37:23,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715623.5, ans=0.1 2024-09-18 05:37:23,788 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=22.5 2024-09-18 05:37:31,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=715623.5, ans=0.0 2024-09-18 05:37:52,070 INFO [train.py:1198] (0/2) Epoch 40, batch 3350, loss[loss=0.2392, ctc_loss=0.1665, cr_loss=0.363, over 14475.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1449, cr_loss=0.3701, over 4097598.32 frames. ], batch size: 149, lr: 2.06e-03, grad_scale: 16.0 2024-09-18 05:38:39,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=715765.1666666666, ans=0.125 2024-09-18 05:38:42,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=715765.1666666666, ans=10.0 2024-09-18 05:38:51,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=715793.5, ans=0.125 2024-09-18 05:39:07,756 INFO [train.py:1198] (0/2) Epoch 40, batch 3400, loss[loss=0.2179, ctc_loss=0.1452, cr_loss=0.3635, over 21073.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1442, cr_loss=0.3689, over 4099665.13 frames. 
], batch size: 53, lr: 2.06e-03, grad_scale: 16.0 2024-09-18 05:39:15,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=715821.8333333334, ans=0.125 2024-09-18 05:39:24,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=715850.1666666666, ans=0.125 2024-09-18 05:39:32,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=715850.1666666666, ans=0.125 2024-09-18 05:39:37,990 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.268e+02 2.415e+02 2.555e+02 3.340e+02, threshold=4.831e+02, percent-clipped=0.0 2024-09-18 05:40:23,435 INFO [train.py:1198] (0/2) Epoch 40, batch 3450, loss[loss=0.2015, ctc_loss=0.1324, cr_loss=0.3456, over 20287.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1451, cr_loss=0.3707, over 4103071.68 frames. ], batch size: 45, lr: 2.06e-03, grad_scale: 16.0 2024-09-18 05:40:35,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=715963.5, ans=0.0 2024-09-18 05:40:48,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=715991.8333333334, ans=0.0 2024-09-18 05:41:22,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=716076.8333333334, ans=0.125 2024-09-18 05:41:36,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=716076.8333333334, ans=0.5 2024-09-18 05:41:37,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=716105.1666666666, ans=0.0 2024-09-18 05:41:39,181 INFO [train.py:1198] (0/2) Epoch 40, batch 3500, loss[loss=0.1899, ctc_loss=0.1235, 
cr_loss=0.3318, over 20354.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.145, cr_loss=0.3703, over 4093015.81 frames. ], batch size: 45, lr: 2.06e-03, grad_scale: 16.0 2024-09-18 05:41:59,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.91 vs. limit=15.0 2024-09-18 05:42:12,724 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.250e+02 2.387e+02 2.515e+02 3.300e+02, threshold=4.773e+02, percent-clipped=0.0 2024-09-18 05:43:00,997 INFO [train.py:1198] (0/2) Epoch 40, batch 3550, loss[loss=0.257, ctc_loss=0.1776, cr_loss=0.3973, over 18134.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1438, cr_loss=0.368, over 4106700.09 frames. ], batch size: 108, lr: 2.06e-03, grad_scale: 16.0 2024-09-18 05:43:06,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2024-09-18 05:43:18,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=716275.1666666666, ans=0.125 2024-09-18 05:43:56,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=716331.8333333334, ans=0.05 2024-09-18 05:44:17,834 INFO [train.py:1198] (0/2) Epoch 40, batch 3600, loss[loss=0.2312, ctc_loss=0.1559, cr_loss=0.3762, over 20852.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1441, cr_loss=0.3685, over 4101624.92 frames. ], batch size: 65, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:44:44,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.58 vs. 
limit=22.5 2024-09-18 05:44:48,332 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.192e+02 2.361e+02 2.576e+02 6.666e+02, threshold=4.721e+02, percent-clipped=1.0 2024-09-18 05:44:53,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=716445.1666666666, ans=0.025 2024-09-18 05:44:59,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0 2024-09-18 05:45:07,563 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0 2024-09-18 05:45:16,311 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=22.5 2024-09-18 05:45:34,280 INFO [train.py:1198] (0/2) Epoch 40, batch 3650, loss[loss=0.1849, ctc_loss=0.1196, cr_loss=0.3269, over 20265.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1445, cr_loss=0.37, over 4104613.62 frames. ], batch size: 45, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:45:49,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=716558.5, ans=0.07 2024-09-18 05:46:14,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=716586.8333333334, ans=0.125 2024-09-18 05:46:49,429 INFO [train.py:1198] (0/2) Epoch 40, batch 3700, loss[loss=0.2392, ctc_loss=0.1587, cr_loss=0.4024, over 19970.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1454, cr_loss=0.3713, over 4112735.37 frames. 
], batch size: 80, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:46:51,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=716671.8333333334, ans=0.1 2024-09-18 05:46:55,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=716671.8333333334, ans=0.2 2024-09-18 05:47:04,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=716700.1666666666, ans=0.125 2024-09-18 05:47:04,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=716700.1666666666, ans=0.1 2024-09-18 05:47:06,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=716700.1666666666, ans=0.0 2024-09-18 05:47:19,457 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.252e+02 2.389e+02 2.518e+02 3.350e+02, threshold=4.779e+02, percent-clipped=0.0 2024-09-18 05:48:08,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=716785.1666666666, ans=0.0 2024-09-18 05:48:10,918 INFO [train.py:1198] (0/2) Epoch 40, batch 3750, loss[loss=0.2082, ctc_loss=0.1374, cr_loss=0.3539, over 20937.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1452, cr_loss=0.3713, over 4121035.55 frames. ], batch size: 51, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:48:47,244 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.68 vs. limit=22.5 2024-09-18 05:49:11,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.57 vs. 
limit=15.0 2024-09-18 05:49:12,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=716926.8333333334, ans=0.2 2024-09-18 05:49:26,833 INFO [train.py:1198] (0/2) Epoch 40, batch 3800, loss[loss=0.2401, ctc_loss=0.1617, cr_loss=0.3921, over 18489.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1445, cr_loss=0.3697, over 4130596.76 frames. ], batch size: 108, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:49:56,937 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.174e+02 2.344e+02 2.485e+02 3.489e+02, threshold=4.689e+02, percent-clipped=0.0 2024-09-18 05:50:03,508 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:50:04,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=717011.8333333334, ans=0.05 2024-09-18 05:50:14,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=717040.1666666666, ans=0.0 2024-09-18 05:50:15,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=717040.1666666666, ans=0.125 2024-09-18 05:50:42,677 INFO [train.py:1198] (0/2) Epoch 40, batch 3850, loss[loss=0.1953, ctc_loss=0.1279, cr_loss=0.3368, over 20985.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.145, cr_loss=0.3706, over 4113550.11 frames. 
], batch size: 48, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:50:55,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=717096.8333333334, ans=0.125 2024-09-18 05:51:28,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=717181.8333333334, ans=0.1 2024-09-18 05:51:58,329 INFO [train.py:1198] (0/2) Epoch 40, batch 3900, loss[loss=0.2125, ctc_loss=0.1402, cr_loss=0.3615, over 20284.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.3707, over 4107699.93 frames. ], batch size: 74, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:52:16,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=717266.8333333334, ans=10.0 2024-09-18 05:52:22,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=717266.8333333334, ans=0.0 2024-09-18 05:52:28,803 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.204e+02 2.334e+02 2.486e+02 5.419e+02, threshold=4.667e+02, percent-clipped=2.0 2024-09-18 05:52:35,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=717295.1666666666, ans=0.125 2024-09-18 05:52:37,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.71 vs. limit=10.0 2024-09-18 05:52:40,069 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.51 vs. 
limit=15.0 2024-09-18 05:52:59,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=717351.8333333334, ans=0.0 2024-09-18 05:53:01,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=22.5 2024-09-18 05:53:14,172 INFO [train.py:1198] (0/2) Epoch 40, batch 3950, loss[loss=0.2417, ctc_loss=0.1655, cr_loss=0.3807, over 20665.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.145, cr_loss=0.3704, over 4119479.88 frames. ], batch size: 68, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:54:03,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.24 vs. limit=15.0 2024-09-18 05:54:05,283 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0 2024-09-18 05:54:15,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=717465.1666666666, ans=0.0 2024-09-18 05:54:36,552 INFO [train.py:1198] (0/2) Epoch 40, batch 4000, loss[loss=0.2321, ctc_loss=0.1543, cr_loss=0.3892, over 20875.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.3706, over 4122795.08 frames. 
], batch size: 57, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:54:36,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=717521.8333333334, ans=0.125 2024-09-18 05:55:06,574 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.236e+02 2.373e+02 2.510e+02 5.322e+02, threshold=4.745e+02, percent-clipped=1.0 2024-09-18 05:55:07,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=717578.5, ans=0.125 2024-09-18 05:55:34,675 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-09-18 05:55:51,633 INFO [train.py:1198] (0/2) Epoch 40, batch 4050, loss[loss=0.2593, ctc_loss=0.1729, cr_loss=0.4321, over 20048.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1448, cr_loss=0.3697, over 4121804.25 frames. ], batch size: 80, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:56:00,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=717663.5, ans=0.125 2024-09-18 05:56:02,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=717663.5, ans=0.025 2024-09-18 05:56:03,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=717663.5, ans=0.07 2024-09-18 05:57:07,725 INFO [train.py:1198] (0/2) Epoch 40, batch 4100, loss[loss=0.2418, ctc_loss=0.1613, cr_loss=0.4027, over 20686.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.144, cr_loss=0.3681, over 4127374.20 frames. 
], batch size: 68, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:57:37,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=717861.8333333334, ans=0.125 2024-09-18 05:57:38,172 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.146e+02 2.266e+02 2.399e+02 2.941e+02, threshold=4.532e+02, percent-clipped=0.0 2024-09-18 05:57:49,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.77 vs. limit=10.0 2024-09-18 05:57:57,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=22.5 2024-09-18 05:58:19,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=717918.5, ans=0.1 2024-09-18 05:58:23,594 INFO [train.py:1198] (0/2) Epoch 40, batch 4150, loss[loss=0.2358, ctc_loss=0.1558, cr_loss=0.3998, over 20355.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1442, cr_loss=0.3685, over 4130778.13 frames. ], batch size: 74, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:58:26,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=717946.8333333334, ans=0.0 2024-09-18 05:58:33,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=717946.8333333334, ans=0.025 2024-09-18 05:59:01,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.56 vs. 
limit=10.0 2024-09-18 05:59:04,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=718003.5, ans=0.125 2024-09-18 05:59:21,366 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0 2024-09-18 05:59:25,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=718060.1666666666, ans=0.0 2024-09-18 05:59:27,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=718060.1666666666, ans=0.0 2024-09-18 05:59:45,141 INFO [train.py:1198] (0/2) Epoch 40, batch 4200, loss[loss=0.2125, ctc_loss=0.1371, cr_loss=0.3768, over 20982.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1437, cr_loss=0.3678, over 4112364.57 frames. ], batch size: 61, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 05:59:56,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=718088.5, ans=0.0 2024-09-18 06:00:11,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=718116.8333333334, ans=0.0 2024-09-18 06:00:15,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.208e+02 2.321e+02 2.458e+02 3.626e+02, threshold=4.642e+02, percent-clipped=0.0 2024-09-18 06:00:22,454 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.25 vs. limit=15.0 2024-09-18 06:00:41,753 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.65 vs. 
limit=22.5 2024-09-18 06:00:52,086 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:01:00,868 INFO [train.py:1198] (0/2) Epoch 40, batch 4250, loss[loss=0.2469, ctc_loss=0.1669, cr_loss=0.3999, over 19337.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1436, cr_loss=0.3679, over 4118041.66 frames. ], batch size: 90, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:01:08,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=718230.1666666666, ans=0.025 2024-09-18 06:01:11,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=718230.1666666666, ans=0.1 2024-09-18 06:01:37,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=718286.8333333334, ans=0.07 2024-09-18 06:01:54,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=718315.1666666666, ans=0.2 2024-09-18 06:01:56,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=718315.1666666666, ans=0.125 2024-09-18 06:02:12,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=718343.5, ans=0.1 2024-09-18 06:02:16,533 INFO [train.py:1198] (0/2) Epoch 40, batch 4300, loss[loss=0.2356, ctc_loss=0.1529, cr_loss=0.4132, over 20893.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3695, over 4111785.56 frames. 
], batch size: 57, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:02:17,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=718371.8333333334, ans=0.1 2024-09-18 06:02:24,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=718371.8333333334, ans=0.0 2024-09-18 06:02:34,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=718400.1666666666, ans=22.5 2024-09-18 06:02:47,031 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.232e+02 2.350e+02 2.494e+02 3.008e+02, threshold=4.700e+02, percent-clipped=0.0 2024-09-18 06:02:47,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=718428.5, ans=0.0 2024-09-18 06:02:53,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=718428.5, ans=0.0 2024-09-18 06:03:09,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=718456.8333333334, ans=0.125 2024-09-18 06:03:11,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=718456.8333333334, ans=0.1 2024-09-18 06:03:31,966 INFO [train.py:1198] (0/2) Epoch 40, batch 4350, loss[loss=0.1921, ctc_loss=0.1237, cr_loss=0.3423, over 19829.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1454, cr_loss=0.3712, over 4105204.59 frames. 
], batch size: 44, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:03:41,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=718513.5, ans=0.0 2024-09-18 06:04:47,489 INFO [train.py:1198] (0/2) Epoch 40, batch 4400, loss[loss=0.1937, ctc_loss=0.1255, cr_loss=0.341, over 20883.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1444, cr_loss=0.3699, over 4112415.46 frames. ], batch size: 54, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:05:20,454 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.237e+02 2.348e+02 2.483e+02 3.134e+02, threshold=4.695e+02, percent-clipped=0.0 2024-09-18 06:05:54,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.70 vs. limit=6.0 2024-09-18 06:06:08,666 INFO [train.py:1198] (0/2) Epoch 40, batch 4450, loss[loss=0.2148, ctc_loss=0.1449, cr_loss=0.3495, over 21010.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1448, cr_loss=0.3706, over 4111791.84 frames. 
], batch size: 63, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:06:13,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=718796.8333333334, ans=0.0 2024-09-18 06:06:48,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=718853.5, ans=0.125 2024-09-18 06:06:52,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=718881.8333333334, ans=0.125 2024-09-18 06:06:58,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=718881.8333333334, ans=0.125 2024-09-18 06:07:20,360 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=22.5 2024-09-18 06:07:24,040 INFO [train.py:1198] (0/2) Epoch 40, batch 4500, loss[loss=0.2307, ctc_loss=0.1534, cr_loss=0.3865, over 21021.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1447, cr_loss=0.3708, over 4113870.33 frames. ], batch size: 63, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:07:39,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-09-18 06:07:53,996 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.213e+02 2.343e+02 2.475e+02 4.058e+02, threshold=4.686e+02, percent-clipped=0.0 2024-09-18 06:08:39,447 INFO [train.py:1198] (0/2) Epoch 40, batch 4550, loss[loss=0.1991, ctc_loss=0.1323, cr_loss=0.3344, over 20807.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1448, cr_loss=0.3708, over 4102770.80 frames. 
], batch size: 53, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:08:47,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=719080.1666666666, ans=0.125 2024-09-18 06:08:52,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=719080.1666666666, ans=0.0 2024-09-18 06:08:56,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=719108.5, ans=0.125 2024-09-18 06:09:13,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=719136.8333333334, ans=0.2 2024-09-18 06:09:25,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2024-09-18 06:09:40,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=719193.5, ans=0.025 2024-09-18 06:09:52,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=719193.5, ans=0.125 2024-09-18 06:09:55,080 INFO [train.py:1198] (0/2) Epoch 40, batch 4600, loss[loss=0.2089, ctc_loss=0.14, cr_loss=0.3444, over 20885.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1446, cr_loss=0.3701, over 4103542.36 frames. ], batch size: 54, lr: 2.06e-03, grad_scale: 32.0 2024-09-18 06:10:01,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=719221.8333333334, ans=0.2 2024-09-18 06:10:15,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.23 vs. 
limit=22.5 2024-09-18 06:10:25,687 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.191e+02 2.323e+02 2.509e+02 4.931e+02, threshold=4.647e+02, percent-clipped=1.0 2024-09-18 06:10:42,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=719306.8333333334, ans=0.2 2024-09-18 06:10:46,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-18 06:11:15,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=719363.5, ans=0.125 2024-09-18 06:11:17,094 INFO [train.py:1198] (0/2) Epoch 40, batch 4650, loss[loss=0.2253, ctc_loss=0.1498, cr_loss=0.3779, over 20673.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.369, over 4113958.82 frames. ], batch size: 66, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:11:20,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=719363.5, ans=0.07 2024-09-18 06:11:24,120 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.72 vs. 
limit=10.0 2024-09-18 06:11:29,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=719363.5, ans=0.125 2024-09-18 06:11:38,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=719391.8333333334, ans=0.1 2024-09-18 06:11:44,863 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:11:50,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=719420.1666666666, ans=0.125 2024-09-18 06:12:00,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=719420.1666666666, ans=0.125 2024-09-18 06:12:32,943 INFO [train.py:1198] (0/2) Epoch 40, batch 4700, loss[loss=0.2344, ctc_loss=0.157, cr_loss=0.3872, over 20932.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1449, cr_loss=0.3705, over 4098905.03 frames. ], batch size: 60, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:12:42,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=719505.1666666666, ans=10.0 2024-09-18 06:13:03,118 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.847e+02 2.254e+02 2.379e+02 2.509e+02 3.494e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-18 06:13:10,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=719561.8333333334, ans=0.125 2024-09-18 06:13:33,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=719618.5, ans=0.125 2024-09-18 06:13:48,317 INFO [train.py:1198] (0/2) Epoch 40, batch 4750, loss[loss=0.238, ctc_loss=0.1587, cr_loss=0.3965, over 20067.00 frames. 
], tot_loss[loss=0.2201, ctc_loss=0.1458, cr_loss=0.3716, over 4084170.24 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:14:11,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=719675.1666666666, ans=0.2 2024-09-18 06:14:41,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=719731.8333333334, ans=0.125 2024-09-18 06:15:02,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=719788.5, ans=0.125 2024-09-18 06:15:02,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=719788.5, ans=10.0 2024-09-18 06:15:03,568 INFO [train.py:1198] (0/2) Epoch 40, batch 4800, loss[loss=0.233, ctc_loss=0.1547, cr_loss=0.3915, over 21100.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1455, cr_loss=0.371, over 4086762.19 frames. 
], batch size: 59, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:15:03,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=719788.5, ans=0.125 2024-09-18 06:15:08,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=719788.5, ans=0.2 2024-09-18 06:15:11,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=719788.5, ans=0.025 2024-09-18 06:15:26,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=719816.8333333334, ans=0.125 2024-09-18 06:15:32,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=719845.1666666666, ans=0.0 2024-09-18 06:15:32,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=719845.1666666666, ans=0.025 2024-09-18 06:15:34,960 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.042e+02 2.277e+02 2.412e+02 2.562e+02 5.856e+02, threshold=4.825e+02, percent-clipped=1.0 2024-09-18 06:15:35,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=719845.1666666666, ans=0.0 2024-09-18 06:15:36,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=719845.1666666666, ans=0.2 2024-09-18 06:15:50,562 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.09 vs. 
limit=15.0 2024-09-18 06:16:05,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=719901.8333333334, ans=0.2 2024-09-18 06:16:22,057 INFO [train.py:1198] (0/2) Epoch 40, batch 4850, loss[loss=0.2349, ctc_loss=0.1557, cr_loss=0.3962, over 20641.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1461, cr_loss=0.3722, over 4087860.78 frames. ], batch size: 71, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:16:57,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=719986.8333333334, ans=0.2 2024-09-18 06:17:39,960 INFO [train.py:1198] (0/2) Epoch 40, batch 4900, loss[loss=0.2265, ctc_loss=0.151, cr_loss=0.3775, over 20882.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1451, cr_loss=0.3705, over 4095727.15 frames. ], batch size: 54, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:17:40,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=720071.8333333334, ans=0.125 2024-09-18 06:18:11,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=720128.5, ans=0.0 2024-09-18 06:18:12,904 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.223e+02 2.324e+02 2.522e+02 5.395e+02, threshold=4.649e+02, percent-clipped=1.0 2024-09-18 06:18:23,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=720156.8333333334, ans=0.0 2024-09-18 06:18:32,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720156.8333333334, ans=0.1 2024-09-18 06:18:54,429 INFO [train.py:1198] (0/2) Epoch 40, batch 4950, loss[loss=0.2172, ctc_loss=0.143, cr_loss=0.3709, over 20994.00 frames. 
], tot_loss[loss=0.2196, ctc_loss=0.1454, cr_loss=0.3707, over 4088529.12 frames. ], batch size: 55, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:19:35,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.38 vs. limit=15.0 2024-09-18 06:19:42,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=720298.5, ans=0.0 2024-09-18 06:20:08,685 INFO [train.py:1198] (0/2) Epoch 40, batch 5000, loss[loss=0.1916, ctc_loss=0.1232, cr_loss=0.3423, over 20956.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1446, cr_loss=0.3695, over 4096322.30 frames. ], batch size: 48, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:20:09,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=720355.1666666666, ans=0.0 2024-09-18 06:20:14,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=720355.1666666666, ans=0.125 2024-09-18 06:20:41,593 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.206e+02 2.341e+02 2.465e+02 6.433e+02, threshold=4.683e+02, percent-clipped=1.0 2024-09-18 06:20:54,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=720440.1666666666, ans=0.1 2024-09-18 06:21:23,119 INFO [train.py:1198] (0/2) Epoch 40, batch 5050, loss[loss=0.2479, ctc_loss=0.1646, cr_loss=0.4165, over 20669.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1449, cr_loss=0.3697, over 4098955.45 frames. 
], batch size: 68, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:21:57,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=720553.5, ans=0.1 2024-09-18 06:22:14,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=720581.8333333334, ans=0.125 2024-09-18 06:22:33,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=720610.1666666666, ans=0.125 2024-09-18 06:22:37,924 INFO [train.py:1198] (0/2) Epoch 40, batch 5100, loss[loss=0.2624, ctc_loss=0.1849, cr_loss=0.3876, over 14632.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1451, cr_loss=0.3693, over 4085375.32 frames. ], batch size: 149, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:22:45,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=720638.5, ans=0.125 2024-09-18 06:23:10,437 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.204e+02 2.346e+02 2.481e+02 2.923e+02, threshold=4.693e+02, percent-clipped=0.0 2024-09-18 06:23:24,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=720723.5, ans=0.2 2024-09-18 06:23:45,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=720751.8333333334, ans=0.125 2024-09-18 06:23:50,421 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2024-09-18 06:23:52,531 INFO [train.py:1198] (0/2) Epoch 40, batch 5150, loss[loss=0.1929, ctc_loss=0.1242, cr_loss=0.3436, over 19872.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1452, cr_loss=0.3696, over 4079779.83 frames. 
], batch size: 44, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:24:00,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2024-09-18 06:24:13,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=720808.5, ans=10.0 2024-09-18 06:24:55,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=720893.5, ans=0.125 2024-09-18 06:24:55,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=720893.5, ans=0.025 2024-09-18 06:24:58,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=720893.5, ans=0.2 2024-09-18 06:25:02,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2024-09-18 06:25:07,360 INFO [train.py:1198] (0/2) Epoch 40, batch 5200, loss[loss=0.2082, ctc_loss=0.1365, cr_loss=0.3585, over 21061.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1447, cr_loss=0.3688, over 4093748.56 frames. 
], batch size: 53, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:25:07,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=720921.8333333334, ans=0.0 2024-09-18 06:25:19,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=720921.8333333334, ans=0.125 2024-09-18 06:25:29,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=720950.1666666666, ans=0.125 2024-09-18 06:25:30,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.37 vs. limit=10.0 2024-09-18 06:25:42,769 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.162e+02 2.284e+02 2.423e+02 6.480e+02, threshold=4.569e+02, percent-clipped=1.0 2024-09-18 06:26:26,545 INFO [train.py:1198] (0/2) Epoch 40, batch 5250, loss[loss=0.2168, ctc_loss=0.1449, cr_loss=0.3595, over 21072.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1456, cr_loss=0.3702, over 4093156.38 frames. ], batch size: 59, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:26:47,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=721091.8333333334, ans=0.125 2024-09-18 06:27:27,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=721176.8333333334, ans=0.2 2024-09-18 06:27:32,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=721176.8333333334, ans=0.0 2024-09-18 06:27:41,052 INFO [train.py:1198] (0/2) Epoch 40, batch 5300, loss[loss=0.2516, ctc_loss=0.168, cr_loss=0.4184, over 20828.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1467, cr_loss=0.3724, over 4089388.07 frames. 
], batch size: 65, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:27:51,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=721205.1666666666, ans=0.125 2024-09-18 06:28:13,289 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 2.189e+02 2.348e+02 2.490e+02 3.604e+02, threshold=4.696e+02, percent-clipped=0.0 2024-09-18 06:28:19,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=721261.8333333334, ans=0.125 2024-09-18 06:28:55,274 INFO [train.py:1198] (0/2) Epoch 40, batch 5350, loss[loss=0.2045, ctc_loss=0.1305, cr_loss=0.37, over 21015.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1458, cr_loss=0.3713, over 4095594.22 frames. ], batch size: 61, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:28:55,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=721346.8333333334, ans=0.015 2024-09-18 06:29:03,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0 2024-09-18 06:29:10,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=12.0 2024-09-18 06:30:07,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=721488.5, ans=0.1 2024-09-18 06:30:09,033 INFO [train.py:1198] (0/2) Epoch 40, batch 5400, loss[loss=0.2014, ctc_loss=0.1326, cr_loss=0.3442, over 21006.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1461, cr_loss=0.3722, over 4101979.98 frames. ], batch size: 61, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:30:32,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.85 vs. 
limit=15.0 2024-09-18 06:30:41,796 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.239e+02 2.325e+02 2.499e+02 3.457e+02, threshold=4.651e+02, percent-clipped=0.0 2024-09-18 06:31:07,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=721601.8333333334, ans=0.125 2024-09-18 06:31:10,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=721601.8333333334, ans=0.0 2024-09-18 06:31:23,724 INFO [train.py:1198] (0/2) Epoch 40, batch 5450, loss[loss=0.2042, ctc_loss=0.1345, cr_loss=0.3486, over 20884.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.3702, over 4107827.55 frames. ], batch size: 54, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:31:30,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=721630.1666666666, ans=0.2 2024-09-18 06:31:35,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-09-18 06:31:55,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=721686.8333333334, ans=0.125 2024-09-18 06:32:19,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=721715.1666666666, ans=0.025 2024-09-18 06:32:38,345 INFO [train.py:1198] (0/2) Epoch 40, batch 5500, loss[loss=0.2195, ctc_loss=0.1459, cr_loss=0.3677, over 20785.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1446, cr_loss=0.3688, over 4103997.28 frames. 
], batch size: 56, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:32:58,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=721800.1666666666, ans=0.025 2024-09-18 06:33:09,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=721828.5, ans=0.0 2024-09-18 06:33:10,900 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.195e+02 2.287e+02 2.514e+02 4.006e+02, threshold=4.573e+02, percent-clipped=0.0 2024-09-18 06:33:18,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=721828.5, ans=0.125 2024-09-18 06:33:35,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=721856.8333333334, ans=0.125 2024-09-18 06:33:52,928 INFO [train.py:1198] (0/2) Epoch 40, batch 5550, loss[loss=0.2344, ctc_loss=0.1558, cr_loss=0.3933, over 20009.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1451, cr_loss=0.3699, over 4102906.43 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:34:30,065 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=15.0 2024-09-18 06:34:49,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=721998.5, ans=10.0 2024-09-18 06:35:04,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=722026.8333333334, ans=0.0 2024-09-18 06:35:04,321 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.08 vs. 
limit=15.0 2024-09-18 06:35:09,676 INFO [train.py:1198] (0/2) Epoch 40, batch 5600, loss[loss=0.2841, ctc_loss=0.1971, cr_loss=0.4351, over 14620.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1453, cr_loss=0.3698, over 4090706.79 frames. ], batch size: 149, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:35:09,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=722055.1666666666, ans=0.125 2024-09-18 06:35:19,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=722055.1666666666, ans=0.125 2024-09-18 06:35:44,651 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.273e+02 2.378e+02 2.596e+02 8.158e+02, threshold=4.756e+02, percent-clipped=1.0 2024-09-18 06:36:06,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722140.1666666666, ans=0.1 2024-09-18 06:36:14,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=722168.5, ans=0.125 2024-09-18 06:36:26,336 INFO [train.py:1198] (0/2) Epoch 40, batch 5650, loss[loss=0.2234, ctc_loss=0.1486, cr_loss=0.3739, over 20831.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1453, cr_loss=0.3708, over 4101275.15 frames. 
], batch size: 59, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:36:35,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=722196.8333333334, ans=0.5 2024-09-18 06:36:41,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=722225.1666666666, ans=0.025 2024-09-18 06:37:06,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=722253.5, ans=0.125 2024-09-18 06:37:40,342 INFO [train.py:1198] (0/2) Epoch 40, batch 5700, loss[loss=0.2092, ctc_loss=0.1366, cr_loss=0.3632, over 20763.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1446, cr_loss=0.37, over 4103302.23 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:37:47,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=722338.5, ans=0.125 2024-09-18 06:38:12,618 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.198e+02 2.297e+02 2.427e+02 4.374e+02, threshold=4.594e+02, percent-clipped=0.0 2024-09-18 06:38:23,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=722423.5, ans=0.025 2024-09-18 06:38:23,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=22.5 2024-09-18 06:38:32,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=722423.5, ans=0.125 2024-09-18 06:38:54,034 INFO [train.py:1198] (0/2) Epoch 40, batch 5750, loss[loss=0.2321, ctc_loss=0.1551, cr_loss=0.3851, over 20052.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1444, cr_loss=0.3697, over 4100081.28 frames. 
], batch size: 80, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:39:37,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=722565.1666666666, ans=0.0 2024-09-18 06:39:55,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=722593.5, ans=0.125 2024-09-18 06:39:56,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=722593.5, ans=0.0 2024-09-18 06:40:08,298 INFO [train.py:1198] (0/2) Epoch 40, batch 5800, loss[loss=0.2263, ctc_loss=0.152, cr_loss=0.3712, over 20051.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1449, cr_loss=0.37, over 4090501.12 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:40:17,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=722621.8333333334, ans=0.0 2024-09-18 06:40:17,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=722621.8333333334, ans=0.0 2024-09-18 06:40:42,163 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.228e+02 2.322e+02 2.457e+02 3.664e+02, threshold=4.644e+02, percent-clipped=0.0 2024-09-18 06:41:06,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=722735.1666666666, ans=0.125 2024-09-18 06:41:07,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=722735.1666666666, ans=0.125 2024-09-18 06:41:22,009 INFO [train.py:1198] (0/2) Epoch 40, batch 5850, loss[loss=0.2074, ctc_loss=0.1349, cr_loss=0.3625, over 20993.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1457, cr_loss=0.3714, over 4076142.94 frames. 
], batch size: 48, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:41:22,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=722763.5, ans=0.025 2024-09-18 06:41:29,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=722763.5, ans=0.0 2024-09-18 06:41:38,897 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0 2024-09-18 06:41:40,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=722791.8333333334, ans=0.125 2024-09-18 06:41:50,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-09-18 06:41:58,531 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=22.5 2024-09-18 06:42:38,137 INFO [train.py:1198] (0/2) Epoch 40, batch 5900, loss[loss=0.2394, ctc_loss=0.1604, cr_loss=0.3952, over 18129.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1455, cr_loss=0.3711, over 4078329.64 frames. ], batch size: 108, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:43:12,566 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.213e+02 2.406e+02 2.573e+02 4.343e+02, threshold=4.812e+02, percent-clipped=0.0 2024-09-18 06:43:19,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=722961.8333333334, ans=0.125 2024-09-18 06:43:30,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.55 vs. 
limit=15.0 2024-09-18 06:43:55,131 INFO [train.py:1198] (0/2) Epoch 40, batch 5950, loss[loss=0.2322, ctc_loss=0.1554, cr_loss=0.3841, over 20711.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1459, cr_loss=0.3717, over 4060952.45 frames. ], batch size: 71, lr: 2.05e-03, grad_scale: 16.0 2024-09-18 06:44:14,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=723075.1666666666, ans=0.125 2024-09-18 06:44:35,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=723103.5, ans=0.125 2024-09-18 06:44:45,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=723131.8333333334, ans=0.0 2024-09-18 06:45:02,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723160.1666666666, ans=0.1 2024-09-18 06:45:04,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=723160.1666666666, ans=0.125 2024-09-18 06:45:09,801 INFO [train.py:1198] (0/2) Epoch 40, batch 6000, loss[loss=0.2249, ctc_loss=0.1493, cr_loss=0.3779, over 20887.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1456, cr_loss=0.3711, over 4062741.77 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:45:09,801 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 06:45:24,000 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0689, 4.5420, 4.5689, 4.8999], device='cuda:0') 2024-09-18 06:45:31,395 INFO [train.py:1230] (0/2) Epoch 40, validation: loss=0.03931, ctc_loss=0.03931, cr_loss=1.423e-14, over 944034.00 frames. 
2024-09-18 06:45:31,396 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-18 06:45:33,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=723188.5, ans=0.125 2024-09-18 06:46:05,325 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.242e+02 2.381e+02 2.556e+02 3.426e+02, threshold=4.762e+02, percent-clipped=0.0 2024-09-18 06:46:06,192 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.76 vs. limit=15.0 2024-09-18 06:46:12,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723245.1666666666, ans=0.1 2024-09-18 06:46:46,686 INFO [train.py:1198] (0/2) Epoch 40, batch 6050, loss[loss=0.2198, ctc_loss=0.145, cr_loss=0.3738, over 20032.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1456, cr_loss=0.3717, over 4077269.66 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:47:45,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=723443.5, ans=0.0 2024-09-18 06:48:01,227 INFO [train.py:1198] (0/2) Epoch 40, batch 6100, loss[loss=0.2332, ctc_loss=0.1519, cr_loss=0.4061, over 20956.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1458, cr_loss=0.3723, over 4082111.37 frames. 
], batch size: 58, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:48:35,026 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.232e+02 2.364e+02 2.512e+02 4.651e+02, threshold=4.728e+02, percent-clipped=0.0 2024-09-18 06:48:45,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723556.8333333334, ans=0.1 2024-09-18 06:48:51,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=723556.8333333334, ans=0.025 2024-09-18 06:49:14,582 INFO [train.py:1198] (0/2) Epoch 40, batch 6150, loss[loss=0.1927, ctc_loss=0.125, cr_loss=0.3381, over 20989.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1457, cr_loss=0.3714, over 4072776.10 frames. ], batch size: 52, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:49:31,645 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0 2024-09-18 06:49:41,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=723641.8333333334, ans=0.125 2024-09-18 06:49:45,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=723670.1666666666, ans=0.125 2024-09-18 06:50:00,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723698.5, ans=0.1 2024-09-18 06:50:03,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.08 vs. 
limit=15.0 2024-09-18 06:50:04,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=723698.5, ans=0.125 2024-09-18 06:50:28,979 INFO [train.py:1198] (0/2) Epoch 40, batch 6200, loss[loss=0.2195, ctc_loss=0.1444, cr_loss=0.3757, over 20875.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1462, cr_loss=0.3719, over 4049897.37 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:50:54,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723783.5, ans=0.1 2024-09-18 06:51:01,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723811.8333333334, ans=0.1 2024-09-18 06:51:03,580 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.199e+02 2.310e+02 2.501e+02 4.148e+02, threshold=4.620e+02, percent-clipped=0.0 2024-09-18 06:51:08,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=723811.8333333334, ans=0.0 2024-09-18 06:51:15,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723840.1666666666, ans=0.1 2024-09-18 06:51:29,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=723868.5, ans=0.07 2024-09-18 06:51:39,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=723868.5, ans=0.125 2024-09-18 06:51:43,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=22.5 2024-09-18 06:51:43,926 INFO [train.py:1198] (0/2) Epoch 40, batch 6250, loss[loss=0.2013, ctc_loss=0.1308, cr_loss=0.3525, over 20315.00 frames. 
], tot_loss[loss=0.2207, ctc_loss=0.1464, cr_loss=0.3712, over 4012663.78 frames. ], batch size: 45, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:51:55,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=723896.8333333334, ans=0.125 2024-09-18 06:52:00,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=723925.1666666666, ans=0.125 2024-09-18 06:52:25,583 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.14 vs. limit=6.0 2024-09-18 06:52:50,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=724010.1666666666, ans=0.125 2024-09-18 06:52:56,853 INFO [train.py:1198] (0/2) Epoch 40, batch 6300, loss[loss=0.181, ctc_loss=0.1195, cr_loss=0.3074, over 20983.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1474, cr_loss=0.3716, over 3958987.93 frames. ], batch size: 52, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:53:06,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. 
limit=10.0 2024-09-18 06:53:23,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=724066.8333333334, ans=0.1 2024-09-18 06:53:29,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=724095.1666666666, ans=0.125 2024-09-18 06:53:30,982 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.285e+02 2.505e+02 2.763e+02 3.528e+02, threshold=5.009e+02, percent-clipped=0.0 2024-09-18 06:53:51,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=724123.5, ans=10.0 2024-09-18 06:54:10,922 INFO [train.py:1198] (0/2) Epoch 40, batch 6350, loss[loss=0.2072, ctc_loss=0.1354, cr_loss=0.3594, over 21002.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1467, cr_loss=0.3683, over 3871488.51 frames. ], batch size: 52, lr: 2.05e-03, grad_scale: 32.0 2024-09-18 06:54:21,942 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0 2024-09-18 06:55:09,654 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-40.pt 2024-09-18 06:55:58,403 INFO [train.py:1198] (0/2) Epoch 41, batch 0, loss[loss=0.2182, ctc_loss=0.1416, cr_loss=0.383, over 20839.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1416, cr_loss=0.383, over 20839.00 frames. ], batch size: 59, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 06:55:58,404 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 06:56:18,011 INFO [train.py:1230] (0/2) Epoch 41, validation: loss=0.0391, ctc_loss=0.0391, cr_loss=1.436e-14, over 944034.00 frames. 
2024-09-18 06:56:18,012 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-18 06:56:30,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=724296.3333333334, ans=0.125 2024-09-18 06:56:39,723 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2024-09-18 06:57:06,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=724381.3333333334, ans=0.2 2024-09-18 06:57:07,587 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.339e+02 2.665e+02 2.902e+02 5.017e+02, threshold=5.331e+02, percent-clipped=1.0 2024-09-18 06:57:24,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724409.6666666666, ans=0.1 2024-09-18 06:57:29,496 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=22.5 2024-09-18 06:57:34,705 INFO [train.py:1198] (0/2) Epoch 41, batch 50, loss[loss=0.2289, ctc_loss=0.1531, cr_loss=0.3792, over 20824.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1434, cr_loss=0.3664, over 929386.54 frames. ], batch size: 59, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 06:58:14,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=724494.6666666666, ans=0.2 2024-09-18 06:58:44,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=724551.3333333334, ans=0.125 2024-09-18 06:58:49,993 INFO [train.py:1198] (0/2) Epoch 41, batch 100, loss[loss=0.2461, ctc_loss=0.1643, cr_loss=0.4088, over 20814.00 frames. 
], tot_loss[loss=0.2166, ctc_loss=0.1433, cr_loss=0.3668, over 1629752.85 frames. ], batch size: 65, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 06:59:15,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=724608.0, ans=0.2 2024-09-18 06:59:20,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=724636.3333333334, ans=0.07 2024-09-18 06:59:39,856 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.224e+02 2.374e+02 2.549e+02 3.947e+02, threshold=4.748e+02, percent-clipped=0.0 2024-09-18 07:00:07,303 INFO [train.py:1198] (0/2) Epoch 41, batch 150, loss[loss=0.2171, ctc_loss=0.1424, cr_loss=0.3734, over 20886.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1438, cr_loss=0.3679, over 2168204.57 frames. ], batch size: 57, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:00:07,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=724721.3333333334, ans=0.125 2024-09-18 07:00:18,778 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.61 vs. 
limit=5.0 2024-09-18 07:00:34,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=724749.6666666666, ans=0.0 2024-09-18 07:00:41,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=724778.0, ans=0.07 2024-09-18 07:00:50,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724806.3333333334, ans=0.1 2024-09-18 07:00:52,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=724806.3333333334, ans=0.0 2024-09-18 07:01:21,953 INFO [train.py:1198] (0/2) Epoch 41, batch 200, loss[loss=0.1965, ctc_loss=0.1268, cr_loss=0.3486, over 21088.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1449, cr_loss=0.3704, over 2595244.12 frames. ], batch size: 59, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:01:23,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=724863.0, ans=0.025 2024-09-18 07:01:50,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=724919.6666666666, ans=0.125 2024-09-18 07:01:55,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=724919.6666666666, ans=0.2 2024-09-18 07:01:58,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=724919.6666666666, ans=0.0 2024-09-18 07:02:07,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=724948.0, ans=0.025 2024-09-18 07:02:10,325 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.197e+02 2.300e+02 2.501e+02 4.580e+02, threshold=4.599e+02, percent-clipped=0.0 2024-09-18 07:02:19,749 
INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=724948.0, ans=0.125 2024-09-18 07:02:22,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.49 vs. limit=15.0 2024-09-18 07:02:39,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=725004.6666666666, ans=0.125 2024-09-18 07:02:40,692 INFO [train.py:1198] (0/2) Epoch 41, batch 250, loss[loss=0.26, ctc_loss=0.1728, cr_loss=0.436, over 20954.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1443, cr_loss=0.3699, over 2926996.34 frames. ], batch size: 64, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:03:56,398 INFO [train.py:1198] (0/2) Epoch 41, batch 300, loss[loss=0.249, ctc_loss=0.17, cr_loss=0.395, over 18287.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1443, cr_loss=0.3698, over 3183190.77 frames. ], batch size: 108, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:03:58,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=725146.3333333334, ans=0.0 2024-09-18 07:04:12,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=725174.6666666666, ans=0.125 2024-09-18 07:04:43,994 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.181e+02 2.315e+02 2.486e+02 2.972e+02, threshold=4.629e+02, percent-clipped=0.0 2024-09-18 07:04:59,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=725259.6666666666, ans=0.025 2024-09-18 07:05:14,420 INFO [train.py:1198] (0/2) Epoch 41, batch 350, loss[loss=0.278, ctc_loss=0.1956, cr_loss=0.4117, over 14505.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3696, over 3380347.29 frames. 
], batch size: 149, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:05:37,395 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-256000.pt 2024-09-18 07:05:46,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=725344.6666666666, ans=0.0 2024-09-18 07:06:01,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=725373.0, ans=0.1 2024-09-18 07:06:06,034 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=12.0 2024-09-18 07:06:25,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=725401.3333333334, ans=0.1 2024-09-18 07:06:30,968 INFO [train.py:1198] (0/2) Epoch 41, batch 400, loss[loss=0.2786, ctc_loss=0.1944, cr_loss=0.4208, over 18157.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1443, cr_loss=0.3695, over 3534781.49 frames. ], batch size: 108, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:06:33,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.50 vs. 
limit=22.5 2024-09-18 07:07:00,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=725486.3333333334, ans=0.2 2024-09-18 07:07:01,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=725486.3333333334, ans=0.125 2024-09-18 07:07:19,393 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.240e+02 2.375e+02 2.534e+02 3.924e+02, threshold=4.750e+02, percent-clipped=0.0 2024-09-18 07:07:46,899 INFO [train.py:1198] (0/2) Epoch 41, batch 450, loss[loss=0.2465, ctc_loss=0.1668, cr_loss=0.3986, over 20973.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1436, cr_loss=0.3681, over 3662565.12 frames. ], batch size: 55, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:07:58,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.42 vs. limit=15.0 2024-09-18 07:08:02,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=725599.6666666666, ans=0.125 2024-09-18 07:08:17,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=725628.0, ans=0.125 2024-09-18 07:09:04,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=725713.0, ans=0.125 2024-09-18 07:09:05,473 INFO [train.py:1198] (0/2) Epoch 41, batch 500, loss[loss=0.1911, ctc_loss=0.1248, cr_loss=0.3314, over 20981.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3697, over 3749530.76 frames. 
], batch size: 51, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:09:05,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=725713.0, ans=0.0 2024-09-18 07:09:08,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=725713.0, ans=0.1 2024-09-18 07:09:42,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=725769.6666666666, ans=0.125 2024-09-18 07:09:48,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=725769.6666666666, ans=0.2 2024-09-18 07:09:55,665 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.198e+02 2.324e+02 2.477e+02 3.326e+02, threshold=4.649e+02, percent-clipped=0.0 2024-09-18 07:10:03,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=725798.0, ans=0.2 2024-09-18 07:10:11,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=725826.3333333334, ans=0.2 2024-09-18 07:10:17,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=725826.3333333334, ans=0.2 2024-09-18 07:10:21,607 INFO [train.py:1198] (0/2) Epoch 41, batch 550, loss[loss=0.22, ctc_loss=0.1441, cr_loss=0.3797, over 20805.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1436, cr_loss=0.3688, over 3830279.81 frames. ], batch size: 56, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:10:28,421 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.88 vs. 
limit=22.5 2024-09-18 07:10:40,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=725883.0, ans=0.0 2024-09-18 07:11:21,132 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:11:24,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=725968.0, ans=0.04949747468305833 2024-09-18 07:11:40,828 INFO [train.py:1198] (0/2) Epoch 41, batch 600, loss[loss=0.2086, ctc_loss=0.1381, cr_loss=0.3528, over 20771.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1434, cr_loss=0.3679, over 3878809.76 frames. ], batch size: 53, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:12:27,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=726081.3333333334, ans=0.2 2024-09-18 07:12:30,520 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.188e+02 2.318e+02 2.434e+02 8.782e+02, threshold=4.636e+02, percent-clipped=1.0 2024-09-18 07:12:54,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=726138.0, ans=0.0 2024-09-18 07:12:56,071 INFO [train.py:1198] (0/2) Epoch 41, batch 650, loss[loss=0.205, ctc_loss=0.1342, cr_loss=0.354, over 20789.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1429, cr_loss=0.3668, over 3933340.16 frames. ], batch size: 56, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:14:14,088 INFO [train.py:1198] (0/2) Epoch 41, batch 700, loss[loss=0.1856, ctc_loss=0.1215, cr_loss=0.3204, over 21068.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.144, cr_loss=0.3687, over 3958274.27 frames. 
], batch size: 53, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:14:39,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=726308.0, ans=0.025 2024-09-18 07:14:50,348 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:15:03,705 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.249e+02 2.365e+02 2.549e+02 3.781e+02, threshold=4.731e+02, percent-clipped=0.0 2024-09-18 07:15:29,418 INFO [train.py:1198] (0/2) Epoch 41, batch 750, loss[loss=0.2191, ctc_loss=0.144, cr_loss=0.3757, over 20872.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.144, cr_loss=0.3692, over 3988426.20 frames. ], batch size: 65, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:15:49,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=726449.6666666666, ans=0.0 2024-09-18 07:15:56,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=22.5 2024-09-18 07:16:48,272 INFO [train.py:1198] (0/2) Epoch 41, batch 800, loss[loss=0.2028, ctc_loss=0.1334, cr_loss=0.3474, over 20795.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1446, cr_loss=0.3701, over 4010518.21 frames. ], batch size: 53, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:16:50,497 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=12.0 2024-09-18 07:17:37,612 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.874e+02 2.274e+02 2.412e+02 2.552e+02 4.256e+02, threshold=4.824e+02, percent-clipped=0.0 2024-09-18 07:18:03,267 INFO [train.py:1198] (0/2) Epoch 41, batch 850, loss[loss=0.2178, ctc_loss=0.1442, cr_loss=0.3682, over 21071.00 frames. 
], tot_loss[loss=0.2191, ctc_loss=0.145, cr_loss=0.3706, over 4038996.91 frames. ], batch size: 59, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:19:18,721 INFO [train.py:1198] (0/2) Epoch 41, batch 900, loss[loss=0.1812, ctc_loss=0.1151, cr_loss=0.3303, over 19961.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3698, over 4048611.70 frames. ], batch size: 44, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:19:22,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=726846.3333333334, ans=0.2 2024-09-18 07:19:26,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=726846.3333333334, ans=0.0 2024-09-18 07:19:37,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=726874.6666666666, ans=0.125 2024-09-18 07:20:11,724 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.202e+02 2.312e+02 2.459e+02 4.292e+02, threshold=4.624e+02, percent-clipped=0.0 2024-09-18 07:20:16,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=726931.3333333334, ans=0.1 2024-09-18 07:20:24,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.12 vs. limit=10.0 2024-09-18 07:20:36,893 INFO [train.py:1198] (0/2) Epoch 41, batch 950, loss[loss=0.2135, ctc_loss=0.14, cr_loss=0.3674, over 19451.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1449, cr_loss=0.371, over 4065848.05 frames. ], batch size: 90, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:20:42,443 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.77 vs. 
limit=8.0 2024-09-18 07:21:08,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=12.0 2024-09-18 07:21:52,350 INFO [train.py:1198] (0/2) Epoch 41, batch 1000, loss[loss=0.1896, ctc_loss=0.1214, cr_loss=0.3414, over 21063.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3708, over 4047182.21 frames. ], batch size: 53, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:22:45,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=12.0 2024-09-18 07:22:45,370 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-09-18 07:22:45,612 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.216e+02 2.380e+02 2.511e+02 4.521e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-18 07:22:52,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=727214.6666666666, ans=0.5 2024-09-18 07:23:11,715 INFO [train.py:1198] (0/2) Epoch 41, batch 1050, loss[loss=0.2902, ctc_loss=0.2037, cr_loss=0.4326, over 14195.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1448, cr_loss=0.3691, over 4043544.08 frames. 
], batch size: 149, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:23:15,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=727271.3333333334, ans=0.125 2024-09-18 07:23:33,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=727299.6666666666, ans=0.025 2024-09-18 07:23:38,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=727299.6666666666, ans=0.125 2024-09-18 07:23:40,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=727328.0, ans=0.0 2024-09-18 07:23:43,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=727328.0, ans=0.1 2024-09-18 07:23:43,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=727328.0, ans=0.2 2024-09-18 07:24:10,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=727384.6666666666, ans=0.125 2024-09-18 07:24:20,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=727384.6666666666, ans=0.125 2024-09-18 07:24:26,600 INFO [train.py:1198] (0/2) Epoch 41, batch 1100, loss[loss=0.2076, ctc_loss=0.1383, cr_loss=0.3469, over 20873.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1452, cr_loss=0.3698, over 4055845.63 frames. 
], batch size: 57, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:24:54,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=727441.3333333334, ans=0.0 2024-09-18 07:25:00,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=727469.6666666666, ans=0.0 2024-09-18 07:25:16,518 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.218e+02 2.397e+02 2.583e+02 4.244e+02, threshold=4.794e+02, percent-clipped=0.0 2024-09-18 07:25:41,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=727526.3333333334, ans=0.125 2024-09-18 07:25:45,328 INFO [train.py:1198] (0/2) Epoch 41, batch 1150, loss[loss=0.2533, ctc_loss=0.171, cr_loss=0.4117, over 20014.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1458, cr_loss=0.3704, over 4061661.16 frames. ], batch size: 80, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:25:47,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=727554.6666666666, ans=0.0 2024-09-18 07:25:57,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=727554.6666666666, ans=0.125 2024-09-18 07:26:11,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=727583.0, ans=0.2 2024-09-18 07:26:26,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=727611.3333333334, ans=0.0 2024-09-18 07:26:34,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=727639.6666666666, ans=0.0 2024-09-18 07:26:46,782 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=727668.0, ans=0.1 2024-09-18 07:27:01,795 INFO [train.py:1198] (0/2) Epoch 41, batch 1200, loss[loss=0.232, ctc_loss=0.1527, cr_loss=0.3963, over 20840.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1455, cr_loss=0.3698, over 4067977.95 frames. ], batch size: 59, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:27:07,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0 2024-09-18 07:27:10,517 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=15.0 2024-09-18 07:27:18,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=727724.6666666666, ans=0.125 2024-09-18 07:27:22,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=727724.6666666666, ans=0.2 2024-09-18 07:27:22,469 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.04 vs. 
limit=22.5 2024-09-18 07:27:56,574 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.218e+02 2.368e+02 2.540e+02 3.737e+02, threshold=4.737e+02, percent-clipped=0.0 2024-09-18 07:28:08,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=727809.6666666666, ans=0.125 2024-09-18 07:28:15,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=727809.6666666666, ans=0.0 2024-09-18 07:28:18,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=727809.6666666666, ans=0.1 2024-09-18 07:28:20,886 INFO [train.py:1198] (0/2) Epoch 41, batch 1250, loss[loss=0.2397, ctc_loss=0.1583, cr_loss=0.4074, over 21030.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1448, cr_loss=0.3693, over 4074935.18 frames. ], batch size: 61, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:28:30,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=727838.0, ans=0.125 2024-09-18 07:29:02,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-09-18 07:29:03,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=727894.6666666666, ans=22.5 2024-09-18 07:29:29,180 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2024-09-18 07:29:35,841 INFO [train.py:1198] (0/2) Epoch 41, batch 1300, loss[loss=0.1709, ctc_loss=0.1099, cr_loss=0.3049, over 20950.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1442, cr_loss=0.3683, over 4073324.48 frames. 
], batch size: 49, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:29:59,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.82 vs. limit=15.0 2024-09-18 07:30:16,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=728036.3333333334, ans=0.0 2024-09-18 07:30:27,198 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.243e+02 2.391e+02 2.591e+02 6.088e+02, threshold=4.783e+02, percent-clipped=1.0 2024-09-18 07:30:35,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=728093.0, ans=0.125 2024-09-18 07:30:47,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=728093.0, ans=0.0 2024-09-18 07:30:51,212 INFO [train.py:1198] (0/2) Epoch 41, batch 1350, loss[loss=0.184, ctc_loss=0.1193, cr_loss=0.3238, over 19827.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1445, cr_loss=0.3693, over 4078331.71 frames. 
], batch size: 44, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:31:03,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=728121.3333333334, ans=0.0 2024-09-18 07:31:34,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=728178.0, ans=0.125 2024-09-18 07:31:38,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=728206.3333333334, ans=0.125 2024-09-18 07:31:54,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=728234.6666666666, ans=0.125 2024-09-18 07:32:10,542 INFO [train.py:1198] (0/2) Epoch 41, batch 1400, loss[loss=0.193, ctc_loss=0.1284, cr_loss=0.3232, over 20947.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1431, cr_loss=0.3674, over 4086786.55 frames. ], batch size: 49, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:32:14,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=728263.0, ans=0.1 2024-09-18 07:32:52,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728319.6666666666, ans=0.1 2024-09-18 07:32:52,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=728319.6666666666, ans=0.125 2024-09-18 07:33:04,005 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.297e+02 2.441e+02 2.623e+02 8.118e+02, threshold=4.882e+02, percent-clipped=1.0 2024-09-18 07:33:29,630 INFO [train.py:1198] (0/2) Epoch 41, batch 1450, loss[loss=0.2215, ctc_loss=0.1472, cr_loss=0.3714, over 19278.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1431, cr_loss=0.3671, over 4087874.80 frames. 
], batch size: 90, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:33:55,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=728433.0, ans=0.125 2024-09-18 07:34:06,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=22.5 2024-09-18 07:34:09,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=728461.3333333334, ans=0.0 2024-09-18 07:34:36,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=728518.0, ans=0.125 2024-09-18 07:34:45,293 INFO [train.py:1198] (0/2) Epoch 41, batch 1500, loss[loss=0.2181, ctc_loss=0.1452, cr_loss=0.3645, over 20905.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.3691, over 4087335.67 frames. ], batch size: 60, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:34:53,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=728546.3333333334, ans=0.125 2024-09-18 07:34:57,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=728546.3333333334, ans=0.125 2024-09-18 07:35:38,530 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.219e+02 2.350e+02 2.490e+02 3.492e+02, threshold=4.700e+02, percent-clipped=0.0 2024-09-18 07:36:00,905 INFO [train.py:1198] (0/2) Epoch 41, batch 1550, loss[loss=0.2442, ctc_loss=0.1622, cr_loss=0.4101, over 20942.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1441, cr_loss=0.3681, over 4092198.85 frames. ], batch size: 60, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:36:14,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.36 vs. 
limit=22.5 2024-09-18 07:36:14,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=728716.3333333334, ans=0.125 2024-09-18 07:36:26,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=728716.3333333334, ans=0.125 2024-09-18 07:36:29,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.43 vs. limit=8.0 2024-09-18 07:37:19,819 INFO [train.py:1198] (0/2) Epoch 41, batch 1600, loss[loss=0.2038, ctc_loss=0.1354, cr_loss=0.3423, over 21084.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1444, cr_loss=0.3682, over 4102585.42 frames. ], batch size: 53, lr: 2.02e-03, grad_scale: 32.0 2024-09-18 07:37:21,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=728829.6666666666, ans=0.0 2024-09-18 07:37:36,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=728858.0, ans=0.0 2024-09-18 07:37:56,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=728886.3333333334, ans=0.0 2024-09-18 07:37:58,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-09-18 07:38:12,910 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.041e+02 2.192e+02 2.282e+02 2.402e+02 2.882e+02, threshold=4.563e+02, percent-clipped=0.0 2024-09-18 07:38:35,455 INFO [train.py:1198] (0/2) Epoch 41, batch 1650, loss[loss=0.2233, ctc_loss=0.1465, cr_loss=0.3837, over 21031.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.369, over 4100266.61 frames. 
], batch size: 62, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:39:26,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=729056.3333333334, ans=0.2 2024-09-18 07:39:36,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=729056.3333333334, ans=0.0 2024-09-18 07:39:41,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=729084.6666666666, ans=0.125 2024-09-18 07:39:45,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=729084.6666666666, ans=10.0 2024-09-18 07:39:54,781 INFO [train.py:1198] (0/2) Epoch 41, batch 1700, loss[loss=0.2322, ctc_loss=0.1551, cr_loss=0.3855, over 21043.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3695, over 4108159.12 frames. ], batch size: 62, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:39:59,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=729113.0, ans=0.125 2024-09-18 07:40:01,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=729113.0, ans=0.1 2024-09-18 07:40:39,891 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.65 vs. 
limit=15.0 2024-09-18 07:40:40,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=729198.0, ans=0.125 2024-09-18 07:40:49,384 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.205e+02 2.353e+02 2.501e+02 3.279e+02, threshold=4.706e+02, percent-clipped=0.0 2024-09-18 07:40:58,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=729226.3333333334, ans=0.0 2024-09-18 07:41:10,553 INFO [train.py:1198] (0/2) Epoch 41, batch 1750, loss[loss=0.2333, ctc_loss=0.1546, cr_loss=0.3934, over 20874.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1451, cr_loss=0.37, over 4110244.43 frames. ], batch size: 65, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:41:16,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=729254.6666666666, ans=0.1 2024-09-18 07:41:17,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.17 vs. 
limit=22.5 2024-09-18 07:41:38,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=729283.0, ans=0.04949747468305833 2024-09-18 07:41:47,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=729311.3333333334, ans=0.0 2024-09-18 07:41:51,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=729311.3333333334, ans=0.0 2024-09-18 07:41:56,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=729339.6666666666, ans=0.2 2024-09-18 07:42:08,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=729339.6666666666, ans=0.125 2024-09-18 07:42:14,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=729368.0, ans=0.125 2024-09-18 07:42:26,484 INFO [train.py:1198] (0/2) Epoch 41, batch 1800, loss[loss=0.2303, ctc_loss=0.1537, cr_loss=0.3829, over 20691.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1447, cr_loss=0.3689, over 4119312.53 frames. ], batch size: 71, lr: 2.02e-03, grad_scale: 16.0 2024-09-18 07:43:02,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=729453.0, ans=0.125 2024-09-18 07:43:22,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=22.5 2024-09-18 07:43:22,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.30 vs. 
limit=12.0 2024-09-18 07:43:25,014 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.201e+02 2.314e+02 2.529e+02 3.521e+02, threshold=4.629e+02, percent-clipped=0.0 2024-09-18 07:43:41,289 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2024-09-18 07:43:44,984 INFO [train.py:1198] (0/2) Epoch 41, batch 1850, loss[loss=0.2017, ctc_loss=0.131, cr_loss=0.3534, over 20937.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1439, cr_loss=0.3681, over 4117741.22 frames. ], batch size: 49, lr: 2.02e-03, grad_scale: 8.0 2024-09-18 07:43:51,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=729538.0, ans=0.125 2024-09-18 07:44:03,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=729566.3333333334, ans=0.2 2024-09-18 07:44:09,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0 2024-09-18 07:44:15,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=729594.6666666666, ans=0.025 2024-09-18 07:44:37,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=729623.0, ans=10.0 2024-09-18 07:45:00,099 INFO [train.py:1198] (0/2) Epoch 41, batch 1900, loss[loss=0.2261, ctc_loss=0.1507, cr_loss=0.3768, over 21003.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1439, cr_loss=0.3685, over 4118822.94 frames. 
], batch size: 61, lr: 2.01e-03, grad_scale: 8.0 2024-09-18 07:45:00,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=729679.6666666666, ans=0.125 2024-09-18 07:45:05,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=729679.6666666666, ans=0.125 2024-09-18 07:45:21,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=12.0 2024-09-18 07:45:30,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=729708.0, ans=0.1 2024-09-18 07:45:41,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=729736.3333333334, ans=0.04949747468305833 2024-09-18 07:45:53,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=729764.6666666666, ans=0.0 2024-09-18 07:45:54,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=729764.6666666666, ans=10.0 2024-09-18 07:45:58,745 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.184e+02 2.286e+02 2.420e+02 2.896e+02, threshold=4.572e+02, percent-clipped=0.0 2024-09-18 07:46:18,572 INFO [train.py:1198] (0/2) Epoch 41, batch 1950, loss[loss=0.2039, ctc_loss=0.1337, cr_loss=0.351, over 20787.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.144, cr_loss=0.3692, over 4114600.34 frames. ], batch size: 53, lr: 2.01e-03, grad_scale: 8.0 2024-09-18 07:46:19,706 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.20 vs. 
limit=12.0 2024-09-18 07:46:37,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=729849.6666666666, ans=0.0 2024-09-18 07:46:53,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=729878.0, ans=0.0 2024-09-18 07:47:21,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=729934.6666666666, ans=0.0 2024-09-18 07:47:32,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=729934.6666666666, ans=0.0 2024-09-18 07:47:34,972 INFO [train.py:1198] (0/2) Epoch 41, batch 2000, loss[loss=0.2589, ctc_loss=0.1788, cr_loss=0.4007, over 14032.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1445, cr_loss=0.3702, over 4109976.44 frames. ], batch size: 149, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:48:31,263 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.222e+02 2.343e+02 2.508e+02 4.807e+02, threshold=4.686e+02, percent-clipped=1.0 2024-09-18 07:48:53,979 INFO [train.py:1198] (0/2) Epoch 41, batch 2050, loss[loss=0.2575, ctc_loss=0.172, cr_loss=0.4276, over 20644.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1445, cr_loss=0.3702, over 4110719.39 frames. ], batch size: 66, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:49:24,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=730161.3333333334, ans=0.0 2024-09-18 07:50:09,036 INFO [train.py:1198] (0/2) Epoch 41, batch 2100, loss[loss=0.2027, ctc_loss=0.133, cr_loss=0.3487, over 21028.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1442, cr_loss=0.3692, over 4104359.67 frames. 
], batch size: 63, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:51:07,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-18 07:51:07,770 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.196e+02 2.356e+02 2.510e+02 3.523e+02, threshold=4.713e+02, percent-clipped=0.0 2024-09-18 07:51:27,860 INFO [train.py:1198] (0/2) Epoch 41, batch 2150, loss[loss=0.2067, ctc_loss=0.1336, cr_loss=0.3653, over 20870.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1449, cr_loss=0.3705, over 4100693.73 frames. ], batch size: 54, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:51:46,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=730416.3333333334, ans=0.0 2024-09-18 07:51:47,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=730416.3333333334, ans=0.125 2024-09-18 07:52:30,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2024-09-18 07:52:43,743 INFO [train.py:1198] (0/2) Epoch 41, batch 2200, loss[loss=0.1809, ctc_loss=0.1171, cr_loss=0.3192, over 20948.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1459, cr_loss=0.3724, over 4089913.96 frames. 
], batch size: 50, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:53:06,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=730558.0, ans=0.125 2024-09-18 07:53:24,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=730586.3333333334, ans=0.125 2024-09-18 07:53:36,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=730614.6666666666, ans=0.0 2024-09-18 07:53:39,181 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.235e+02 2.379e+02 2.572e+02 3.091e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-18 07:53:41,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=730614.6666666666, ans=0.125 2024-09-18 07:53:45,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=730643.0, ans=0.2 2024-09-18 07:53:46,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-18 07:53:49,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=12.0 2024-09-18 07:53:59,091 INFO [train.py:1198] (0/2) Epoch 41, batch 2250, loss[loss=0.2079, ctc_loss=0.1358, cr_loss=0.3606, over 21024.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1458, cr_loss=0.3721, over 4083020.10 frames. 
], batch size: 61, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:54:04,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=730671.3333333334, ans=0.125 2024-09-18 07:54:34,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=730728.0, ans=0.125 2024-09-18 07:55:07,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=730784.6666666666, ans=0.0 2024-09-18 07:55:17,677 INFO [train.py:1198] (0/2) Epoch 41, batch 2300, loss[loss=0.2221, ctc_loss=0.1476, cr_loss=0.3725, over 21001.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1461, cr_loss=0.3726, over 4080260.10 frames. ], batch size: 63, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:55:20,951 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:55:37,061 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.49 vs. 
limit=22.5 2024-09-18 07:55:45,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=730841.3333333334, ans=0.0 2024-09-18 07:55:47,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=730869.6666666666, ans=0.1 2024-09-18 07:56:00,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=730869.6666666666, ans=0.125 2024-09-18 07:56:02,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=730898.0, ans=0.125 2024-09-18 07:56:13,444 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.244e+02 2.374e+02 2.504e+02 4.440e+02, threshold=4.748e+02, percent-clipped=0.0 2024-09-18 07:56:16,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=730926.3333333334, ans=0.0 2024-09-18 07:56:35,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=12.0 2024-09-18 07:56:36,258 INFO [train.py:1198] (0/2) Epoch 41, batch 2350, loss[loss=0.2256, ctc_loss=0.1448, cr_loss=0.4038, over 20980.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.146, cr_loss=0.3727, over 4077741.22 frames. 
], batch size: 55, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 07:57:15,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=731011.3333333334, ans=0.125 2024-09-18 07:57:26,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=731039.6666666666, ans=0.125 2024-09-18 07:57:35,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=731068.0, ans=0.0 2024-09-18 07:57:38,818 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2024-09-18 07:57:49,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=731068.0, ans=0.125 2024-09-18 07:57:51,927 INFO [train.py:1198] (0/2) Epoch 41, batch 2400, loss[loss=0.266, ctc_loss=0.1872, cr_loss=0.394, over 13978.00 frames. ], tot_loss[loss=0.2212, ctc_loss=0.1466, cr_loss=0.3731, over 4076187.33 frames. 
], batch size: 149, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 07:58:26,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=731153.0, ans=0.125 2024-09-18 07:58:47,903 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.052e+02 2.276e+02 2.402e+02 2.528e+02 4.233e+02, threshold=4.803e+02, percent-clipped=0.0 2024-09-18 07:58:58,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=731209.6666666666, ans=0.0 2024-09-18 07:59:04,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=731209.6666666666, ans=0.0 2024-09-18 07:59:07,550 INFO [train.py:1198] (0/2) Epoch 41, batch 2450, loss[loss=0.2479, ctc_loss=0.1681, cr_loss=0.3987, over 20277.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1469, cr_loss=0.3734, over 4088289.33 frames. ], batch size: 74, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 07:59:09,698 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.97 vs. 
limit=15.0 2024-09-18 07:59:17,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=731238.0, ans=0.0 2024-09-18 07:59:36,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=731294.6666666666, ans=0.125 2024-09-18 07:59:42,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=731294.6666666666, ans=0.0 2024-09-18 08:00:06,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=731323.0, ans=0.0 2024-09-18 08:00:24,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0 2024-09-18 08:00:26,442 INFO [train.py:1198] (0/2) Epoch 41, batch 2500, loss[loss=0.2405, ctc_loss=0.16, cr_loss=0.4026, over 20976.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1467, cr_loss=0.3731, over 4089734.85 frames. ], batch size: 58, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:00:40,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=731408.0, ans=0.125 2024-09-18 08:01:02,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.20 vs. 
limit=15.0 2024-09-18 08:01:22,572 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.207e+02 2.324e+02 2.475e+02 4.356e+02, threshold=4.647e+02, percent-clipped=0.0 2024-09-18 08:01:24,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=731464.6666666666, ans=0.2 2024-09-18 08:01:32,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=15.0 2024-09-18 08:01:42,288 INFO [train.py:1198] (0/2) Epoch 41, batch 2550, loss[loss=0.215, ctc_loss=0.1433, cr_loss=0.3586, over 20883.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1463, cr_loss=0.3722, over 4092975.38 frames. ], batch size: 57, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:01:59,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.58 vs. limit=15.0 2024-09-18 08:02:16,102 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0 2024-09-18 08:02:29,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=731606.3333333334, ans=0.2 2024-09-18 08:02:38,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=731606.3333333334, ans=0.125 2024-09-18 08:03:00,796 INFO [train.py:1198] (0/2) Epoch 41, batch 2600, loss[loss=0.206, ctc_loss=0.1331, cr_loss=0.3648, over 21071.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1459, cr_loss=0.3714, over 4097812.49 frames. 
], batch size: 53, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:03:48,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=731748.0, ans=0.0 2024-09-18 08:03:57,071 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.195e+02 2.353e+02 2.539e+02 4.235e+02, threshold=4.706e+02, percent-clipped=0.0 2024-09-18 08:03:59,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=731748.0, ans=0.2 2024-09-18 08:04:16,638 INFO [train.py:1198] (0/2) Epoch 41, batch 2650, loss[loss=0.2135, ctc_loss=0.1415, cr_loss=0.36, over 21057.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1459, cr_loss=0.372, over 4106219.35 frames. ], batch size: 56, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:04:30,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=731833.0, ans=0.0 2024-09-18 08:05:20,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=731918.0, ans=0.0 2024-09-18 08:05:34,593 INFO [train.py:1198] (0/2) Epoch 41, batch 2700, loss[loss=0.2009, ctc_loss=0.1322, cr_loss=0.3434, over 20861.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1463, cr_loss=0.373, over 4103114.64 frames. 
], batch size: 57, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:05:40,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=731946.3333333334, ans=0.0 2024-09-18 08:06:26,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732031.3333333334, ans=0.1 2024-09-18 08:06:30,440 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.229e+02 2.379e+02 2.585e+02 3.513e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-18 08:06:38,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=732059.6666666666, ans=0.0 2024-09-18 08:06:50,122 INFO [train.py:1198] (0/2) Epoch 41, batch 2750, loss[loss=0.2254, ctc_loss=0.1515, cr_loss=0.3697, over 20794.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1456, cr_loss=0.3721, over 4100062.60 frames. ], batch size: 56, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:06:53,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=732088.0, ans=0.0 2024-09-18 08:07:01,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.09 vs. limit=15.0 2024-09-18 08:07:41,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=732173.0, ans=0.0 2024-09-18 08:07:50,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.07 vs. 
limit=10.0 2024-09-18 08:07:53,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=732201.3333333334, ans=0.0 2024-09-18 08:08:08,406 INFO [train.py:1198] (0/2) Epoch 41, batch 2800, loss[loss=0.2179, ctc_loss=0.1427, cr_loss=0.3763, over 20888.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1458, cr_loss=0.3728, over 4099048.35 frames. ], batch size: 54, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:08:48,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=732286.3333333334, ans=0.125 2024-09-18 08:09:02,733 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.20 vs. limit=15.0 2024-09-18 08:09:04,742 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.212e+02 2.345e+02 2.470e+02 3.194e+02, threshold=4.689e+02, percent-clipped=0.0 2024-09-18 08:09:13,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=732343.0, ans=0.0 2024-09-18 08:09:14,574 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-18 08:09:24,319 INFO [train.py:1198] (0/2) Epoch 41, batch 2850, loss[loss=0.2018, ctc_loss=0.1314, cr_loss=0.352, over 20759.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1454, cr_loss=0.3715, over 4107003.80 frames. 
], batch size: 56, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:09:44,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=732399.6666666666, ans=0.05 2024-09-18 08:10:05,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=15.0 2024-09-18 08:10:06,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=732428.0, ans=0.0 2024-09-18 08:10:16,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=732456.3333333334, ans=0.125 2024-09-18 08:10:25,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=732484.6666666666, ans=0.0 2024-09-18 08:10:40,672 INFO [train.py:1198] (0/2) Epoch 41, batch 2900, loss[loss=0.2325, ctc_loss=0.153, cr_loss=0.3974, over 20969.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1454, cr_loss=0.3711, over 4102775.36 frames. ], batch size: 55, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:11:39,319 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.024e+02 2.249e+02 2.391e+02 2.541e+02 3.899e+02, threshold=4.782e+02, percent-clipped=0.0 2024-09-18 08:11:57,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=732654.6666666666, ans=0.125 2024-09-18 08:11:59,082 INFO [train.py:1198] (0/2) Epoch 41, batch 2950, loss[loss=0.2408, ctc_loss=0.1583, cr_loss=0.4125, over 20674.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1456, cr_loss=0.3717, over 4099256.23 frames. 
], batch size: 71, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:12:14,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=732683.0, ans=0.0 2024-09-18 08:12:23,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=732683.0, ans=0.125 2024-09-18 08:13:00,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.53 vs. limit=15.0 2024-09-18 08:13:14,610 INFO [train.py:1198] (0/2) Epoch 41, batch 3000, loss[loss=0.1934, ctc_loss=0.129, cr_loss=0.322, over 20941.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1453, cr_loss=0.3713, over 4101260.53 frames. ], batch size: 50, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:13:14,611 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 08:13:34,552 INFO [train.py:1230] (0/2) Epoch 41, validation: loss=0.04006, ctc_loss=0.04006, cr_loss=1.437e-14, over 944034.00 frames. 2024-09-18 08:13:34,553 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-18 08:13:46,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=732796.3333333334, ans=0.125 2024-09-18 08:13:52,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=732824.6666666666, ans=0.125 2024-09-18 08:13:54,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=732824.6666666666, ans=0.125 2024-09-18 08:13:54,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. 
limit=15.0 2024-09-18 08:14:06,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=732853.0, ans=0.125 2024-09-18 08:14:29,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=732881.3333333334, ans=0.125 2024-09-18 08:14:30,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.89 vs. limit=15.0 2024-09-18 08:14:30,488 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.190e+02 2.363e+02 2.508e+02 3.386e+02, threshold=4.726e+02, percent-clipped=0.0 2024-09-18 08:14:50,274 INFO [train.py:1198] (0/2) Epoch 41, batch 3050, loss[loss=0.2504, ctc_loss=0.1699, cr_loss=0.4027, over 21010.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1461, cr_loss=0.3727, over 4096085.12 frames. ], batch size: 63, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:15:05,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=732966.3333333334, ans=0.125 2024-09-18 08:16:00,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-18 08:16:05,139 INFO [train.py:1198] (0/2) Epoch 41, batch 3100, loss[loss=0.2268, ctc_loss=0.1469, cr_loss=0.3996, over 20945.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1459, cr_loss=0.3721, over 4079705.59 frames. 
], batch size: 60, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:16:37,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=733136.3333333334, ans=0.2 2024-09-18 08:16:43,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=733136.3333333334, ans=0.1 2024-09-18 08:16:58,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=733164.6666666666, ans=0.015 2024-09-18 08:17:04,388 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.250e+02 2.387e+02 2.580e+02 3.720e+02, threshold=4.773e+02, percent-clipped=0.0 2024-09-18 08:17:16,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=733193.0, ans=0.125 2024-09-18 08:17:16,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=733193.0, ans=0.025 2024-09-18 08:17:24,151 INFO [train.py:1198] (0/2) Epoch 41, batch 3150, loss[loss=0.2112, ctc_loss=0.1395, cr_loss=0.3587, over 20781.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1447, cr_loss=0.3704, over 4084564.21 frames. ], batch size: 56, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:17:31,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=733221.3333333334, ans=0.2 2024-09-18 08:17:41,065 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=91.90 vs. 
limit=15.0 2024-09-18 08:18:30,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=733334.6666666666, ans=0.0 2024-09-18 08:18:40,451 INFO [train.py:1198] (0/2) Epoch 41, batch 3200, loss[loss=0.2244, ctc_loss=0.151, cr_loss=0.3671, over 20699.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1449, cr_loss=0.3711, over 4096366.90 frames. ], batch size: 68, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:18:40,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=733363.0, ans=0.0 2024-09-18 08:18:43,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=733363.0, ans=0.1 2024-09-18 08:18:51,771 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2024-09-18 08:19:32,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=733448.0, ans=0.125 2024-09-18 08:19:39,367 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 2.245e+02 2.375e+02 2.572e+02 3.384e+02, threshold=4.749e+02, percent-clipped=0.0 2024-09-18 08:19:58,750 INFO [train.py:1198] (0/2) Epoch 41, batch 3250, loss[loss=0.2261, ctc_loss=0.1507, cr_loss=0.3773, over 20858.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1455, cr_loss=0.3715, over 4074942.38 frames. ], batch size: 57, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:20:19,161 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.01 vs. 
limit=10.0 2024-09-18 08:20:22,748 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:20:27,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=733561.3333333334, ans=0.1 2024-09-18 08:20:36,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=12.0 2024-09-18 08:20:46,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=733589.6666666666, ans=0.0 2024-09-18 08:21:03,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=733618.0, ans=0.025 2024-09-18 08:21:13,792 INFO [train.py:1198] (0/2) Epoch 41, batch 3300, loss[loss=0.2211, ctc_loss=0.1467, cr_loss=0.3719, over 20116.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1459, cr_loss=0.3725, over 4076770.15 frames. ], batch size: 80, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:21:17,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=733646.3333333334, ans=0.0 2024-09-18 08:21:46,346 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. 
limit=15.0 2024-09-18 08:21:52,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=733703.0, ans=0.2 2024-09-18 08:22:09,582 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.252e+02 2.363e+02 2.555e+02 4.123e+02, threshold=4.727e+02, percent-clipped=0.0 2024-09-18 08:22:17,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=733759.6666666666, ans=0.125 2024-09-18 08:22:23,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=733759.6666666666, ans=0.025 2024-09-18 08:22:30,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=733788.0, ans=0.125 2024-09-18 08:22:32,092 INFO [train.py:1198] (0/2) Epoch 41, batch 3350, loss[loss=0.187, ctc_loss=0.1242, cr_loss=0.3139, over 21048.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1452, cr_loss=0.371, over 4090572.85 frames. ], batch size: 53, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:23:37,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=733901.3333333334, ans=0.2 2024-09-18 08:23:47,530 INFO [train.py:1198] (0/2) Epoch 41, batch 3400, loss[loss=0.2138, ctc_loss=0.1413, cr_loss=0.3626, over 20877.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.145, cr_loss=0.3706, over 4094537.24 frames. ], batch size: 57, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:23:52,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=733929.6666666666, ans=0.04949747468305833 2024-09-18 08:24:06,443 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.36 vs. 
limit=15.0 2024-09-18 08:24:34,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=734014.6666666666, ans=0.125 2024-09-18 08:24:45,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=734014.6666666666, ans=0.0 2024-09-18 08:24:46,634 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.201e+02 2.323e+02 2.492e+02 4.629e+02, threshold=4.646e+02, percent-clipped=0.0 2024-09-18 08:25:06,187 INFO [train.py:1198] (0/2) Epoch 41, batch 3450, loss[loss=0.272, ctc_loss=0.183, cr_loss=0.4451, over 20661.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1455, cr_loss=0.3712, over 4100577.80 frames. ], batch size: 66, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:25:53,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=734156.3333333334, ans=0.0 2024-09-18 08:26:21,747 INFO [train.py:1198] (0/2) Epoch 41, batch 3500, loss[loss=0.1926, ctc_loss=0.1235, cr_loss=0.3456, over 20993.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3708, over 4098289.87 frames. ], batch size: 48, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:26:27,393 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.91 vs. 
limit=5.0 2024-09-18 08:26:55,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=734269.6666666666, ans=0.025 2024-09-18 08:27:01,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=734269.6666666666, ans=0.0 2024-09-18 08:27:09,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734298.0, ans=0.1 2024-09-18 08:27:17,586 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.242e+02 2.384e+02 2.510e+02 3.270e+02, threshold=4.768e+02, percent-clipped=0.0 2024-09-18 08:27:30,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=734326.3333333334, ans=0.2 2024-09-18 08:27:34,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=734326.3333333334, ans=0.125 2024-09-18 08:27:37,446 INFO [train.py:1198] (0/2) Epoch 41, batch 3550, loss[loss=0.2366, ctc_loss=0.157, cr_loss=0.3978, over 20953.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3695, over 4105097.70 frames. ], batch size: 64, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:28:43,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734468.0, ans=0.1 2024-09-18 08:28:46,948 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:28:51,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=734468.0, ans=0.125 2024-09-18 08:28:55,514 INFO [train.py:1198] (0/2) Epoch 41, batch 3600, loss[loss=0.2529, ctc_loss=0.1739, cr_loss=0.3953, over 14194.00 frames. 
], tot_loss[loss=0.2179, ctc_loss=0.1442, cr_loss=0.3685, over 4078784.76 frames. ], batch size: 149, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:28:56,140 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=12.0 2024-09-18 08:29:00,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=734496.3333333334, ans=0.125 2024-09-18 08:29:01,936 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:29:03,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734496.3333333334, ans=0.1 2024-09-18 08:29:04,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=734496.3333333334, ans=0.125 2024-09-18 08:29:23,018 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:29:31,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. 
limit=6.0 2024-09-18 08:29:36,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=734553.0, ans=0.0 2024-09-18 08:29:37,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734553.0, ans=0.1 2024-09-18 08:29:45,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=734581.3333333334, ans=0.95 2024-09-18 08:29:52,642 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.227e+02 2.331e+02 2.551e+02 3.365e+02, threshold=4.662e+02, percent-clipped=0.0 2024-09-18 08:30:00,864 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-09-18 08:30:10,807 INFO [train.py:1198] (0/2) Epoch 41, batch 3650, loss[loss=0.2019, ctc_loss=0.1314, cr_loss=0.3526, over 21072.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1443, cr_loss=0.3688, over 4089348.11 frames. 
], batch size: 53, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:30:21,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=734638.0, ans=0.125 2024-09-18 08:30:27,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734666.3333333334, ans=0.1 2024-09-18 08:30:29,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=734666.3333333334, ans=0.2 2024-09-18 08:31:08,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=734723.0, ans=0.0 2024-09-18 08:31:25,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=734751.3333333334, ans=0.125 2024-09-18 08:31:29,470 INFO [train.py:1198] (0/2) Epoch 41, batch 3700, loss[loss=0.2168, ctc_loss=0.1439, cr_loss=0.3649, over 20889.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1443, cr_loss=0.3682, over 4094112.48 frames. ], batch size: 57, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:31:58,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734836.3333333334, ans=0.1 2024-09-18 08:31:58,874 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.47 vs. 
limit=22.5 2024-09-18 08:32:13,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=734864.6666666666, ans=0.0 2024-09-18 08:32:22,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=734864.6666666666, ans=0.0 2024-09-18 08:32:26,934 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.259e+02 2.355e+02 2.544e+02 4.484e+02, threshold=4.710e+02, percent-clipped=0.0 2024-09-18 08:32:28,866 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:32:45,199 INFO [train.py:1198] (0/2) Epoch 41, batch 3750, loss[loss=0.2549, ctc_loss=0.1701, cr_loss=0.4237, over 20859.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1445, cr_loss=0.3688, over 4096310.78 frames. ], batch size: 65, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:33:18,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=734978.0, ans=0.2 2024-09-18 08:33:37,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=735006.3333333334, ans=0.125 2024-09-18 08:34:01,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=735034.6666666666, ans=0.125 2024-09-18 08:34:04,064 INFO [train.py:1198] (0/2) Epoch 41, batch 3800, loss[loss=0.2406, ctc_loss=0.1651, cr_loss=0.3775, over 14077.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1441, cr_loss=0.3686, over 4088226.95 frames. ], batch size: 149, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:34:58,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.63 vs. 
limit=15.0 2024-09-18 08:35:01,308 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.236e+02 2.355e+02 2.550e+02 3.423e+02, threshold=4.710e+02, percent-clipped=0.0 2024-09-18 08:35:06,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=735176.3333333334, ans=0.0 2024-09-18 08:35:19,264 INFO [train.py:1198] (0/2) Epoch 41, batch 3850, loss[loss=0.1947, ctc_loss=0.1266, cr_loss=0.3401, over 20989.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1445, cr_loss=0.3687, over 4080987.78 frames. ], batch size: 52, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:35:24,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=735204.6666666666, ans=0.125 2024-09-18 08:35:47,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2024-09-18 08:35:51,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=735261.3333333334, ans=0.1 2024-09-18 08:35:57,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=735261.3333333334, ans=0.2 2024-09-18 08:36:03,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=735261.3333333334, ans=0.0 2024-09-18 08:36:06,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=735289.6666666666, ans=0.05 2024-09-18 08:36:08,337 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. 
limit=15.0 2024-09-18 08:36:27,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=735318.0, ans=0.1 2024-09-18 08:36:37,948 INFO [train.py:1198] (0/2) Epoch 41, batch 3900, loss[loss=0.1948, ctc_loss=0.1274, cr_loss=0.337, over 20773.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1445, cr_loss=0.3686, over 4090910.48 frames. ], batch size: 56, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:37:35,566 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.250e+02 2.371e+02 2.535e+02 8.170e+02, threshold=4.742e+02, percent-clipped=2.0 2024-09-18 08:37:46,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=735459.6666666666, ans=0.125 2024-09-18 08:37:49,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=735459.6666666666, ans=0.125 2024-09-18 08:37:53,734 INFO [train.py:1198] (0/2) Epoch 41, batch 3950, loss[loss=0.1796, ctc_loss=0.1164, cr_loss=0.3159, over 20941.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1438, cr_loss=0.3682, over 4097665.42 frames. ], batch size: 50, lr: 2.01e-03, grad_scale: 16.0 2024-09-18 08:38:25,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=735544.6666666666, ans=0.125 2024-09-18 08:38:37,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=735573.0, ans=0.125 2024-09-18 08:39:11,576 INFO [train.py:1198] (0/2) Epoch 41, batch 4000, loss[loss=0.2172, ctc_loss=0.1426, cr_loss=0.3731, over 21056.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1451, cr_loss=0.3697, over 4080159.18 frames. 
], batch size: 56, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:39:13,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. limit=10.0 2024-09-18 08:39:30,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=22.5 2024-09-18 08:39:36,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=735658.0, ans=0.125 2024-09-18 08:39:40,040 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.30 vs. limit=15.0 2024-09-18 08:40:09,600 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.255e+02 2.383e+02 2.587e+02 3.614e+02, threshold=4.765e+02, percent-clipped=0.0 2024-09-18 08:40:10,490 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=22.5 2024-09-18 08:40:20,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=735743.0, ans=0.125 2024-09-18 08:40:26,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=735771.3333333334, ans=0.125 2024-09-18 08:40:28,122 INFO [train.py:1198] (0/2) Epoch 41, batch 4050, loss[loss=0.1941, ctc_loss=0.1258, cr_loss=0.3417, over 20867.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1446, cr_loss=0.3696, over 4094125.96 frames. 
], batch size: 54, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:40:42,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=735799.6666666666, ans=0.0 2024-09-18 08:41:10,724 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:41:13,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=735856.3333333334, ans=0.125 2024-09-18 08:41:40,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=735884.6666666666, ans=0.0 2024-09-18 08:41:43,246 INFO [train.py:1198] (0/2) Epoch 41, batch 4100, loss[loss=0.254, ctc_loss=0.173, cr_loss=0.4051, over 19907.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1449, cr_loss=0.3704, over 4105237.34 frames. ], batch size: 80, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:41:48,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=735913.0, ans=0.125 2024-09-18 08:41:49,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0 2024-09-18 08:42:02,027 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.28 vs. 
limit=22.5 2024-09-18 08:42:16,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=735969.6666666666, ans=0.125 2024-09-18 08:42:24,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=735969.6666666666, ans=0.125 2024-09-18 08:42:28,951 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:42:43,511 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.220e+02 2.353e+02 2.500e+02 3.228e+02, threshold=4.707e+02, percent-clipped=0.0 2024-09-18 08:42:45,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=736026.3333333334, ans=0.0 2024-09-18 08:42:55,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=736026.3333333334, ans=0.0 2024-09-18 08:43:01,523 INFO [train.py:1198] (0/2) Epoch 41, batch 4150, loss[loss=0.1995, ctc_loss=0.1277, cr_loss=0.3595, over 20787.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1454, cr_loss=0.3718, over 4100606.80 frames. 
], batch size: 53, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:43:09,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=736054.6666666666, ans=10.0 2024-09-18 08:43:14,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=736054.6666666666, ans=0.0 2024-09-18 08:43:19,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736083.0, ans=0.1 2024-09-18 08:43:35,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=736111.3333333334, ans=0.125 2024-09-18 08:43:46,022 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=12.0 2024-09-18 08:44:17,035 INFO [train.py:1198] (0/2) Epoch 41, batch 4200, loss[loss=0.1905, ctc_loss=0.1232, cr_loss=0.3364, over 21003.00 frames. ], tot_loss[loss=0.221, ctc_loss=0.1465, cr_loss=0.3726, over 4078329.55 frames. ], batch size: 48, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:45:17,480 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.249e+02 2.413e+02 2.618e+02 4.722e+02, threshold=4.827e+02, percent-clipped=1.0 2024-09-18 08:45:35,306 INFO [train.py:1198] (0/2) Epoch 41, batch 4250, loss[loss=0.2096, ctc_loss=0.139, cr_loss=0.3528, over 20777.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1466, cr_loss=0.3725, over 4076658.11 frames. 
], batch size: 56, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:45:53,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=736366.3333333334, ans=0.025 2024-09-18 08:46:21,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=736423.0, ans=0.125 2024-09-18 08:46:25,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=736423.0, ans=0.125 2024-09-18 08:46:28,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=736423.0, ans=0.125 2024-09-18 08:46:31,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=736423.0, ans=0.125 2024-09-18 08:46:44,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=736451.3333333334, ans=0.125 2024-09-18 08:46:51,653 INFO [train.py:1198] (0/2) Epoch 41, batch 4300, loss[loss=0.2278, ctc_loss=0.1489, cr_loss=0.3942, over 21017.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1461, cr_loss=0.3719, over 4070876.96 frames. ], batch size: 61, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:46:56,854 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.95 vs. 
limit=15.0 2024-09-18 08:47:02,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736479.6666666666, ans=0.1 2024-09-18 08:47:03,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=736479.6666666666, ans=0.125 2024-09-18 08:47:11,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=736508.0, ans=0.125 2024-09-18 08:47:40,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.39 vs. limit=12.0 2024-09-18 08:47:51,328 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.275e+02 2.403e+02 2.606e+02 4.181e+02, threshold=4.807e+02, percent-clipped=0.0 2024-09-18 08:47:56,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736593.0, ans=0.1 2024-09-18 08:47:57,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=736593.0, ans=0.09899494936611666 2024-09-18 08:48:05,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=736593.0, ans=0.125 2024-09-18 08:48:09,312 INFO [train.py:1198] (0/2) Epoch 41, batch 4350, loss[loss=0.2331, ctc_loss=0.1537, cr_loss=0.3968, over 21011.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1462, cr_loss=0.3722, over 4055574.59 frames. 
], batch size: 61, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:48:14,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736621.3333333334, ans=0.1 2024-09-18 08:48:32,099 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-260000.pt 2024-09-18 08:48:51,488 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=22.5 2024-09-18 08:49:01,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=736706.3333333334, ans=0.125 2024-09-18 08:49:03,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=736706.3333333334, ans=0.015 2024-09-18 08:49:24,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=736763.0, ans=0.035 2024-09-18 08:49:25,571 INFO [train.py:1198] (0/2) Epoch 41, batch 4400, loss[loss=0.1858, ctc_loss=0.1217, cr_loss=0.3205, over 20950.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1461, cr_loss=0.3722, over 4069262.33 frames. ], batch size: 48, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:49:36,964 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2024-09-18 08:49:48,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=736791.3333333334, ans=0.1 2024-09-18 08:50:22,732 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.223e+02 2.365e+02 2.507e+02 2.832e+02, threshold=4.729e+02, percent-clipped=0.0 2024-09-18 08:50:43,951 INFO [train.py:1198] (0/2) Epoch 41, batch 4450, loss[loss=0.241, ctc_loss=0.16, cr_loss=0.4047, over 20701.00 frames. 
], tot_loss[loss=0.2204, ctc_loss=0.1459, cr_loss=0.3725, over 4075532.82 frames. ], batch size: 71, lr: 2.01e-03, grad_scale: 32.0 2024-09-18 08:50:53,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=736904.6666666666, ans=0.0 2024-09-18 08:51:28,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736989.6666666666, ans=0.1 2024-09-18 08:51:28,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=736989.6666666666, ans=0.0 2024-09-18 08:51:50,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=737018.0, ans=0.0 2024-09-18 08:51:59,729 INFO [train.py:1198] (0/2) Epoch 41, batch 4500, loss[loss=0.2015, ctc_loss=0.132, cr_loss=0.3474, over 20960.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1457, cr_loss=0.3718, over 4085944.91 frames. ], batch size: 52, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 08:52:42,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=737103.0, ans=0.125 2024-09-18 08:52:58,653 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.193e+02 2.308e+02 2.500e+02 3.939e+02, threshold=4.615e+02, percent-clipped=0.0 2024-09-18 08:53:07,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=737159.6666666666, ans=0.125 2024-09-18 08:53:17,940 INFO [train.py:1198] (0/2) Epoch 41, batch 4550, loss[loss=0.2244, ctc_loss=0.1488, cr_loss=0.3778, over 19525.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1454, cr_loss=0.3709, over 4096662.53 frames. 
], batch size: 90, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 08:53:59,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737244.6666666666, ans=0.1 2024-09-18 08:54:25,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=737301.3333333334, ans=0.025 2024-09-18 08:54:32,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=737329.6666666666, ans=0.125 2024-09-18 08:54:33,840 INFO [train.py:1198] (0/2) Epoch 41, batch 4600, loss[loss=0.2108, ctc_loss=0.14, cr_loss=0.3544, over 20971.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.147, cr_loss=0.3731, over 4088996.95 frames. ], batch size: 58, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 08:54:37,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=737329.6666666666, ans=0.125 2024-09-18 08:54:48,296 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:54:48,632 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.57 vs. 
limit=10.0 2024-09-18 08:54:51,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=737358.0, ans=0.025 2024-09-18 08:55:09,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737386.3333333334, ans=0.1 2024-09-18 08:55:12,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=737386.3333333334, ans=0.125 2024-09-18 08:55:17,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=737386.3333333334, ans=0.125 2024-09-18 08:55:17,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=737386.3333333334, ans=0.2 2024-09-18 08:55:33,377 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.274e+02 2.413e+02 2.588e+02 3.283e+02, threshold=4.827e+02, percent-clipped=0.0 2024-09-18 08:55:50,104 INFO [train.py:1198] (0/2) Epoch 41, batch 4650, loss[loss=0.2354, ctc_loss=0.1578, cr_loss=0.388, over 20513.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1467, cr_loss=0.373, over 4097065.73 frames. ], batch size: 75, lr: 2.00e-03, grad_scale: 16.0 2024-09-18 08:56:16,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=737499.6666666666, ans=0.025 2024-09-18 08:56:52,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2024-09-18 08:57:08,543 INFO [train.py:1198] (0/2) Epoch 41, batch 4700, loss[loss=0.2098, ctc_loss=0.1415, cr_loss=0.3418, over 20979.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1449, cr_loss=0.3702, over 4109725.90 frames. 
], batch size: 55, lr: 2.00e-03, grad_scale: 16.0
2024-09-18 08:58:07,068 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.188e+02 2.340e+02 2.521e+02 3.459e+02, threshold=4.680e+02, percent-clipped=0.0
2024-09-18 08:58:23,715 INFO [train.py:1198] (0/2) Epoch 41, batch 4750, loss[loss=0.2506, ctc_loss=0.1696, cr_loss=0.405, over 19386.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.3703, over 4108812.64 frames. ], batch size: 90, lr: 2.00e-03, grad_scale: 16.0
2024-09-18 08:58:42,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=737783.0, ans=0.0
2024-09-18 08:58:43,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=737783.0, ans=0.07
2024-09-18 08:58:49,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=737783.0, ans=0.1
2024-09-18 08:59:06,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=737811.3333333334, ans=0.025
2024-09-18 08:59:09,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=737811.3333333334, ans=0.0
2024-09-18 08:59:17,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=737839.6666666666, ans=0.1
2024-09-18 08:59:42,543 INFO [train.py:1198] (0/2) Epoch 41, batch 4800, loss[loss=0.1992, ctc_loss=0.1305, cr_loss=0.3432, over 20940.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.145, cr_loss=0.3702, over 4102932.42 frames. ], batch size: 50, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 08:59:45,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=737896.3333333334, ans=0.125
2024-09-18 09:00:02,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=737924.6666666666, ans=10.0
2024-09-18 09:00:05,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=737924.6666666666, ans=0.125
2024-09-18 09:00:22,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=737953.0, ans=0.0
2024-09-18 09:00:41,704 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.879e+02 2.225e+02 2.346e+02 2.523e+02 3.172e+02, threshold=4.693e+02, percent-clipped=0.0
2024-09-18 09:00:58,186 INFO [train.py:1198] (0/2) Epoch 41, batch 4850, loss[loss=0.22, ctc_loss=0.146, cr_loss=0.3701, over 20847.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1451, cr_loss=0.3701, over 4098227.53 frames. ], batch size: 65, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:01:28,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=738094.6666666666, ans=0.125
2024-09-18 09:01:38,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=738094.6666666666, ans=0.125
2024-09-18 09:01:59,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=738151.3333333334, ans=0.125
2024-09-18 09:02:15,553 INFO [train.py:1198] (0/2) Epoch 41, batch 4900, loss[loss=0.2268, ctc_loss=0.1515, cr_loss=0.3768, over 20888.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1456, cr_loss=0.3703, over 4093018.08 frames. ], batch size: 54, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:02:21,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=738179.6666666666, ans=0.0
2024-09-18 09:02:21,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=738179.6666666666, ans=0.0
2024-09-18 09:02:35,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=738208.0, ans=0.0
2024-09-18 09:02:41,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=738208.0, ans=0.125
2024-09-18 09:02:59,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=738264.6666666666, ans=0.125
2024-09-18 09:03:13,732 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.274e+02 2.389e+02 2.557e+02 4.192e+02, threshold=4.778e+02, percent-clipped=0.0
2024-09-18 09:03:30,172 INFO [train.py:1198] (0/2) Epoch 41, batch 4950, loss[loss=0.2109, ctc_loss=0.1363, cr_loss=0.3731, over 20973.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1445, cr_loss=0.3689, over 4092968.93 frames. ], batch size: 48, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:03:54,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738349.6666666666, ans=0.1
2024-09-18 09:03:58,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=738378.0, ans=0.025
2024-09-18 09:04:00,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=738378.0, ans=0.125
2024-09-18 09:04:06,835 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0
2024-09-18 09:04:44,870 INFO [train.py:1198] (0/2) Epoch 41, batch 5000, loss[loss=0.2147, ctc_loss=0.1411, cr_loss=0.3679, over 20885.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1441, cr_loss=0.3685, over 4097553.34 frames. ], batch size: 54, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:05:07,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=738491.3333333334, ans=0.0
2024-09-18 09:05:09,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=738491.3333333334, ans=0.025
2024-09-18 09:05:37,364 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0
2024-09-18 09:05:38,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=738548.0, ans=0.0
2024-09-18 09:05:42,756 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.875e+02 2.229e+02 2.329e+02 2.482e+02 3.017e+02, threshold=4.657e+02, percent-clipped=0.0
2024-09-18 09:05:46,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=738576.3333333334, ans=0.2
2024-09-18 09:05:59,069 INFO [train.py:1198] (0/2) Epoch 41, batch 5050, loss[loss=0.2206, ctc_loss=0.1407, cr_loss=0.3991, over 20886.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1442, cr_loss=0.3688, over 4109610.87 frames. ], batch size: 54, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:06:03,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=738604.6666666666, ans=0.125
2024-09-18 09:06:10,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.97 vs. limit=12.0
2024-09-18 09:06:14,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0
2024-09-18 09:06:28,318 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=15.0
2024-09-18 09:07:05,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=738718.0, ans=0.0
2024-09-18 09:07:09,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=738718.0, ans=0.0
2024-09-18 09:07:16,867 INFO [train.py:1198] (0/2) Epoch 41, batch 5100, loss[loss=0.2116, ctc_loss=0.1385, cr_loss=0.3654, over 21057.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1434, cr_loss=0.3678, over 4116538.95 frames. ], batch size: 53, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:07:27,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=738746.3333333334, ans=15.0
2024-09-18 09:07:30,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738774.6666666666, ans=0.1
2024-09-18 09:07:33,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=738774.6666666666, ans=0.125
2024-09-18 09:07:47,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=738803.0, ans=0.125
2024-09-18 09:08:14,304 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.236e+02 2.364e+02 2.502e+02 9.245e+02, threshold=4.728e+02, percent-clipped=1.0
2024-09-18 09:08:30,727 INFO [train.py:1198] (0/2) Epoch 41, batch 5150, loss[loss=0.2527, ctc_loss=0.1718, cr_loss=0.4045, over 18097.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1434, cr_loss=0.3682, over 4117730.32 frames. ], batch size: 108, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:09:29,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=739001.3333333334, ans=0.2
2024-09-18 09:09:44,326 INFO [train.py:1198] (0/2) Epoch 41, batch 5200, loss[loss=0.2268, ctc_loss=0.1513, cr_loss=0.3775, over 20932.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.144, cr_loss=0.3694, over 4120612.41 frames. ], batch size: 60, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:10:05,548 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=15.0
2024-09-18 09:10:38,549 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. limit=10.0
2024-09-18 09:10:42,189 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.243e+02 2.374e+02 2.484e+02 3.795e+02, threshold=4.748e+02, percent-clipped=0.0
2024-09-18 09:10:55,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=739143.0, ans=0.025
2024-09-18 09:10:58,406 INFO [train.py:1198] (0/2) Epoch 41, batch 5250, loss[loss=0.233, ctc_loss=0.1528, cr_loss=0.401, over 21016.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1437, cr_loss=0.3691, over 4125253.86 frames. ], batch size: 61, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:11:15,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=739199.6666666666, ans=0.0
2024-09-18 09:12:00,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=739284.6666666666, ans=6.0
2024-09-18 09:12:14,736 INFO [train.py:1198] (0/2) Epoch 41, batch 5300, loss[loss=0.2565, ctc_loss=0.1736, cr_loss=0.4146, over 20877.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1439, cr_loss=0.3694, over 4124931.51 frames. ], batch size: 65, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:13:12,450 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.236e+02 2.374e+02 2.566e+02 3.810e+02, threshold=4.747e+02, percent-clipped=0.0
2024-09-18 09:13:18,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=739426.3333333334, ans=0.2
2024-09-18 09:13:28,812 INFO [train.py:1198] (0/2) Epoch 41, batch 5350, loss[loss=0.2327, ctc_loss=0.157, cr_loss=0.3785, over 21027.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1437, cr_loss=0.3692, over 4124627.31 frames. ], batch size: 62, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:13:33,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=739454.6666666666, ans=0.0
2024-09-18 09:13:41,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=739454.6666666666, ans=0.5
2024-09-18 09:13:42,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=739483.0, ans=0.125
2024-09-18 09:14:08,954 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.55 vs. limit=12.0
2024-09-18 09:14:35,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=739568.0, ans=0.0
2024-09-18 09:14:43,633 INFO [train.py:1198] (0/2) Epoch 41, batch 5400, loss[loss=0.2453, ctc_loss=0.1713, cr_loss=0.3702, over 14188.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1444, cr_loss=0.3702, over 4114827.50 frames. ], batch size: 149, lr: 2.00e-03, grad_scale: 16.0
2024-09-18 09:14:50,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=739596.3333333334, ans=0.125
2024-09-18 09:15:37,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=739681.3333333334, ans=0.2
2024-09-18 09:15:43,018 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.240e+02 2.378e+02 2.529e+02 4.836e+02, threshold=4.755e+02, percent-clipped=1.0
2024-09-18 09:15:58,165 INFO [train.py:1198] (0/2) Epoch 41, batch 5450, loss[loss=0.2129, ctc_loss=0.1421, cr_loss=0.3537, over 21004.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1432, cr_loss=0.3681, over 4119397.98 frames. ], batch size: 61, lr: 2.00e-03, grad_scale: 16.0
2024-09-18 09:16:17,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=739766.3333333334, ans=0.0
2024-09-18 09:16:19,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=739766.3333333334, ans=0.125
2024-09-18 09:16:42,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=739794.6666666666, ans=0.0
2024-09-18 09:16:51,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=739823.0, ans=0.125
2024-09-18 09:16:58,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739851.3333333334, ans=0.1
2024-09-18 09:17:15,001 INFO [train.py:1198] (0/2) Epoch 41, batch 5500, loss[loss=0.2249, ctc_loss=0.1475, cr_loss=0.3867, over 20691.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1431, cr_loss=0.3684, over 4111612.64 frames. ], batch size: 71, lr: 2.00e-03, grad_scale: 16.0
2024-09-18 09:17:39,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0
2024-09-18 09:18:02,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=739964.6666666666, ans=0.0
2024-09-18 09:18:14,744 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.213e+02 2.368e+02 2.525e+02 3.505e+02, threshold=4.735e+02, percent-clipped=0.0
2024-09-18 09:18:28,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0
2024-09-18 09:18:29,463 INFO [train.py:1198] (0/2) Epoch 41, batch 5550, loss[loss=0.2514, ctc_loss=0.167, cr_loss=0.4219, over 20964.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3698, over 4101601.45 frames. ], batch size: 64, lr: 2.00e-03, grad_scale: 16.0
2024-09-18 09:18:34,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=740021.3333333334, ans=0.125
2024-09-18 09:19:43,460 INFO [train.py:1198] (0/2) Epoch 41, batch 5600, loss[loss=0.2532, ctc_loss=0.1723, cr_loss=0.4046, over 18116.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1446, cr_loss=0.3703, over 4088789.02 frames. ], batch size: 108, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:20:00,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=740191.3333333334, ans=0.125
2024-09-18 09:20:06,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=740191.3333333334, ans=0.0
2024-09-18 09:20:11,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0
2024-09-18 09:20:12,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=740219.6666666666, ans=0.125
2024-09-18 09:20:15,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=740219.6666666666, ans=0.0
2024-09-18 09:20:45,565 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.191e+02 2.374e+02 2.584e+02 3.616e+02, threshold=4.748e+02, percent-clipped=0.0
2024-09-18 09:20:59,961 INFO [train.py:1198] (0/2) Epoch 41, batch 5650, loss[loss=0.2539, ctc_loss=0.1731, cr_loss=0.4042, over 18201.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.145, cr_loss=0.3703, over 4073361.65 frames. ], batch size: 108, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:21:04,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=740304.6666666666, ans=0.0
2024-09-18 09:21:17,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=740333.0, ans=0.0
2024-09-18 09:21:53,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=740389.6666666666, ans=0.2
2024-09-18 09:22:14,381 INFO [train.py:1198] (0/2) Epoch 41, batch 5700, loss[loss=0.1917, ctc_loss=0.1254, cr_loss=0.3318, over 20249.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.144, cr_loss=0.3681, over 4080969.09 frames. ], batch size: 45, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:23:13,924 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.191e+02 2.352e+02 2.519e+02 5.380e+02, threshold=4.703e+02, percent-clipped=1.0
2024-09-18 09:23:28,657 INFO [train.py:1198] (0/2) Epoch 41, batch 5750, loss[loss=0.1947, ctc_loss=0.1262, cr_loss=0.3426, over 20782.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1434, cr_loss=0.3674, over 4093360.93 frames. ], batch size: 53, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:23:36,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=740588.0, ans=0.0
2024-09-18 09:23:41,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=740616.3333333334, ans=0.125
2024-09-18 09:23:52,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=740616.3333333334, ans=0.09899494936611666
2024-09-18 09:24:04,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=740644.6666666666, ans=0.0
2024-09-18 09:24:23,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=740673.0, ans=0.125
2024-09-18 09:24:38,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=740701.3333333334, ans=0.0
2024-09-18 09:24:38,942 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0
2024-09-18 09:24:42,451 INFO [train.py:1198] (0/2) Epoch 41, batch 5800, loss[loss=0.2195, ctc_loss=0.1462, cr_loss=0.3662, over 21089.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1445, cr_loss=0.3693, over 4087849.58 frames. ], batch size: 56, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:25:01,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=740758.0, ans=0.2
2024-09-18 09:25:03,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=740758.0, ans=0.09899494936611666
2024-09-18 09:25:44,640 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.230e+02 2.360e+02 2.568e+02 4.519e+02, threshold=4.720e+02, percent-clipped=0.0
2024-09-18 09:25:59,368 INFO [train.py:1198] (0/2) Epoch 41, batch 5850, loss[loss=0.2104, ctc_loss=0.1396, cr_loss=0.3541, over 20879.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1445, cr_loss=0.3692, over 4079327.76 frames. ], batch size: 57, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:26:04,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=740871.3333333334, ans=0.125
2024-09-18 09:26:32,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=22.5
2024-09-18 09:26:47,307 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0
2024-09-18 09:26:58,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=740984.6666666666, ans=0.125
2024-09-18 09:27:01,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=740984.6666666666, ans=0.0
2024-09-18 09:27:13,535 INFO [train.py:1198] (0/2) Epoch 41, batch 5900, loss[loss=0.2187, ctc_loss=0.1476, cr_loss=0.3556, over 20355.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.145, cr_loss=0.3706, over 4089968.57 frames. ], batch size: 74, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:27:22,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=741013.0, ans=0.2
2024-09-18 09:28:06,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0
2024-09-18 09:28:13,365 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.246e+02 2.395e+02 2.503e+02 3.331e+02, threshold=4.789e+02, percent-clipped=0.0
2024-09-18 09:28:15,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=741126.3333333334, ans=0.125
2024-09-18 09:28:28,056 INFO [train.py:1198] (0/2) Epoch 41, batch 5950, loss[loss=0.2025, ctc_loss=0.1312, cr_loss=0.3561, over 20965.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1447, cr_loss=0.3704, over 4098670.45 frames. ], batch size: 49, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:28:35,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=741154.6666666666, ans=0.0
2024-09-18 09:29:25,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0
2024-09-18 09:29:44,854 INFO [train.py:1198] (0/2) Epoch 41, batch 6000, loss[loss=0.194, ctc_loss=0.1276, cr_loss=0.3319, over 20884.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1444, cr_loss=0.3697, over 4102481.96 frames. ], batch size: 57, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:29:44,855 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 09:30:04,275 INFO [train.py:1230] (0/2) Epoch 41, validation: loss=0.03985, ctc_loss=0.03985, cr_loss=1.46e-14, over 944034.00 frames.
2024-09-18 09:30:04,276 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-18 09:30:37,842 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. limit=10.0
2024-09-18 09:31:04,710 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.233e+02 2.355e+02 2.496e+02 4.074e+02, threshold=4.709e+02, percent-clipped=0.0
2024-09-18 09:31:19,446 INFO [train.py:1198] (0/2) Epoch 41, batch 6050, loss[loss=0.192, ctc_loss=0.1241, cr_loss=0.3399, over 20984.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.3695, over 4100149.21 frames. ], batch size: 50, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:31:28,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=741438.0, ans=0.025
2024-09-18 09:31:52,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=741494.6666666666, ans=0.125
2024-09-18 09:32:00,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=741494.6666666666, ans=0.0
2024-09-18 09:32:02,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=12.0
2024-09-18 09:32:06,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=741523.0, ans=0.1
2024-09-18 09:32:10,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=741523.0, ans=0.0
2024-09-18 09:32:11,198 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0
2024-09-18 09:32:14,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=12.0
2024-09-18 09:32:27,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=741551.3333333334, ans=0.2
2024-09-18 09:32:34,421 INFO [train.py:1198] (0/2) Epoch 41, batch 6100, loss[loss=0.2042, ctc_loss=0.1364, cr_loss=0.3393, over 21074.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.3689, over 4092685.84 frames. ], batch size: 59, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:32:47,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741579.6666666666, ans=0.1
2024-09-18 09:32:55,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=741608.0, ans=0.2
2024-09-18 09:32:56,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=741608.0, ans=0.125
2024-09-18 09:32:58,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=741608.0, ans=0.0
2024-09-18 09:33:01,466 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=22.5
2024-09-18 09:33:08,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0
2024-09-18 09:33:35,000 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.242e+02 2.359e+02 2.548e+02 5.006e+02, threshold=4.719e+02, percent-clipped=1.0
2024-09-18 09:33:36,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=741693.0, ans=0.125
2024-09-18 09:33:38,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=741693.0, ans=0.125
2024-09-18 09:33:48,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=741721.3333333334, ans=0.125
2024-09-18 09:33:49,871 INFO [train.py:1198] (0/2) Epoch 41, batch 6150, loss[loss=0.1755, ctc_loss=0.1124, cr_loss=0.3158, over 20959.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1448, cr_loss=0.3702, over 4087636.11 frames. ], batch size: 48, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:34:02,461 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=22.5
2024-09-18 09:34:45,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0
2024-09-18 09:34:54,423 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 09:34:57,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=741834.6666666666, ans=0.0
2024-09-18 09:35:04,552 INFO [train.py:1198] (0/2) Epoch 41, batch 6200, loss[loss=0.2396, ctc_loss=0.1594, cr_loss=0.4009, over 20976.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1448, cr_loss=0.37, over 4083347.74 frames. ], batch size: 64, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:35:06,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=741863.0, ans=0.2
2024-09-18 09:35:34,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741919.6666666666, ans=0.1
2024-09-18 09:35:41,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=741919.6666666666, ans=0.025
2024-09-18 09:35:43,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=741919.6666666666, ans=0.2
2024-09-18 09:35:46,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=741919.6666666666, ans=0.2
2024-09-18 09:35:47,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=741948.0, ans=0.0
2024-09-18 09:36:03,926 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.224e+02 2.396e+02 2.713e+02 4.642e+02, threshold=4.792e+02, percent-clipped=0.0
2024-09-18 09:36:07,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=741976.3333333334, ans=0.2
2024-09-18 09:36:10,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=15.0
2024-09-18 09:36:18,809 INFO [train.py:1198] (0/2) Epoch 41, batch 6250, loss[loss=0.2049, ctc_loss=0.1312, cr_loss=0.3685, over 20137.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1447, cr_loss=0.3692, over 4053856.16 frames. ], batch size: 44, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:37:22,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=742118.0, ans=0.0
2024-09-18 09:37:23,116 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0
2024-09-18 09:37:34,204 INFO [train.py:1198] (0/2) Epoch 41, batch 6300, loss[loss=0.1837, ctc_loss=0.1178, cr_loss=0.3296, over 20980.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1443, cr_loss=0.3678, over 4025576.91 frames. ], batch size: 48, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:37:38,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=742146.3333333334, ans=0.2
2024-09-18 09:38:03,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=742203.0, ans=0.125
2024-09-18 09:38:07,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=742203.0, ans=0.07
2024-09-18 09:38:22,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=742231.3333333334, ans=0.1
2024-09-18 09:38:31,559 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 2.330e+02 2.542e+02 2.772e+02 5.785e+02, threshold=5.083e+02, percent-clipped=1.0
2024-09-18 09:38:37,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=742259.6666666666, ans=0.0
2024-09-18 09:38:42,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0
2024-09-18 09:38:45,554 INFO [train.py:1198] (0/2) Epoch 41, batch 6350, loss[loss=0.2498, ctc_loss=0.1717, cr_loss=0.3903, over 14301.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1486, cr_loss=0.3705, over 3864026.85 frames. ], batch size: 149, lr: 2.00e-03, grad_scale: 32.0
2024-09-18 09:38:48,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=742288.0, ans=0.025
2024-09-18 09:39:26,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=742344.6666666666, ans=0.125
2024-09-18 09:39:44,028 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-41.pt
2024-09-18 09:40:34,172 INFO [train.py:1198] (0/2) Epoch 42, batch 0, loss[loss=0.2141, ctc_loss=0.1414, cr_loss=0.3632, over 21016.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1414, cr_loss=0.3632, over 21016.00 frames. ], batch size: 61, lr: 1.97e-03, grad_scale: 32.0
2024-09-18 09:40:34,173 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 09:40:52,717 INFO [train.py:1230] (0/2) Epoch 42, validation: loss=0.03936, ctc_loss=0.03936, cr_loss=1.479e-14, over 944034.00 frames.
2024-09-18 09:40:52,718 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-18 09:40:56,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=742404.1666666666, ans=0.1
2024-09-18 09:41:14,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=742432.5, ans=0.2
2024-09-18 09:41:14,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=742432.5, ans=0.95
2024-09-18 09:41:16,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=742432.5, ans=0.07
2024-09-18 09:42:06,601 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.356e+02 2.604e+02 2.851e+02 3.491e+02, threshold=5.207e+02, percent-clipped=0.0
2024-09-18 09:42:08,086 INFO [train.py:1198] (0/2) Epoch 42, batch 50, loss[loss=0.2474, ctc_loss=0.1649, cr_loss=0.4122, over 20837.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1458, cr_loss=0.3692, over 918967.69 frames. ], batch size: 65, lr: 1.97e-03, grad_scale: 32.0
2024-09-18 09:42:32,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=742574.1666666666, ans=0.025
2024-09-18 09:43:13,553 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.06 vs. limit=6.0
2024-09-18 09:43:22,874 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0
2024-09-18 09:43:23,797 INFO [train.py:1198] (0/2) Epoch 42, batch 100, loss[loss=0.199, ctc_loss=0.1285, cr_loss=0.3523, over 21077.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1438, cr_loss=0.367, over 1627915.89 frames. ], batch size: 53, lr: 1.97e-03, grad_scale: 32.0
2024-09-18 09:44:09,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742772.5, ans=0.1
2024-09-18 09:44:32,188 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=22.5
2024-09-18 09:44:37,277 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.184e+02 2.334e+02 2.514e+02 3.309e+02, threshold=4.667e+02, percent-clipped=0.0
2024-09-18 09:44:38,764 INFO [train.py:1198] (0/2) Epoch 42, batch 150, loss[loss=0.2284, ctc_loss=0.151, cr_loss=0.3872, over 20701.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1432, cr_loss=0.367, over 2176161.60 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 32.0
2024-09-18 09:44:44,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=742829.1666666666, ans=0.125
2024-09-18 09:45:08,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=742857.5, ans=0.125
2024-09-18 09:45:35,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=742914.1666666666, ans=0.025
2024-09-18 09:45:59,221 INFO [train.py:1198] (0/2) Epoch 42, batch 200, loss[loss=0.2329, ctc_loss=0.1606, cr_loss=0.3617, over 14472.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1439, cr_loss=0.3666, over 2589976.81 frames. ], batch size: 149, lr: 1.97e-03, grad_scale: 32.0
2024-09-18 09:45:59,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.13 vs. limit=22.5
2024-09-18 09:46:41,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=743027.5, ans=0.125
2024-09-18 09:46:42,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=743027.5, ans=0.125
2024-09-18 09:47:01,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=743084.1666666666, ans=0.02
2024-09-18 09:47:13,011 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.237e+02 2.378e+02 2.544e+02 3.361e+02, threshold=4.755e+02, percent-clipped=0.0
2024-09-18 09:47:14,503 INFO [train.py:1198] (0/2) Epoch 42, batch 250, loss[loss=0.2284, ctc_loss=0.1533, cr_loss=0.3759, over 20642.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1439, cr_loss=0.3677, over 2920890.09 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 32.0
2024-09-18 09:47:24,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=743112.5, ans=0.125
2024-09-18 09:47:38,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=743140.8333333334, ans=0.0
2024-09-18 09:47:43,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=743169.1666666666, ans=0.025
2024-09-18 09:47:51,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=743169.1666666666, ans=0.125
2024-09-18 09:48:08,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=743197.5, ans=0.0
2024-09-18 09:48:19,295 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.02 vs.
limit=15.0 2024-09-18 09:48:30,787 INFO [train.py:1198] (0/2) Epoch 42, batch 300, loss[loss=0.1815, ctc_loss=0.1196, cr_loss=0.3091, over 20379.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1423, cr_loss=0.3657, over 3184670.65 frames. ], batch size: 45, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:49:44,459 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.829e+02 2.207e+02 2.319e+02 2.513e+02 3.537e+02, threshold=4.639e+02, percent-clipped=0.0 2024-09-18 09:49:46,033 INFO [train.py:1198] (0/2) Epoch 42, batch 350, loss[loss=0.207, ctc_loss=0.1349, cr_loss=0.3603, over 20839.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1445, cr_loss=0.3697, over 3380972.55 frames. ], batch size: 59, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:50:07,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743424.1666666666, ans=0.1 2024-09-18 09:50:10,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=743424.1666666666, ans=0.125 2024-09-18 09:50:29,968 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2024-09-18 09:51:02,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=743509.1666666666, ans=0.125 2024-09-18 09:51:04,994 INFO [train.py:1198] (0/2) Epoch 42, batch 400, loss[loss=0.1988, ctc_loss=0.1297, cr_loss=0.3455, over 20904.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.369, over 3543522.87 frames. 
], batch size: 54, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:51:40,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=743594.1666666666, ans=0.125 2024-09-18 09:52:22,951 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.219e+02 2.344e+02 2.535e+02 3.383e+02, threshold=4.687e+02, percent-clipped=0.0 2024-09-18 09:52:24,436 INFO [train.py:1198] (0/2) Epoch 42, batch 450, loss[loss=0.2134, ctc_loss=0.1411, cr_loss=0.3616, over 20923.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1439, cr_loss=0.3683, over 3645123.38 frames. ], batch size: 60, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:52:28,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=743679.1666666666, ans=0.2 2024-09-18 09:52:44,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743707.5, ans=0.1 2024-09-18 09:52:47,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=743707.5, ans=0.125 2024-09-18 09:52:49,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=743707.5, ans=0.125 2024-09-18 09:52:59,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=743735.8333333334, ans=0.025 2024-09-18 09:53:12,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743764.1666666666, ans=0.1 2024-09-18 09:53:35,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=743792.5, ans=0.2 2024-09-18 09:53:40,828 INFO [train.py:1198] (0/2) Epoch 42, batch 500, loss[loss=0.2429, ctc_loss=0.1616, cr_loss=0.4067, over 
20862.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1441, cr_loss=0.3692, over 3750248.38 frames. ], batch size: 65, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:54:04,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=743849.1666666666, ans=0.125 2024-09-18 09:54:13,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=743877.5, ans=0.95 2024-09-18 09:54:26,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743905.8333333334, ans=0.1 2024-09-18 09:54:28,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=743905.8333333334, ans=0.125 2024-09-18 09:54:32,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=743905.8333333334, ans=0.125 2024-09-18 09:54:34,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=743905.8333333334, ans=0.2 2024-09-18 09:54:52,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=743934.1666666666, ans=0.0 2024-09-18 09:54:55,134 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.194e+02 2.364e+02 2.460e+02 3.175e+02, threshold=4.728e+02, percent-clipped=0.0 2024-09-18 09:54:55,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=743962.5, ans=0.0 2024-09-18 09:54:56,716 INFO [train.py:1198] (0/2) Epoch 42, batch 550, loss[loss=0.2335, ctc_loss=0.1544, cr_loss=0.3956, over 20835.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1439, cr_loss=0.3683, over 3819825.49 frames. 
], batch size: 65, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:55:46,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=744047.5, ans=0.125 2024-09-18 09:55:51,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=744047.5, ans=0.0 2024-09-18 09:56:15,193 INFO [train.py:1198] (0/2) Epoch 42, batch 600, loss[loss=0.2471, ctc_loss=0.1727, cr_loss=0.3721, over 14435.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.143, cr_loss=0.3659, over 3872428.72 frames. ], batch size: 150, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:56:15,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=744104.1666666666, ans=0.125 2024-09-18 09:56:44,741 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=22.5 2024-09-18 09:57:07,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=744189.1666666666, ans=0.125 2024-09-18 09:57:26,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=744217.5, ans=0.125 2024-09-18 09:57:32,834 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.188e+02 2.304e+02 2.442e+02 3.148e+02, threshold=4.608e+02, percent-clipped=0.0 2024-09-18 09:57:34,364 INFO [train.py:1198] (0/2) Epoch 42, batch 650, loss[loss=0.2149, ctc_loss=0.1423, cr_loss=0.363, over 20833.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1417, cr_loss=0.3645, over 3937254.73 frames. 
], batch size: 59, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:58:12,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=744302.5, ans=0.0 2024-09-18 09:58:37,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=744359.1666666666, ans=0.125 2024-09-18 09:58:40,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.17 vs. limit=22.5 2024-09-18 09:58:47,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.31 vs. limit=15.0 2024-09-18 09:58:50,049 INFO [train.py:1198] (0/2) Epoch 42, batch 700, loss[loss=0.243, ctc_loss=0.1633, cr_loss=0.3988, over 18333.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1425, cr_loss=0.3658, over 3961350.72 frames. ], batch size: 108, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 09:59:39,801 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0 2024-09-18 10:00:02,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=22.5 2024-09-18 10:00:04,647 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.223e+02 2.320e+02 2.480e+02 3.406e+02, threshold=4.640e+02, percent-clipped=0.0 2024-09-18 10:00:06,220 INFO [train.py:1198] (0/2) Epoch 42, batch 750, loss[loss=0.2199, ctc_loss=0.1441, cr_loss=0.379, over 20780.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1428, cr_loss=0.3667, over 3993319.82 frames. 
], batch size: 53, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:00:14,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=744529.1666666666, ans=0.125 2024-09-18 10:00:43,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.40 vs. limit=15.0 2024-09-18 10:01:21,067 INFO [train.py:1198] (0/2) Epoch 42, batch 800, loss[loss=0.2134, ctc_loss=0.1393, cr_loss=0.3706, over 21052.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1433, cr_loss=0.3669, over 4010700.45 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:01:26,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=744670.8333333334, ans=0.125 2024-09-18 10:01:32,547 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0 2024-09-18 10:01:38,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=744699.1666666666, ans=0.025 2024-09-18 10:02:03,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=744727.5, ans=0.125 2024-09-18 10:02:27,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=744784.1666666666, ans=0.125 2024-09-18 10:02:37,845 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.174e+02 2.338e+02 2.463e+02 3.185e+02, threshold=4.675e+02, percent-clipped=0.0 2024-09-18 10:02:39,467 INFO [train.py:1198] (0/2) Epoch 42, batch 850, loss[loss=0.2301, ctc_loss=0.1516, cr_loss=0.3927, over 20965.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.144, cr_loss=0.3684, over 4041596.76 frames. 
], batch size: 64, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:02:59,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744840.8333333334, ans=0.1 2024-09-18 10:03:40,741 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=22.5 2024-09-18 10:03:41,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=744925.8333333334, ans=0.0 2024-09-18 10:03:57,399 INFO [train.py:1198] (0/2) Epoch 42, batch 900, loss[loss=0.2376, ctc_loss=0.1581, cr_loss=0.3976, over 20823.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1448, cr_loss=0.3693, over 4054095.70 frames. ], batch size: 59, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:03:59,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=744954.1666666666, ans=0.125 2024-09-18 10:05:04,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=745067.5, ans=0.0 2024-09-18 10:05:11,482 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.969e+02 2.215e+02 2.380e+02 2.533e+02 5.796e+02, threshold=4.760e+02, percent-clipped=0.0 2024-09-18 10:05:12,998 INFO [train.py:1198] (0/2) Epoch 42, batch 950, loss[loss=0.2086, ctc_loss=0.1381, cr_loss=0.3524, over 21077.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1448, cr_loss=0.3694, over 4071072.76 frames. 
], batch size: 56, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:05:16,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=745095.8333333334, ans=0.125 2024-09-18 10:05:20,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=745095.8333333334, ans=0.125 2024-09-18 10:05:28,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=745124.1666666666, ans=10.0 2024-09-18 10:05:30,221 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.49 vs. limit=10.0 2024-09-18 10:06:28,494 INFO [train.py:1198] (0/2) Epoch 42, batch 1000, loss[loss=0.1871, ctc_loss=0.1219, cr_loss=0.3258, over 20966.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1438, cr_loss=0.368, over 4077428.87 frames. ], batch size: 48, lr: 1.97e-03, grad_scale: 64.0 2024-09-18 10:06:34,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=745237.5, ans=0.025 2024-09-18 10:06:38,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-09-18 10:07:04,765 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. 
limit=15.0 2024-09-18 10:07:10,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=745294.1666666666, ans=0.2 2024-09-18 10:07:13,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=745322.5, ans=0.5 2024-09-18 10:07:14,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=745322.5, ans=0.125 2024-09-18 10:07:17,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=745322.5, ans=10.0 2024-09-18 10:07:44,699 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.247e+02 2.357e+02 2.504e+02 3.558e+02, threshold=4.714e+02, percent-clipped=1.0 2024-09-18 10:07:45,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=745379.1666666666, ans=0.125 2024-09-18 10:07:46,289 INFO [train.py:1198] (0/2) Epoch 42, batch 1050, loss[loss=0.1986, ctc_loss=0.13, cr_loss=0.3431, over 20807.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1444, cr_loss=0.3693, over 4086081.68 frames. ], batch size: 53, lr: 1.97e-03, grad_scale: 64.0 2024-09-18 10:07:49,734 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:08:04,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=745407.5, ans=0.0 2024-09-18 10:08:10,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=745407.5, ans=0.125 2024-09-18 10:08:26,419 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.07 vs. 
limit=15.0 2024-09-18 10:08:33,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=745464.1666666666, ans=0.125 2024-09-18 10:08:50,113 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:09:02,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=745492.5, ans=0.125 2024-09-18 10:09:04,854 INFO [train.py:1198] (0/2) Epoch 42, batch 1100, loss[loss=0.2146, ctc_loss=0.1406, cr_loss=0.3703, over 20874.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1436, cr_loss=0.3681, over 4095129.85 frames. ], batch size: 54, lr: 1.97e-03, grad_scale: 64.0 2024-09-18 10:09:12,833 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:09:24,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=745549.1666666666, ans=0.2 2024-09-18 10:09:33,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=745577.5, ans=0.125 2024-09-18 10:09:48,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=745605.8333333334, ans=0.125 2024-09-18 10:09:59,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=745605.8333333334, ans=0.125 2024-09-18 10:10:20,237 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.218e+02 2.350e+02 2.561e+02 3.606e+02, threshold=4.699e+02, percent-clipped=0.0 2024-09-18 10:10:20,256 INFO [train.py:1198] (0/2) Epoch 42, batch 1150, loss[loss=0.2009, ctc_loss=0.1316, cr_loss=0.3466, over 21023.00 frames. 
], tot_loss[loss=0.2178, ctc_loss=0.1441, cr_loss=0.3685, over 4079454.20 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:10:46,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=745690.8333333334, ans=0.125 2024-09-18 10:10:57,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0 2024-09-18 10:11:13,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=745747.5, ans=0.1 2024-09-18 10:11:35,745 INFO [train.py:1198] (0/2) Epoch 42, batch 1200, loss[loss=0.1904, ctc_loss=0.1241, cr_loss=0.3312, over 20990.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1446, cr_loss=0.3698, over 4094118.50 frames. ], batch size: 51, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:11:54,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=745832.5, ans=0.125 2024-09-18 10:12:24,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=745889.1666666666, ans=0.125 2024-09-18 10:12:51,513 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.946e+02 2.269e+02 2.420e+02 2.563e+02 3.088e+02, threshold=4.840e+02, percent-clipped=0.0 2024-09-18 10:12:51,533 INFO [train.py:1198] (0/2) Epoch 42, batch 1250, loss[loss=0.2067, ctc_loss=0.1343, cr_loss=0.3619, over 20979.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1449, cr_loss=0.3697, over 4088449.37 frames. 
], batch size: 48, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:13:26,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=746002.5, ans=0.125 2024-09-18 10:13:45,115 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-18 10:13:46,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746030.8333333334, ans=0.1 2024-09-18 10:13:55,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2024-09-18 10:13:56,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=746059.1666666666, ans=0.125 2024-09-18 10:14:13,310 INFO [train.py:1198] (0/2) Epoch 42, batch 1300, loss[loss=0.2789, ctc_loss=0.1971, cr_loss=0.4091, over 14342.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1448, cr_loss=0.3697, over 4078835.06 frames. 
], batch size: 150, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:14:21,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=746087.5, ans=0.0 2024-09-18 10:14:42,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=746144.1666666666, ans=0.125 2024-09-18 10:14:52,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=746144.1666666666, ans=0.05 2024-09-18 10:14:52,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=746144.1666666666, ans=0.0 2024-09-18 10:15:07,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=746172.5, ans=0.125 2024-09-18 10:15:21,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0 2024-09-18 10:15:28,709 INFO [train.py:1198] (0/2) Epoch 42, batch 1350, loss[loss=0.1909, ctc_loss=0.1231, cr_loss=0.3391, over 20977.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1451, cr_loss=0.3706, over 4082368.45 frames. ], batch size: 48, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:15:30,186 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.222e+02 2.363e+02 2.515e+02 4.459e+02, threshold=4.726e+02, percent-clipped=0.0 2024-09-18 10:15:38,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746229.1666666666, ans=0.1 2024-09-18 10:16:20,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.16 vs. 
limit=22.5 2024-09-18 10:16:45,034 INFO [train.py:1198] (0/2) Epoch 42, batch 1400, loss[loss=0.2051, ctc_loss=0.1346, cr_loss=0.3525, over 20923.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1445, cr_loss=0.3698, over 4091340.32 frames. ], batch size: 60, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:16:57,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=746370.8333333334, ans=0.125 2024-09-18 10:17:11,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=746399.1666666666, ans=0.125 2024-09-18 10:17:21,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=746427.5, ans=0.2 2024-09-18 10:17:47,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=746484.1666666666, ans=0.125 2024-09-18 10:17:58,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=746484.1666666666, ans=0.0 2024-09-18 10:18:00,738 INFO [train.py:1198] (0/2) Epoch 42, batch 1450, loss[loss=0.2218, ctc_loss=0.1461, cr_loss=0.3788, over 20935.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1443, cr_loss=0.3702, over 4093071.74 frames. 
], batch size: 60, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:18:02,281 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.969e+02 2.239e+02 2.394e+02 2.518e+02 6.410e+02, threshold=4.788e+02, percent-clipped=1.0 2024-09-18 10:19:00,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=746597.5, ans=0.025 2024-09-18 10:19:10,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=746625.8333333334, ans=0.125 2024-09-18 10:19:18,531 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=12.0 2024-09-18 10:19:19,600 INFO [train.py:1198] (0/2) Epoch 42, batch 1500, loss[loss=0.1951, ctc_loss=0.1259, cr_loss=0.3459, over 19941.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.3691, over 4098563.59 frames. ], batch size: 44, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:19:20,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0 2024-09-18 10:19:32,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.87 vs. limit=10.0 2024-09-18 10:20:07,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2024-09-18 10:20:27,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=746767.5, ans=0.0 2024-09-18 10:20:38,996 INFO [train.py:1198] (0/2) Epoch 42, batch 1550, loss[loss=0.2677, ctc_loss=0.1857, cr_loss=0.4101, over 14175.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1435, cr_loss=0.3689, over 4103732.67 frames. 
], batch size: 149, lr: 1.97e-03, grad_scale: 16.0 2024-09-18 10:20:40,504 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.218e+02 2.353e+02 2.529e+02 5.470e+02, threshold=4.706e+02, percent-clipped=1.0 2024-09-18 10:20:42,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746795.8333333334, ans=0.1 2024-09-18 10:20:42,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746795.8333333334, ans=0.1 2024-09-18 10:20:57,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=746824.1666666666, ans=0.2 2024-09-18 10:21:55,015 INFO [train.py:1198] (0/2) Epoch 42, batch 1600, loss[loss=0.1761, ctc_loss=0.1105, cr_loss=0.3279, over 20985.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1443, cr_loss=0.3699, over 4065674.63 frames. ], batch size: 51, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:22:04,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=746937.5, ans=0.2 2024-09-18 10:22:06,501 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.52 vs. 
limit=15.0 2024-09-18 10:22:12,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746965.8333333334, ans=0.1 2024-09-18 10:22:37,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=746994.1666666666, ans=0.0 2024-09-18 10:22:38,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=747022.5, ans=0.125 2024-09-18 10:22:48,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747022.5, ans=0.1 2024-09-18 10:22:52,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747022.5, ans=0.1 2024-09-18 10:22:53,356 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.39 vs. limit=10.0 2024-09-18 10:23:05,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=747050.8333333334, ans=0.2 2024-09-18 10:23:10,080 INFO [train.py:1198] (0/2) Epoch 42, batch 1650, loss[loss=0.196, ctc_loss=0.127, cr_loss=0.3448, over 21054.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3687, over 4083688.99 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 32.0 2024-09-18 10:23:11,625 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.220e+02 2.344e+02 2.507e+02 5.464e+02, threshold=4.689e+02, percent-clipped=1.0 2024-09-18 10:23:21,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=22.5 2024-09-18 10:24:25,482 INFO [train.py:1198] (0/2) Epoch 42, batch 1700, loss[loss=0.1968, ctc_loss=0.1294, cr_loss=0.337, over 20870.00 frames. 
], tot_loss[loss=0.2178, ctc_loss=0.1439, cr_loss=0.3693, over 4089629.39 frames. ], batch size: 57, lr: 1.97e-03, grad_scale: 16.0
2024-09-18 10:24:40,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=747220.8333333334, ans=0.05
2024-09-18 10:24:47,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0
2024-09-18 10:24:55,263 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.99 vs. limit=10.0
2024-09-18 10:25:13,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0
2024-09-18 10:25:37,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0
2024-09-18 10:25:47,579 INFO [train.py:1198] (0/2) Epoch 42, batch 1750, loss[loss=0.2219, ctc_loss=0.146, cr_loss=0.3794, over 20975.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1442, cr_loss=0.3697, over 4086393.43 frames. ], batch size: 55, lr: 1.97e-03, grad_scale: 16.0
2024-09-18 10:25:50,640 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.217e+02 2.340e+02 2.503e+02 9.604e+02, threshold=4.680e+02, percent-clipped=1.0
2024-09-18 10:25:50,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=747362.5, ans=0.2
2024-09-18 10:25:50,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=747362.5, ans=0.07
2024-09-18 10:26:43,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0
2024-09-18 10:27:01,767 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0
2024-09-18 10:27:03,951 INFO [train.py:1198] (0/2) Epoch 42, batch 1800, loss[loss=0.2165, ctc_loss=0.1443, cr_loss=0.3608, over 20349.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.144, cr_loss=0.3685, over 4069495.00 frames. ], batch size: 74, lr: 1.97e-03, grad_scale: 16.0
2024-09-18 10:27:23,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0
2024-09-18 10:27:24,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.14 vs.
limit=15.0
2024-09-18 10:27:42,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=747560.8333333334, ans=0.125
2024-09-18 10:28:00,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=747589.1666666666, ans=0.125
2024-09-18 10:28:08,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=747617.5, ans=0.125
2024-09-18 10:28:12,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=747617.5, ans=0.0
2024-09-18 10:28:20,002 INFO [train.py:1198] (0/2) Epoch 42, batch 1850, loss[loss=0.1905, ctc_loss=0.1222, cr_loss=0.3416, over 20965.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1436, cr_loss=0.368, over 4073513.33 frames. ], batch size: 51, lr: 1.97e-03, grad_scale: 16.0
2024-09-18 10:28:23,019 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.198e+02 2.365e+02 2.541e+02 3.359e+02, threshold=4.730e+02, percent-clipped=0.0
2024-09-18 10:28:34,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0
2024-09-18 10:28:45,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=747674.1666666666, ans=0.2
2024-09-18 10:28:46,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=747674.1666666666, ans=0.015
2024-09-18 10:28:51,348 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 10:29:00,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747702.5, ans=0.1
2024-09-18 10:29:29,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=747759.1666666666, ans=0.125
2024-09-18 10:29:36,347 INFO [train.py:1198] (0/2) Epoch 42, batch 1900, loss[loss=0.2264, ctc_loss=0.1503, cr_loss=0.3804, over 20258.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1429, cr_loss=0.3674, over 4091506.33 frames.
], batch size: 74, lr: 1.97e-03, grad_scale: 16.0
2024-09-18 10:29:36,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=747787.5, ans=0.0
2024-09-18 10:29:41,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=747787.5, ans=0.0
2024-09-18 10:30:00,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=747815.8333333334, ans=0.035
2024-09-18 10:30:32,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=747872.5, ans=0.125
2024-09-18 10:30:49,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=747900.8333333334, ans=0.125
2024-09-18 10:30:55,060 INFO [train.py:1198] (0/2) Epoch 42, batch 1950, loss[loss=0.2179, ctc_loss=0.1429, cr_loss=0.3749, over 21084.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1429, cr_loss=0.3677, over 4102144.79 frames. ], batch size: 59, lr: 1.97e-03, grad_scale: 16.0
2024-09-18 10:30:58,066 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.228e+02 2.376e+02 2.477e+02 7.835e+02, threshold=4.753e+02, percent-clipped=1.0
2024-09-18 10:31:02,269 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=22.5
2024-09-18 10:31:04,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=747929.1666666666, ans=0.05
2024-09-18 10:31:09,887 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0
2024-09-18 10:31:34,780 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-264000.pt
2024-09-18 10:31:45,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=748014.1666666666, ans=0.07
2024-09-18 10:32:00,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0
2024-09-18 10:32:15,404 INFO [train.py:1198] (0/2) Epoch 42, batch 2000, loss[loss=0.2748, ctc_loss=0.1906, cr_loss=0.4208, over 14228.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1434, cr_loss=0.369, over 4093091.37 frames. ], batch size: 149, lr: 1.97e-03, grad_scale: 32.0
2024-09-18 10:32:17,658 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0
2024-09-18 10:32:38,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0
2024-09-18 10:32:39,972 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 10:32:58,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=748127.5, ans=0.125
2024-09-18 10:33:08,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748155.8333333334, ans=0.1
2024-09-18 10:33:19,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=748184.1666666666, ans=0.2
2024-09-18 10:33:30,877 INFO [train.py:1198] (0/2) Epoch 42, batch 2050, loss[loss=0.2296, ctc_loss=0.1548, cr_loss=0.3737, over 19397.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1448, cr_loss=0.3709, over 4087395.12 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 32.0
2024-09-18 10:33:33,833 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.267e+02 2.395e+02 2.497e+02 3.540e+02, threshold=4.790e+02, percent-clipped=0.0
2024-09-18 10:33:43,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=748212.5, ans=0.2
2024-09-18 10:33:46,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=748240.8333333334, ans=0.125
2024-09-18 10:34:25,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=748297.5, ans=0.125
2024-09-18 10:34:46,602 INFO [train.py:1198] (0/2) Epoch 42, batch 2100, loss[loss=0.1964, ctc_loss=0.1268, cr_loss=0.3483, over 21064.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1442, cr_loss=0.3693, over 4087047.87 frames.
], batch size: 53, lr: 1.97e-03, grad_scale: 16.0
2024-09-18 10:35:32,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=748439.1666666666, ans=0.125
2024-09-18 10:35:44,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=748439.1666666666, ans=0.0
2024-09-18 10:35:52,439 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0
2024-09-18 10:36:05,157 INFO [train.py:1198] (0/2) Epoch 42, batch 2150, loss[loss=0.2031, ctc_loss=0.1349, cr_loss=0.3407, over 20753.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1448, cr_loss=0.3702, over 4075091.43 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 16.0
2024-09-18 10:36:09,455 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.228e+02 2.319e+02 2.504e+02 3.288e+02, threshold=4.639e+02, percent-clipped=0.0
2024-09-18 10:36:09,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=748495.8333333334, ans=0.125
2024-09-18 10:36:12,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=748495.8333333334, ans=0.125
2024-09-18 10:36:30,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=748524.1666666666, ans=0.5
2024-09-18 10:36:38,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=748552.5, ans=0.0
2024-09-18 10:36:42,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0
2024-09-18 10:36:58,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=22.5
2024-09-18 10:37:13,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=22.5
2024-09-18 10:37:21,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=748609.1666666666, ans=0.2
2024-09-18 10:37:23,748 INFO [train.py:1198] (0/2) Epoch 42, batch 2200, loss[loss=0.2325, ctc_loss=0.1574, cr_loss=0.3757, over 21044.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1448, cr_loss=0.3703, over 4079774.33 frames. ], batch size: 62, lr: 1.97e-03, grad_scale: 16.0
2024-09-18 10:37:28,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.57 vs. limit=10.0
2024-09-18 10:38:25,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=748750.8333333334, ans=0.125
2024-09-18 10:38:39,177 INFO [train.py:1198] (0/2) Epoch 42, batch 2250, loss[loss=0.2245, ctc_loss=0.1485, cr_loss=0.3801, over 20892.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1445, cr_loss=0.3699, over 4093707.71 frames. ], batch size: 54, lr: 1.96e-03, grad_scale: 16.0
2024-09-18 10:38:43,830 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.301e+02 2.453e+02 2.611e+02 4.533e+02, threshold=4.905e+02, percent-clipped=0.0
2024-09-18 10:38:44,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0
2024-09-18 10:38:54,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=748807.5, ans=0.125
2024-09-18 10:38:59,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=748807.5, ans=0.0
2024-09-18 10:39:31,919 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.43 vs. limit=15.0
2024-09-18 10:39:41,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=748892.5, ans=0.125
2024-09-18 10:39:47,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=748892.5, ans=0.0
2024-09-18 10:39:50,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=748892.5, ans=0.0
2024-09-18 10:39:55,006 INFO [train.py:1198] (0/2) Epoch 42, batch 2300, loss[loss=0.2142, ctc_loss=0.1423, cr_loss=0.3596, over 20871.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1441, cr_loss=0.3693, over 4107106.94 frames. ], batch size: 57, lr: 1.96e-03, grad_scale: 16.0
2024-09-18 10:40:18,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.89 vs. limit=10.0
2024-09-18 10:40:24,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=748977.5, ans=0.125
2024-09-18 10:40:47,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=12.0
2024-09-18 10:41:10,781 INFO [train.py:1198] (0/2) Epoch 42, batch 2350, loss[loss=0.1996, ctc_loss=0.1303, cr_loss=0.3463, over 21080.00 frames.
], tot_loss[loss=0.2179, ctc_loss=0.1442, cr_loss=0.3688, over 4088680.89 frames. ], batch size: 53, lr: 1.96e-03, grad_scale: 16.0
2024-09-18 10:41:15,370 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.218e+02 2.333e+02 2.492e+02 6.561e+02, threshold=4.666e+02, percent-clipped=1.0
2024-09-18 10:41:26,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=749090.8333333334, ans=0.125
2024-09-18 10:42:08,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=749147.5, ans=0.0
2024-09-18 10:42:18,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=749175.8333333334, ans=0.025
2024-09-18 10:42:28,923 INFO [train.py:1198] (0/2) Epoch 42, batch 2400, loss[loss=0.2063, ctc_loss=0.1351, cr_loss=0.3564, over 20766.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.3687, over 4091548.10 frames. ], batch size: 56, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:42:38,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=749204.1666666666, ans=0.04949747468305833
2024-09-18 10:42:44,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=749204.1666666666, ans=0.125
2024-09-18 10:42:56,884 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0
2024-09-18 10:42:58,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=749232.5, ans=0.125
2024-09-18 10:43:24,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=749289.1666666666, ans=0.125
2024-09-18 10:43:47,819 INFO [train.py:1198] (0/2) Epoch 42, batch 2450, loss[loss=0.2217, ctc_loss=0.1478, cr_loss=0.3698, over 21027.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3683, over 4090545.64 frames. ], batch size: 61, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:43:52,429 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.197e+02 2.347e+02 2.522e+02 5.398e+02, threshold=4.693e+02, percent-clipped=1.0
2024-09-18 10:43:54,630 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.72 vs. limit=6.0
2024-09-18 10:44:23,669 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.79 vs. limit=15.0
2024-09-18 10:44:27,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=749402.5, ans=0.1
2024-09-18 10:45:03,991 INFO [train.py:1198] (0/2) Epoch 42, batch 2500, loss[loss=0.1932, ctc_loss=0.1262, cr_loss=0.3345, over 20974.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1437, cr_loss=0.3677, over 4091746.86 frames. ], batch size: 50, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:45:17,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=749515.8333333334, ans=0.015
2024-09-18 10:45:40,806 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.64 vs. limit=10.0
2024-09-18 10:45:44,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=749544.1666666666, ans=0.125
2024-09-18 10:45:52,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=749572.5, ans=0.1
2024-09-18 10:46:18,517 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.20 vs. limit=15.0
2024-09-18 10:46:19,306 INFO [train.py:1198] (0/2) Epoch 42, batch 2550, loss[loss=0.1833, ctc_loss=0.1186, cr_loss=0.3234, over 20977.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1444, cr_loss=0.3691, over 4091412.19 frames. ], batch size: 51, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:46:21,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=749629.1666666666, ans=0.2
2024-09-18 10:46:23,755 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.217e+02 2.341e+02 2.487e+02 4.101e+02, threshold=4.682e+02, percent-clipped=0.0
2024-09-18 10:46:43,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=749657.5, ans=0.0
2024-09-18 10:46:55,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=749685.8333333334, ans=0.025
2024-09-18 10:47:30,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=749742.5, ans=0.5
2024-09-18 10:47:37,922 INFO [train.py:1198] (0/2) Epoch 42, batch 2600, loss[loss=0.1861, ctc_loss=0.1194, cr_loss=0.3335, over 20978.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1433, cr_loss=0.3677, over 4103496.17 frames.
], batch size: 55, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:47:59,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=749799.1666666666, ans=0.025
2024-09-18 10:48:01,205 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.11 vs. limit=10.0
2024-09-18 10:48:05,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=749799.1666666666, ans=0.125
2024-09-18 10:48:24,882 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 10:48:40,240 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 10:48:56,550 INFO [train.py:1198] (0/2) Epoch 42, batch 2650, loss[loss=0.2349, ctc_loss=0.1545, cr_loss=0.402, over 20670.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1434, cr_loss=0.3681, over 4103386.44 frames. ], batch size: 66, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:49:01,205 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.245e+02 2.356e+02 2.477e+02 3.406e+02, threshold=4.713e+02, percent-clipped=0.0
2024-09-18 10:50:10,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=750025.8333333334, ans=0.2
2024-09-18 10:50:13,037 INFO [train.py:1198] (0/2) Epoch 42, batch 2700, loss[loss=0.2851, ctc_loss=0.2013, cr_loss=0.4191, over 14029.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1436, cr_loss=0.3682, over 4105468.27 frames. ], batch size: 149, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:50:19,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750054.1666666666, ans=0.1
2024-09-18 10:50:34,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=750082.5, ans=0.125
2024-09-18 10:50:43,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=750110.8333333334, ans=0.125
2024-09-18 10:50:45,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=750110.8333333334, ans=0.125
2024-09-18 10:50:55,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=750110.8333333334, ans=0.125
2024-09-18 10:51:19,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=750167.5, ans=0.125
2024-09-18 10:51:28,556 INFO [train.py:1198] (0/2) Epoch 42, batch 2750, loss[loss=0.2325, ctc_loss=0.1535, cr_loss=0.3952, over 21018.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1442, cr_loss=0.3694, over 4103072.35 frames. ], batch size: 63, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:51:33,123 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.208e+02 2.324e+02 2.458e+02 3.285e+02, threshold=4.649e+02, percent-clipped=0.0
2024-09-18 10:51:55,245 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=12.0
2024-09-18 10:52:26,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750280.8333333334, ans=0.1
2024-09-18 10:52:44,670 INFO [train.py:1198] (0/2) Epoch 42, batch 2800, loss[loss=0.2086, ctc_loss=0.1384, cr_loss=0.3512, over 20889.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1443, cr_loss=0.3688, over 4094700.07 frames. ], batch size: 54, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:52:48,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=750337.5, ans=0.2
2024-09-18 10:53:02,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=750365.8333333334, ans=0.02
2024-09-18 10:53:40,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=750422.5, ans=0.0
2024-09-18 10:54:03,345 INFO [train.py:1198] (0/2) Epoch 42, batch 2850, loss[loss=0.1735, ctc_loss=0.1106, cr_loss=0.3147, over 20253.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1439, cr_loss=0.3683, over 4089338.89 frames. ], batch size: 45, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:54:10,803 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.233e+02 2.385e+02 2.523e+02 4.065e+02, threshold=4.770e+02, percent-clipped=0.0
2024-09-18 10:54:21,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=750507.5, ans=0.125
2024-09-18 10:54:50,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=750564.1666666666, ans=0.125
2024-09-18 10:54:58,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.41 vs.
limit=15.0
2024-09-18 10:55:16,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=750592.5, ans=0.1
2024-09-18 10:55:21,811 INFO [train.py:1198] (0/2) Epoch 42, batch 2900, loss[loss=0.2512, ctc_loss=0.1695, cr_loss=0.4084, over 20643.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1437, cr_loss=0.3678, over 4090459.95 frames. ], batch size: 68, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:55:28,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=750620.8333333334, ans=0.025
2024-09-18 10:56:18,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=750705.8333333334, ans=0.0
2024-09-18 10:56:26,190 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=12.0
2024-09-18 10:56:37,840 INFO [train.py:1198] (0/2) Epoch 42, batch 2950, loss[loss=0.2719, ctc_loss=0.184, cr_loss=0.4393, over 18043.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1438, cr_loss=0.3673, over 4073312.97 frames. ], batch size: 108, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:56:38,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0
2024-09-18 10:56:42,350 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.223e+02 2.424e+02 2.596e+02 3.606e+02, threshold=4.848e+02, percent-clipped=0.0
2024-09-18 10:57:04,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=750790.8333333334, ans=0.1
2024-09-18 10:57:10,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0
2024-09-18 10:57:42,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=750875.8333333334, ans=0.0
2024-09-18 10:57:53,906 INFO [train.py:1198] (0/2) Epoch 42, batch 3000, loss[loss=0.1868, ctc_loss=0.12, cr_loss=0.3341, over 19921.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1445, cr_loss=0.3685, over 4054377.20 frames. ], batch size: 44, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:57:53,907 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 10:58:25,077 INFO [train.py:1230] (0/2) Epoch 42, validation: loss=0.03965, ctc_loss=0.03965, cr_loss=1.495e-14, over 944034.00 frames.
2024-09-18 10:58:25,078 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-18 10:59:13,389 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0
2024-09-18 10:59:28,323 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0
2024-09-18 10:59:29,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=751017.5, ans=0.125
2024-09-18 10:59:32,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=751017.5, ans=0.0
2024-09-18 10:59:44,271 INFO [train.py:1198] (0/2) Epoch 42, batch 3050, loss[loss=0.2158, ctc_loss=0.1421, cr_loss=0.3683, over 20771.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1446, cr_loss=0.3693, over 4061817.13 frames.
], batch size: 56, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 10:59:51,601 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.222e+02 2.355e+02 2.517e+02 4.209e+02, threshold=4.709e+02, percent-clipped=0.0
2024-09-18 11:00:28,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.47 vs. limit=22.5
2024-09-18 11:00:59,458 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.75 vs. limit=15.0
2024-09-18 11:01:02,847 INFO [train.py:1198] (0/2) Epoch 42, batch 3100, loss[loss=0.2182, ctc_loss=0.1416, cr_loss=0.3832, over 20839.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.145, cr_loss=0.3702, over 4055268.03 frames. ], batch size: 59, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 11:01:38,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0
2024-09-18 11:02:05,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=751300.8333333334, ans=0.0
2024-09-18 11:02:14,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=751300.8333333334, ans=0.125
2024-09-18 11:02:18,414 INFO [train.py:1198] (0/2) Epoch 42, batch 3150, loss[loss=0.245, ctc_loss=0.165, cr_loss=0.4002, over 20844.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1442, cr_loss=0.3684, over 4072942.06 frames. ], batch size: 65, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 11:02:22,909 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.210e+02 2.304e+02 2.494e+02 3.038e+02, threshold=4.608e+02, percent-clipped=0.0
2024-09-18 11:02:49,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=751385.8333333334, ans=0.125
2024-09-18 11:03:12,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=751414.1666666666, ans=0.2
2024-09-18 11:03:34,654 INFO [train.py:1198] (0/2) Epoch 42, batch 3200, loss[loss=0.2635, ctc_loss=0.1864, cr_loss=0.3853, over 14284.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1449, cr_loss=0.3698, over 4069796.14 frames. ], batch size: 149, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 11:03:45,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=751470.8333333334, ans=0.125
2024-09-18 11:03:53,422 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0
2024-09-18 11:04:45,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751584.1666666666, ans=0.1
2024-09-18 11:04:53,070 INFO [train.py:1198] (0/2) Epoch 42, batch 3250, loss[loss=0.1855, ctc_loss=0.1192, cr_loss=0.3315, over 20999.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1441, cr_loss=0.3682, over 4062825.49 frames. ], batch size: 52, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 11:04:57,503 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.263e+02 2.396e+02 2.527e+02 4.570e+02, threshold=4.792e+02, percent-clipped=0.0
2024-09-18 11:05:11,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0
2024-09-18 11:05:36,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=751669.1666666666, ans=0.125
2024-09-18 11:06:01,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0
2024-09-18 11:06:11,275 INFO [train.py:1198] (0/2) Epoch 42, batch 3300, loss[loss=0.2152, ctc_loss=0.1447, cr_loss=0.3528, over 20849.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1443, cr_loss=0.3679, over 4061881.26 frames. ], batch size: 59, lr: 1.96e-03, grad_scale: 32.0
2024-09-18 11:06:27,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.08 vs.
limit=6.0 2024-09-18 11:06:32,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=751782.5, ans=0.07 2024-09-18 11:06:36,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=751782.5, ans=0.1 2024-09-18 11:06:44,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=751810.8333333334, ans=0.0 2024-09-18 11:06:49,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=751810.8333333334, ans=0.125 2024-09-18 11:07:27,023 INFO [train.py:1198] (0/2) Epoch 42, batch 3350, loss[loss=0.278, ctc_loss=0.1988, cr_loss=0.3963, over 14327.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1448, cr_loss=0.3685, over 4051676.74 frames. ], batch size: 150, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:07:31,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.215e+02 2.369e+02 2.477e+02 3.546e+02, threshold=4.739e+02, percent-clipped=0.0 2024-09-18 11:08:14,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=751980.8333333334, ans=0.0 2024-09-18 11:08:16,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-18 11:08:42,102 INFO [train.py:1198] (0/2) Epoch 42, batch 3400, loss[loss=0.1907, ctc_loss=0.1244, cr_loss=0.3315, over 20952.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1453, cr_loss=0.3698, over 4060488.02 frames. 
], batch size: 50, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:09:27,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=752122.5, ans=0.0 2024-09-18 11:09:41,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=752150.8333333334, ans=0.125 2024-09-18 11:10:00,513 INFO [train.py:1198] (0/2) Epoch 42, batch 3450, loss[loss=0.1828, ctc_loss=0.119, cr_loss=0.3191, over 20987.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1451, cr_loss=0.3698, over 4066548.55 frames. ], batch size: 51, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:10:01,305 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.67 vs. limit=22.5 2024-09-18 11:10:01,448 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=22.5 2024-09-18 11:10:02,909 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.63 vs. limit=10.0 2024-09-18 11:10:05,021 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.220e+02 2.371e+02 2.527e+02 4.053e+02, threshold=4.742e+02, percent-clipped=0.0 2024-09-18 11:10:16,653 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.07 vs. 
limit=6.0 2024-09-18 11:10:31,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=752235.8333333334, ans=0.1 2024-09-18 11:10:58,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=752264.1666666666, ans=0.04949747468305833 2024-09-18 11:11:19,760 INFO [train.py:1198] (0/2) Epoch 42, batch 3500, loss[loss=0.1969, ctc_loss=0.1282, cr_loss=0.3435, over 21026.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1449, cr_loss=0.3698, over 4073303.46 frames. ], batch size: 48, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:11:41,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=752349.1666666666, ans=0.125 2024-09-18 11:11:45,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=752349.1666666666, ans=0.95 2024-09-18 11:11:52,161 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=12.0 2024-09-18 11:11:54,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=752377.5, ans=0.0 2024-09-18 11:12:26,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=752434.1666666666, ans=0.1 2024-09-18 11:12:35,247 INFO [train.py:1198] (0/2) Epoch 42, batch 3550, loss[loss=0.2331, ctc_loss=0.1554, cr_loss=0.3888, over 19404.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3712, over 4061824.83 frames. 
], batch size: 90, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:12:39,778 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.270e+02 2.399e+02 2.600e+02 3.964e+02, threshold=4.798e+02, percent-clipped=0.0 2024-09-18 11:12:59,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=752490.8333333334, ans=0.2 2024-09-18 11:13:44,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=752575.8333333334, ans=0.125 2024-09-18 11:13:50,300 INFO [train.py:1198] (0/2) Epoch 42, batch 3600, loss[loss=0.1857, ctc_loss=0.1225, cr_loss=0.3164, over 21045.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1459, cr_loss=0.3713, over 4068338.38 frames. ], batch size: 53, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:13:58,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=752604.1666666666, ans=0.0 2024-09-18 11:14:40,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=752689.1666666666, ans=0.035 2024-09-18 11:14:53,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=752717.5, ans=0.0 2024-09-18 11:15:06,097 INFO [train.py:1198] (0/2) Epoch 42, batch 3650, loss[loss=0.2181, ctc_loss=0.141, cr_loss=0.3855, over 21053.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.146, cr_loss=0.3721, over 4065893.01 frames. ], batch size: 56, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:15:10,525 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.202e+02 2.348e+02 2.464e+02 3.004e+02, threshold=4.697e+02, percent-clipped=0.0 2024-09-18 11:15:14,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.79 vs. 
limit=15.0 2024-09-18 11:15:32,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=752774.1666666666, ans=0.125 2024-09-18 11:16:25,385 INFO [train.py:1198] (0/2) Epoch 42, batch 3700, loss[loss=0.2527, ctc_loss=0.1651, cr_loss=0.4379, over 20636.00 frames. ], tot_loss[loss=0.2197, ctc_loss=0.1454, cr_loss=0.3718, over 4079018.86 frames. ], batch size: 68, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:16:35,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=752887.5, ans=0.125 2024-09-18 11:16:51,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=752915.8333333334, ans=0.125 2024-09-18 11:16:55,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=752915.8333333334, ans=0.0 2024-09-18 11:17:04,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=752944.1666666666, ans=0.1 2024-09-18 11:17:06,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=752944.1666666666, ans=0.125 2024-09-18 11:17:18,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.85 vs. limit=15.0 2024-09-18 11:17:43,512 INFO [train.py:1198] (0/2) Epoch 42, batch 3750, loss[loss=0.2193, ctc_loss=0.1422, cr_loss=0.3854, over 20943.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.145, cr_loss=0.3709, over 4089820.27 frames. 
], batch size: 50, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:17:47,993 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.231e+02 2.360e+02 2.537e+02 3.045e+02, threshold=4.719e+02, percent-clipped=0.0 2024-09-18 11:17:49,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=753029.1666666666, ans=0.05 2024-09-18 11:18:20,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=753085.8333333334, ans=0.0 2024-09-18 11:18:58,564 INFO [train.py:1198] (0/2) Epoch 42, batch 3800, loss[loss=0.1787, ctc_loss=0.1182, cr_loss=0.3027, over 20955.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1463, cr_loss=0.3723, over 4066643.23 frames. ], batch size: 49, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:19:12,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=753199.1666666666, ans=0.0 2024-09-18 11:19:23,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=753199.1666666666, ans=0.1 2024-09-18 11:19:26,852 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.45 vs. 
limit=15.0 2024-09-18 11:19:30,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=753227.5, ans=0.125 2024-09-18 11:19:38,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=753227.5, ans=0.125 2024-09-18 11:20:02,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=753284.1666666666, ans=0.2 2024-09-18 11:20:14,568 INFO [train.py:1198] (0/2) Epoch 42, batch 3850, loss[loss=0.2298, ctc_loss=0.1532, cr_loss=0.3831, over 20984.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.146, cr_loss=0.3719, over 4068697.65 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:20:16,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=753312.5, ans=0.5 2024-09-18 11:20:19,099 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.299e+02 2.403e+02 2.610e+02 3.655e+02, threshold=4.807e+02, percent-clipped=0.0 2024-09-18 11:20:52,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=753369.1666666666, ans=0.125 2024-09-18 11:21:32,732 INFO [train.py:1198] (0/2) Epoch 42, batch 3900, loss[loss=0.2627, ctc_loss=0.1753, cr_loss=0.437, over 18132.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.146, cr_loss=0.3726, over 4076772.77 frames. 
], batch size: 108, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:21:48,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=753482.5, ans=0.2 2024-09-18 11:21:51,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=753482.5, ans=0.0 2024-09-18 11:21:54,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=753482.5, ans=0.125 2024-09-18 11:22:15,090 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:22:34,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=753567.5, ans=0.0 2024-09-18 11:22:51,272 INFO [train.py:1198] (0/2) Epoch 42, batch 3950, loss[loss=0.1991, ctc_loss=0.129, cr_loss=0.3505, over 20969.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1456, cr_loss=0.3712, over 4079674.21 frames. ], batch size: 51, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:22:55,808 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.232e+02 2.359e+02 2.467e+02 3.381e+02, threshold=4.718e+02, percent-clipped=0.0 2024-09-18 11:23:18,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-09-18 11:23:52,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=753709.1666666666, ans=0.1 2024-09-18 11:23:56,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=753709.1666666666, ans=0.125 2024-09-18 11:24:07,205 INFO [train.py:1198] (0/2) Epoch 42, batch 4000, loss[loss=0.2277, ctc_loss=0.15, cr_loss=0.3887, over 20255.00 frames. 
], tot_loss[loss=0.2204, ctc_loss=0.1459, cr_loss=0.3725, over 4081980.95 frames. ], batch size: 74, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:24:27,319 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:24:31,954 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-09-18 11:24:39,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=753794.1666666666, ans=0.125 2024-09-18 11:25:07,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=753850.8333333334, ans=0.0 2024-09-18 11:25:09,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=753850.8333333334, ans=0.125 2024-09-18 11:25:19,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=753850.8333333334, ans=0.05 2024-09-18 11:25:21,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=753879.1666666666, ans=0.0 2024-09-18 11:25:22,407 INFO [train.py:1198] (0/2) Epoch 42, batch 4050, loss[loss=0.2302, ctc_loss=0.1522, cr_loss=0.3899, over 21074.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1454, cr_loss=0.371, over 4072224.57 frames. 
], batch size: 59, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:25:28,629 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.209e+02 2.341e+02 2.493e+02 2.885e+02, threshold=4.682e+02, percent-clipped=0.0 2024-09-18 11:25:46,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=753907.5, ans=0.1 2024-09-18 11:25:58,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=753935.8333333334, ans=0.0 2024-09-18 11:26:00,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=753935.8333333334, ans=0.125 2024-09-18 11:26:29,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=753992.5, ans=0.2 2024-09-18 11:26:38,536 INFO [train.py:1198] (0/2) Epoch 42, batch 4100, loss[loss=0.248, ctc_loss=0.1661, cr_loss=0.41, over 20631.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1449, cr_loss=0.37, over 4081827.30 frames. 
], batch size: 71, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:26:47,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=754020.8333333334, ans=0.125 2024-09-18 11:26:47,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=754020.8333333334, ans=0.125 2024-09-18 11:27:10,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=754077.5, ans=0.0 2024-09-18 11:27:59,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=754162.5, ans=0.125 2024-09-18 11:28:00,913 INFO [train.py:1198] (0/2) Epoch 42, batch 4150, loss[loss=0.232, ctc_loss=0.1565, cr_loss=0.3773, over 20703.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1447, cr_loss=0.3693, over 4084851.21 frames. ], batch size: 71, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:28:06,907 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.211e+02 2.332e+02 2.484e+02 4.088e+02, threshold=4.664e+02, percent-clipped=0.0 2024-09-18 11:28:41,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=754219.1666666666, ans=0.125 2024-09-18 11:28:47,910 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0 2024-09-18 11:29:10,393 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:29:17,592 INFO [train.py:1198] (0/2) Epoch 42, batch 4200, loss[loss=0.2522, ctc_loss=0.1668, cr_loss=0.4273, over 21056.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1447, cr_loss=0.3694, over 4092762.02 frames. 
], batch size: 59, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:29:22,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=754304.1666666666, ans=0.0 2024-09-18 11:29:39,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=754332.5, ans=0.0 2024-09-18 11:29:59,140 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:30:18,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=754417.5, ans=0.125 2024-09-18 11:30:26,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=754417.5, ans=10.0 2024-09-18 11:30:33,467 INFO [train.py:1198] (0/2) Epoch 42, batch 4250, loss[loss=0.1978, ctc_loss=0.1316, cr_loss=0.3309, over 20934.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1442, cr_loss=0.3684, over 4087821.65 frames. ], batch size: 67, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:30:33,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=754445.8333333334, ans=0.125 2024-09-18 11:30:38,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2024-09-18 11:30:39,572 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.235e+02 2.374e+02 2.515e+02 4.071e+02, threshold=4.749e+02, percent-clipped=0.0 2024-09-18 11:30:49,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.01 vs. 
limit=6.0 2024-09-18 11:31:26,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=754530.8333333334, ans=0.1 2024-09-18 11:31:29,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=754530.8333333334, ans=0.2 2024-09-18 11:31:34,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=754559.1666666666, ans=0.125 2024-09-18 11:31:37,580 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-09-18 11:31:39,365 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.74 vs. limit=10.0 2024-09-18 11:31:40,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2024-09-18 11:31:48,905 INFO [train.py:1198] (0/2) Epoch 42, batch 4300, loss[loss=0.2108, ctc_loss=0.14, cr_loss=0.3543, over 20234.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1447, cr_loss=0.3686, over 4073690.80 frames. ], batch size: 45, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:32:02,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=754615.8333333334, ans=0.125 2024-09-18 11:32:10,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-18 11:32:12,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. 
limit=6.0 2024-09-18 11:32:23,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=754644.1666666666, ans=0.5 2024-09-18 11:32:33,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=754672.5, ans=0.0 2024-09-18 11:32:48,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=754672.5, ans=0.125 2024-09-18 11:33:03,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=754700.8333333334, ans=0.0 2024-09-18 11:33:07,185 INFO [train.py:1198] (0/2) Epoch 42, batch 4350, loss[loss=0.2309, ctc_loss=0.1542, cr_loss=0.3839, over 20923.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.144, cr_loss=0.368, over 4082307.75 frames. ], batch size: 60, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:33:13,282 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.937e+02 2.238e+02 2.376e+02 2.514e+02 4.883e+02, threshold=4.753e+02, percent-clipped=1.0 2024-09-18 11:33:18,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=754729.1666666666, ans=0.0 2024-09-18 11:33:32,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=754757.5, ans=0.125 2024-09-18 11:33:46,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=754785.8333333334, ans=0.125 2024-09-18 11:34:25,889 INFO [train.py:1198] (0/2) Epoch 42, batch 4400, loss[loss=0.2344, ctc_loss=0.1585, cr_loss=0.3797, over 20770.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1445, cr_loss=0.3685, over 4086725.19 frames. 
], batch size: 56, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:34:44,918 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2024-09-18 11:34:52,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=754899.1666666666, ans=0.07 2024-09-18 11:35:25,467 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=15.0 2024-09-18 11:35:26,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=754984.1666666666, ans=0.125 2024-09-18 11:35:34,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=754984.1666666666, ans=0.04949747468305833 2024-09-18 11:35:34,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=754984.1666666666, ans=0.0 2024-09-18 11:35:41,512 INFO [train.py:1198] (0/2) Epoch 42, batch 4450, loss[loss=0.2206, ctc_loss=0.1459, cr_loss=0.3735, over 20253.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1445, cr_loss=0.3687, over 4090182.89 frames. ], batch size: 74, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:35:47,562 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.246e+02 2.370e+02 2.552e+02 4.794e+02, threshold=4.740e+02, percent-clipped=1.0 2024-09-18 11:36:00,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=22.5 2024-09-18 11:36:46,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.76 vs. 
limit=15.0 2024-09-18 11:36:57,192 INFO [train.py:1198] (0/2) Epoch 42, batch 4500, loss[loss=0.2161, ctc_loss=0.1428, cr_loss=0.3665, over 20959.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1441, cr_loss=0.3688, over 4099533.32 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:37:13,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0 2024-09-18 11:37:21,091 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.42 vs. limit=15.0 2024-09-18 11:37:21,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=755182.5, ans=0.025 2024-09-18 11:38:13,891 INFO [train.py:1198] (0/2) Epoch 42, batch 4550, loss[loss=0.2258, ctc_loss=0.1488, cr_loss=0.385, over 20030.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1448, cr_loss=0.3701, over 4090885.56 frames. 
], batch size: 80, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:38:19,944 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.212e+02 2.352e+02 2.623e+02 4.675e+02, threshold=4.704e+02, percent-clipped=0.0 2024-09-18 11:38:25,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=755295.8333333334, ans=0.125 2024-09-18 11:38:35,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=755324.1666666666, ans=0.025 2024-09-18 11:38:58,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=755352.5, ans=0.125 2024-09-18 11:39:10,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=755380.8333333334, ans=0.125 2024-09-18 11:39:14,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=755380.8333333334, ans=0.125 2024-09-18 11:39:35,509 INFO [train.py:1198] (0/2) Epoch 42, batch 4600, loss[loss=0.1705, ctc_loss=0.1095, cr_loss=0.3053, over 20957.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1446, cr_loss=0.37, over 4099592.81 frames. ], batch size: 49, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:39:58,997 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:40:15,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=755494.1666666666, ans=0.0 2024-09-18 11:40:51,813 INFO [train.py:1198] (0/2) Epoch 42, batch 4650, loss[loss=0.2014, ctc_loss=0.1296, cr_loss=0.359, over 20879.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.144, cr_loss=0.3687, over 4099404.80 frames. 
], batch size: 54, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:40:55,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=755579.1666666666, ans=0.09899494936611666 2024-09-18 11:40:57,751 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.215e+02 2.347e+02 2.486e+02 3.653e+02, threshold=4.694e+02, percent-clipped=0.0 2024-09-18 11:41:00,342 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.21 vs. limit=12.0 2024-09-18 11:41:03,157 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=12.0 2024-09-18 11:41:13,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=755607.5, ans=0.2 2024-09-18 11:41:22,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=755635.8333333334, ans=0.125 2024-09-18 11:41:36,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=15.0 2024-09-18 11:42:00,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=755692.5, ans=0.5 2024-09-18 11:42:00,750 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0 2024-09-18 11:42:07,861 INFO [train.py:1198] (0/2) Epoch 42, batch 4700, loss[loss=0.1802, ctc_loss=0.1151, cr_loss=0.3254, over 20967.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1448, cr_loss=0.3698, over 4082820.10 frames. 
], batch size: 51, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:43:23,396 INFO [train.py:1198] (0/2) Epoch 42, batch 4750, loss[loss=0.1905, ctc_loss=0.1224, cr_loss=0.3408, over 21008.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1445, cr_loss=0.3701, over 4087988.30 frames. ], batch size: 52, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:43:29,265 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.249e+02 2.385e+02 2.589e+02 4.538e+02, threshold=4.770e+02, percent-clipped=0.0 2024-09-18 11:44:33,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=755975.8333333334, ans=0.5 2024-09-18 11:44:41,859 INFO [train.py:1198] (0/2) Epoch 42, batch 4800, loss[loss=0.2325, ctc_loss=0.1541, cr_loss=0.3917, over 20831.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1443, cr_loss=0.3698, over 4099805.87 frames. ], batch size: 59, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:44:55,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=756032.5, ans=0.125 2024-09-18 11:45:21,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=756060.8333333334, ans=0.125 2024-09-18 11:45:32,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=756089.1666666666, ans=0.5 2024-09-18 11:45:38,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=756089.1666666666, ans=15.0 2024-09-18 11:45:48,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=756117.5, ans=0.95 2024-09-18 11:46:00,376 INFO [train.py:1198] (0/2) Epoch 42, batch 4850, loss[loss=0.2322, ctc_loss=0.1544, cr_loss=0.389, over 20316.00 frames. 
], tot_loss[loss=0.2179, ctc_loss=0.1441, cr_loss=0.369, over 4094112.24 frames. ], batch size: 74, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:46:06,354 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.238e+02 2.388e+02 2.577e+02 3.502e+02, threshold=4.776e+02, percent-clipped=0.0 2024-09-18 11:46:17,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=756174.1666666666, ans=0.0 2024-09-18 11:46:23,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756174.1666666666, ans=0.1 2024-09-18 11:46:33,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=756202.5, ans=0.125 2024-09-18 11:46:40,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=756202.5, ans=0.0 2024-09-18 11:47:11,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756259.1666666666, ans=0.1 2024-09-18 11:47:11,802 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2024-09-18 11:47:15,411 INFO [train.py:1198] (0/2) Epoch 42, batch 4900, loss[loss=0.221, ctc_loss=0.1436, cr_loss=0.3871, over 20882.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1444, cr_loss=0.3692, over 4078713.83 frames. ], batch size: 57, lr: 1.96e-03, grad_scale: 32.0 2024-09-18 11:47:17,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=756287.5, ans=0.125 2024-09-18 11:47:31,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.58 vs. 
limit=15.0 2024-09-18 11:47:36,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.00 vs. limit=10.0 2024-09-18 11:47:45,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=756344.1666666666, ans=0.0 2024-09-18 11:47:47,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.98 vs. limit=15.0 2024-09-18 11:47:48,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=756344.1666666666, ans=15.0 2024-09-18 11:48:09,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=756372.5, ans=0.2 2024-09-18 11:48:18,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756400.8333333334, ans=0.1 2024-09-18 11:48:30,375 INFO [train.py:1198] (0/2) Epoch 42, batch 4950, loss[loss=0.2104, ctc_loss=0.1374, cr_loss=0.3652, over 20410.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.3706, over 4065041.08 frames. 
], batch size: 74, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:48:30,830 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:48:36,216 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.228e+02 2.343e+02 2.516e+02 5.057e+02, threshold=4.687e+02, percent-clipped=1.0 2024-09-18 11:49:18,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756514.1666666666, ans=0.1 2024-09-18 11:49:44,174 INFO [train.py:1198] (0/2) Epoch 42, batch 5000, loss[loss=0.1938, ctc_loss=0.126, cr_loss=0.3387, over 20936.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.145, cr_loss=0.3702, over 4060169.88 frames. ], batch size: 48, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:49:44,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=756570.8333333334, ans=0.05 2024-09-18 11:49:56,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=756570.8333333334, ans=0.125 2024-09-18 11:50:07,152 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:50:23,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=756627.5, ans=0.125 2024-09-18 11:50:59,007 INFO [train.py:1198] (0/2) Epoch 42, batch 5050, loss[loss=0.1712, ctc_loss=0.1069, cr_loss=0.3215, over 20986.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1444, cr_loss=0.3698, over 4071852.08 frames. 
], batch size: 50, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:51:04,927 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.162e+02 2.307e+02 2.467e+02 3.044e+02, threshold=4.614e+02, percent-clipped=0.0 2024-09-18 11:51:41,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=756769.1666666666, ans=0.125 2024-09-18 11:52:08,376 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.25 vs. limit=15.0 2024-09-18 11:52:12,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=756854.1666666666, ans=0.0 2024-09-18 11:52:13,342 INFO [train.py:1198] (0/2) Epoch 42, batch 5100, loss[loss=0.2018, ctc_loss=0.1308, cr_loss=0.3551, over 20770.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1439, cr_loss=0.3687, over 4087171.58 frames. ], batch size: 53, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:52:17,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756854.1666666666, ans=0.1 2024-09-18 11:52:44,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=756910.8333333334, ans=0.125 2024-09-18 11:53:30,725 INFO [train.py:1198] (0/2) Epoch 42, batch 5150, loss[loss=0.2308, ctc_loss=0.1512, cr_loss=0.3981, over 20810.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1439, cr_loss=0.3685, over 4092355.31 frames. 
], batch size: 53, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:53:36,704 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.200e+02 2.336e+02 2.475e+02 3.055e+02, threshold=4.671e+02, percent-clipped=0.0 2024-09-18 11:53:56,936 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-09-18 11:54:09,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=757052.5, ans=0.0 2024-09-18 11:54:11,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=757052.5, ans=15.0 2024-09-18 11:54:40,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=757109.1666666666, ans=0.125 2024-09-18 11:54:47,775 INFO [train.py:1198] (0/2) Epoch 42, batch 5200, loss[loss=0.1849, ctc_loss=0.1194, cr_loss=0.3276, over 21067.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1435, cr_loss=0.3676, over 4091770.78 frames. 
], batch size: 53, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:54:55,569 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:55:05,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=757165.8333333334, ans=0.125 2024-09-18 11:55:10,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=757165.8333333334, ans=0.0 2024-09-18 11:55:11,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=757165.8333333334, ans=0.125 2024-09-18 11:55:39,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=757222.5, ans=0.1 2024-09-18 11:56:01,685 INFO [train.py:1198] (0/2) Epoch 42, batch 5250, loss[loss=0.223, ctc_loss=0.1473, cr_loss=0.3786, over 20858.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1442, cr_loss=0.3685, over 4082702.38 frames. ], batch size: 59, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:56:07,762 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.203e+02 2.338e+02 2.493e+02 9.904e+02, threshold=4.677e+02, percent-clipped=1.0 2024-09-18 11:56:37,509 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:56:42,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. 
limit=22.5 2024-09-18 11:56:49,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=757364.1666666666, ans=0.2 2024-09-18 11:56:49,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=757364.1666666666, ans=0.04949747468305833 2024-09-18 11:56:52,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=757364.1666666666, ans=0.2 2024-09-18 11:57:14,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=757420.8333333334, ans=0.0 2024-09-18 11:57:15,611 INFO [train.py:1198] (0/2) Epoch 42, batch 5300, loss[loss=0.2077, ctc_loss=0.1344, cr_loss=0.3665, over 20956.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1441, cr_loss=0.3687, over 4093263.20 frames. ], batch size: 55, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:58:03,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=757505.8333333334, ans=0.125 2024-09-18 11:58:12,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=757505.8333333334, ans=0.125 2024-09-18 11:58:21,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=757534.1666666666, ans=0.1 2024-09-18 11:58:29,995 INFO [train.py:1198] (0/2) Epoch 42, batch 5350, loss[loss=0.184, ctc_loss=0.1179, cr_loss=0.3305, over 20969.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.144, cr_loss=0.3685, over 4098017.43 frames. 
], batch size: 48, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 11:58:35,857 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.227e+02 2.362e+02 2.544e+02 3.275e+02, threshold=4.724e+02, percent-clipped=0.0 2024-09-18 11:58:58,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=757619.1666666666, ans=0.125 2024-09-18 11:59:20,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=757647.5, ans=0.125 2024-09-18 11:59:25,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=757647.5, ans=0.0 2024-09-18 11:59:32,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=757675.8333333334, ans=0.0 2024-09-18 11:59:44,552 INFO [train.py:1198] (0/2) Epoch 42, batch 5400, loss[loss=0.2321, ctc_loss=0.156, cr_loss=0.3808, over 20827.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1443, cr_loss=0.369, over 4098256.81 frames. ], batch size: 65, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:00:15,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=757760.8333333334, ans=0.0 2024-09-18 12:00:16,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=757760.8333333334, ans=0.125 2024-09-18 12:00:32,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2024-09-18 12:00:59,376 INFO [train.py:1198] (0/2) Epoch 42, batch 5450, loss[loss=0.252, ctc_loss=0.1684, cr_loss=0.4182, over 19978.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1443, cr_loss=0.3696, over 4097584.92 frames. 
], batch size: 80, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:01:05,238 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.223e+02 2.334e+02 2.480e+02 3.545e+02, threshold=4.669e+02, percent-clipped=0.0 2024-09-18 12:01:14,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=757874.1666666666, ans=0.125 2024-09-18 12:01:46,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=757930.8333333334, ans=0.1 2024-09-18 12:01:49,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=757930.8333333334, ans=0.0 2024-09-18 12:01:53,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=757930.8333333334, ans=0.07 2024-09-18 12:01:56,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=757930.8333333334, ans=0.0 2024-09-18 12:01:56,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=15.0 2024-09-18 12:02:05,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=757959.1666666666, ans=0.125 2024-09-18 12:02:15,415 INFO [train.py:1198] (0/2) Epoch 42, batch 5500, loss[loss=0.236, ctc_loss=0.1585, cr_loss=0.3877, over 20704.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1451, cr_loss=0.3702, over 4088333.67 frames. 
], batch size: 71, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:02:23,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=757987.5, ans=0.0 2024-09-18 12:02:29,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=758015.8333333334, ans=0.2 2024-09-18 12:03:01,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=15.0 2024-09-18 12:03:17,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=758100.8333333334, ans=0.125 2024-09-18 12:03:25,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=758100.8333333334, ans=0.1 2024-09-18 12:03:32,555 INFO [train.py:1198] (0/2) Epoch 42, batch 5550, loss[loss=0.1935, ctc_loss=0.1282, cr_loss=0.3263, over 20982.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1444, cr_loss=0.3691, over 4085188.38 frames. 
], batch size: 52, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:03:38,465 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.215e+02 2.333e+02 2.541e+02 3.490e+02, threshold=4.666e+02, percent-clipped=0.0 2024-09-18 12:03:53,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=758157.5, ans=0.1 2024-09-18 12:03:53,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=758157.5, ans=0.125 2024-09-18 12:04:09,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=758185.8333333334, ans=0.2 2024-09-18 12:04:17,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=758214.1666666666, ans=0.125 2024-09-18 12:04:46,833 INFO [train.py:1198] (0/2) Epoch 42, batch 5600, loss[loss=0.1836, ctc_loss=0.1204, cr_loss=0.3161, over 20990.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.144, cr_loss=0.3689, over 4079306.03 frames. ], batch size: 48, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:04:49,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=758270.8333333334, ans=0.5 2024-09-18 12:04:54,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=758270.8333333334, ans=0.1 2024-09-18 12:05:15,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=758327.5, ans=0.2 2024-09-18 12:06:00,837 INFO [train.py:1198] (0/2) Epoch 42, batch 5650, loss[loss=0.2463, ctc_loss=0.1658, cr_loss=0.4027, over 20652.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.368, over 4082607.35 frames. 
], batch size: 66, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:06:06,681 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.227e+02 2.333e+02 2.449e+02 4.721e+02, threshold=4.665e+02, percent-clipped=1.0 2024-09-18 12:06:07,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=758412.5, ans=0.125 2024-09-18 12:06:20,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=758440.8333333334, ans=0.125 2024-09-18 12:06:48,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=758497.5, ans=0.0 2024-09-18 12:06:51,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=758497.5, ans=0.2 2024-09-18 12:06:57,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=758497.5, ans=0.125 2024-09-18 12:07:00,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=758525.8333333334, ans=0.04949747468305833 2024-09-18 12:07:12,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=758525.8333333334, ans=0.125 2024-09-18 12:07:14,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=758554.1666666666, ans=0.125 2024-09-18 12:07:15,417 INFO [train.py:1198] (0/2) Epoch 42, batch 5700, loss[loss=0.2193, ctc_loss=0.145, cr_loss=0.3712, over 20367.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1435, cr_loss=0.3675, over 4091670.91 frames. 
], batch size: 74, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:07:26,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=758554.1666666666, ans=0.0 2024-09-18 12:07:32,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=22.5 2024-09-18 12:07:40,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=758582.5, ans=0.0 2024-09-18 12:08:30,142 INFO [train.py:1198] (0/2) Epoch 42, batch 5750, loss[loss=0.2194, ctc_loss=0.1461, cr_loss=0.3666, over 20791.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1437, cr_loss=0.3675, over 4092314.44 frames. ], batch size: 53, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:08:36,185 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.229e+02 2.357e+02 2.498e+02 3.142e+02, threshold=4.714e+02, percent-clipped=0.0 2024-09-18 12:08:42,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=758695.8333333334, ans=0.2 2024-09-18 12:09:45,180 INFO [train.py:1198] (0/2) Epoch 42, batch 5800, loss[loss=0.2032, ctc_loss=0.1337, cr_loss=0.3475, over 20903.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1426, cr_loss=0.366, over 4095970.13 frames. ], batch size: 54, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:09:50,307 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-09-18 12:09:54,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. 
limit=15.0 2024-09-18 12:10:03,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=758865.8333333334, ans=0.1 2024-09-18 12:10:14,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=758865.8333333334, ans=0.125 2024-09-18 12:10:36,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=758922.5, ans=0.0 2024-09-18 12:10:49,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=758950.8333333334, ans=0.125 2024-09-18 12:11:01,288 INFO [train.py:1198] (0/2) Epoch 42, batch 5850, loss[loss=0.2198, ctc_loss=0.145, cr_loss=0.374, over 20966.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.143, cr_loss=0.3671, over 4096566.18 frames. ], batch size: 64, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:11:07,132 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.208e+02 2.371e+02 2.563e+02 6.085e+02, threshold=4.743e+02, percent-clipped=1.0 2024-09-18 12:11:14,062 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.04 vs. 
limit=15.0 2024-09-18 12:11:16,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=759007.5, ans=0.125 2024-09-18 12:11:16,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=759007.5, ans=0.125 2024-09-18 12:11:23,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=759007.5, ans=0.0 2024-09-18 12:11:57,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=759064.1666666666, ans=0.125 2024-09-18 12:12:18,067 INFO [train.py:1198] (0/2) Epoch 42, batch 5900, loss[loss=0.1767, ctc_loss=0.1144, cr_loss=0.3115, over 20960.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1429, cr_loss=0.3678, over 4098575.43 frames. ], batch size: 48, lr: 1.95e-03, grad_scale: 16.0 2024-09-18 12:13:26,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=759234.1666666666, ans=0.125 2024-09-18 12:13:32,322 INFO [train.py:1198] (0/2) Epoch 42, batch 5950, loss[loss=0.2384, ctc_loss=0.1592, cr_loss=0.396, over 19362.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1437, cr_loss=0.3697, over 4101344.89 frames. 
], batch size: 90, lr: 1.95e-03, grad_scale: 16.0 2024-09-18 12:13:35,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=759262.5, ans=0.025 2024-09-18 12:13:40,006 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.274e+02 2.405e+02 2.547e+02 4.054e+02, threshold=4.810e+02, percent-clipped=0.0 2024-09-18 12:14:04,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=759319.1666666666, ans=0.07 2024-09-18 12:14:08,328 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-268000.pt 2024-09-18 12:14:37,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=759375.8333333334, ans=0.09899494936611666 2024-09-18 12:14:48,090 INFO [train.py:1198] (0/2) Epoch 42, batch 6000, loss[loss=0.2329, ctc_loss=0.1553, cr_loss=0.3884, over 19292.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1449, cr_loss=0.372, over 4089985.68 frames. ], batch size: 90, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:14:48,091 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 12:15:13,586 INFO [train.py:1230] (0/2) Epoch 42, validation: loss=0.03968, ctc_loss=0.03968, cr_loss=1.49e-14, over 944034.00 frames. 
2024-09-18 12:15:13,587 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-18 12:15:32,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=759432.5, ans=0.0 2024-09-18 12:15:32,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=759432.5, ans=0.0 2024-09-18 12:16:01,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=759489.1666666666, ans=15.0 2024-09-18 12:16:14,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=759517.5, ans=0.125 2024-09-18 12:16:19,695 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2024-09-18 12:16:23,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=22.5 2024-09-18 12:16:29,686 INFO [train.py:1198] (0/2) Epoch 42, batch 6050, loss[loss=0.2257, ctc_loss=0.1478, cr_loss=0.3895, over 20976.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1442, cr_loss=0.3704, over 4084955.08 frames. 
], batch size: 55, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:16:36,973 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.236e+02 2.362e+02 2.572e+02 5.022e+02, threshold=4.725e+02, percent-clipped=1.0 2024-09-18 12:16:37,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=759545.8333333334, ans=0.125 2024-09-18 12:16:46,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=759574.1666666666, ans=0.025 2024-09-18 12:16:56,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=759574.1666666666, ans=0.125 2024-09-18 12:17:08,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=759602.5, ans=0.035 2024-09-18 12:17:36,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=759659.1666666666, ans=0.025 2024-09-18 12:17:43,965 INFO [train.py:1198] (0/2) Epoch 42, batch 6100, loss[loss=0.1987, ctc_loss=0.1271, cr_loss=0.3581, over 20893.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.144, cr_loss=0.37, over 4086862.17 frames. ], batch size: 54, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:17:56,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759687.5, ans=0.1 2024-09-18 12:18:56,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2024-09-18 12:19:00,103 INFO [train.py:1198] (0/2) Epoch 42, batch 6150, loss[loss=0.2354, ctc_loss=0.1539, cr_loss=0.4075, over 20648.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1434, cr_loss=0.3685, over 4086051.26 frames. 
], batch size: 68, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:19:07,464 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.231e+02 2.384e+02 2.516e+02 6.261e+02, threshold=4.768e+02, percent-clipped=1.0 2024-09-18 12:19:11,431 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.01 vs. limit=15.0 2024-09-18 12:19:32,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=759885.8333333334, ans=0.0 2024-09-18 12:19:40,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=759885.8333333334, ans=0.125 2024-09-18 12:19:49,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=759914.1666666666, ans=0.125 2024-09-18 12:20:09,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=759942.5, ans=0.125 2024-09-18 12:20:15,527 INFO [train.py:1198] (0/2) Epoch 42, batch 6200, loss[loss=0.2064, ctc_loss=0.1342, cr_loss=0.3611, over 21080.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1431, cr_loss=0.367, over 4081430.75 frames. 
], batch size: 56, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:20:21,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=759970.8333333334, ans=0.125 2024-09-18 12:20:21,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=759970.8333333334, ans=0.0 2024-09-18 12:20:23,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=759970.8333333334, ans=0.0 2024-09-18 12:20:41,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=759999.1666666666, ans=0.0 2024-09-18 12:20:43,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=12.0 2024-09-18 12:21:06,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=22.5 2024-09-18 12:21:27,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=760084.1666666666, ans=0.125 2024-09-18 12:21:29,505 INFO [train.py:1198] (0/2) Epoch 42, batch 6250, loss[loss=0.2572, ctc_loss=0.1713, cr_loss=0.4299, over 18120.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1439, cr_loss=0.3683, over 4042981.50 frames. 
], batch size: 108, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:21:31,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=760112.5, ans=0.125 2024-09-18 12:21:36,544 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.273e+02 2.403e+02 2.616e+02 3.818e+02, threshold=4.806e+02, percent-clipped=0.0 2024-09-18 12:22:37,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=760225.8333333334, ans=0.0 2024-09-18 12:22:44,271 INFO [train.py:1198] (0/2) Epoch 42, batch 6300, loss[loss=0.1822, ctc_loss=0.118, cr_loss=0.321, over 20953.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1427, cr_loss=0.3651, over 4024978.64 frames. ], batch size: 48, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:22:45,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=760254.1666666666, ans=0.2 2024-09-18 12:23:55,950 INFO [train.py:1198] (0/2) Epoch 42, batch 6350, loss[loss=0.2625, ctc_loss=0.1829, cr_loss=0.398, over 14795.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1465, cr_loss=0.3679, over 3870787.28 frames. ], batch size: 150, lr: 1.95e-03, grad_scale: 32.0 2024-09-18 12:24:00,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=760395.8333333334, ans=0.0 2024-09-18 12:24:02,925 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.356e+02 2.647e+02 2.907e+02 3.900e+02, threshold=5.294e+02, percent-clipped=0.0 2024-09-18 12:24:18,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.24 vs. 
limit=15.0 2024-09-18 12:24:48,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=760480.8333333334, ans=0.125 2024-09-18 12:24:48,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=760480.8333333334, ans=0.125 2024-09-18 12:24:55,072 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-42.pt 2024-09-18 12:25:44,430 INFO [train.py:1198] (0/2) Epoch 43, batch 0, loss[loss=0.2104, ctc_loss=0.138, cr_loss=0.362, over 20991.00 frames. ], tot_loss[loss=0.2104, ctc_loss=0.138, cr_loss=0.362, over 20991.00 frames. ], batch size: 55, lr: 1.93e-03, grad_scale: 32.0 2024-09-18 12:25:44,431 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 12:25:57,778 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7232, 4.2815, 4.5276, 4.1279], device='cuda:0') 2024-09-18 12:26:02,555 INFO [train.py:1230] (0/2) Epoch 43, validation: loss=0.03933, ctc_loss=0.03933, cr_loss=1.486e-14, over 944034.00 frames. 2024-09-18 12:26:02,555 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-18 12:26:02,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=760512.0, ans=0.07 2024-09-18 12:26:04,998 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.31 vs. 
limit=15.0 2024-09-18 12:26:18,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=760512.0, ans=0.2 2024-09-18 12:26:42,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=760568.6666666666, ans=0.2 2024-09-18 12:26:56,274 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-09-18 12:27:09,725 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-18 12:27:16,117 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.72 vs. limit=15.0 2024-09-18 12:27:21,270 INFO [train.py:1198] (0/2) Epoch 43, batch 50, loss[loss=0.2053, ctc_loss=0.1349, cr_loss=0.3519, over 20872.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1441, cr_loss=0.3703, over 913169.53 frames. ], batch size: 54, lr: 1.93e-03, grad_scale: 32.0 2024-09-18 12:27:31,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. limit=10.0 2024-09-18 12:27:42,608 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.311e+02 2.515e+02 2.748e+02 4.191e+02, threshold=5.029e+02, percent-clipped=0.0 2024-09-18 12:28:08,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=760738.6666666666, ans=0.1 2024-09-18 12:28:37,381 INFO [train.py:1198] (0/2) Epoch 43, batch 100, loss[loss=0.1851, ctc_loss=0.1208, cr_loss=0.3213, over 20955.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1439, cr_loss=0.3703, over 1625746.80 frames. 
], batch size: 55, lr: 1.93e-03, grad_scale: 32.0 2024-09-18 12:29:51,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=760937.0, ans=0.2 2024-09-18 12:29:52,667 INFO [train.py:1198] (0/2) Epoch 43, batch 150, loss[loss=0.2644, ctc_loss=0.1854, cr_loss=0.3951, over 14076.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3689, over 2170126.41 frames. ], batch size: 149, lr: 1.93e-03, grad_scale: 32.0 2024-09-18 12:29:57,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0 2024-09-18 12:29:58,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=760937.0, ans=0.1 2024-09-18 12:30:00,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=760937.0, ans=0.125 2024-09-18 12:30:13,854 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.219e+02 2.315e+02 2.493e+02 7.656e+02, threshold=4.631e+02, percent-clipped=1.0 2024-09-18 12:30:39,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=761022.0, ans=0.2 2024-09-18 12:30:41,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=761022.0, ans=0.025 2024-09-18 12:31:11,947 INFO [train.py:1198] (0/2) Epoch 43, batch 200, loss[loss=0.1891, ctc_loss=0.1209, cr_loss=0.341, over 20975.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1436, cr_loss=0.3691, over 2593894.60 frames. 
], batch size: 51, lr: 1.93e-03, grad_scale: 32.0 2024-09-18 12:31:13,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=761078.6666666666, ans=0.05 2024-09-18 12:31:20,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-09-18 12:31:21,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=761078.6666666666, ans=0.125 2024-09-18 12:31:24,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=761078.6666666666, ans=0.1 2024-09-18 12:31:26,488 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.97 vs. limit=10.0 2024-09-18 12:31:58,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=761163.6666666666, ans=0.125 2024-09-18 12:32:18,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=761192.0, ans=0.025 2024-09-18 12:32:30,288 INFO [train.py:1198] (0/2) Epoch 43, batch 250, loss[loss=0.195, ctc_loss=0.1275, cr_loss=0.3375, over 20784.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3686, over 2935331.75 frames. ], batch size: 53, lr: 1.93e-03, grad_scale: 16.0 2024-09-18 12:32:53,046 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.235e+02 2.347e+02 2.466e+02 3.759e+02, threshold=4.695e+02, percent-clipped=0.0 2024-09-18 12:33:02,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. 
limit=15.0 2024-09-18 12:33:05,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=22.5 2024-09-18 12:33:13,630 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.75 vs. limit=22.5 2024-09-18 12:33:45,869 INFO [train.py:1198] (0/2) Epoch 43, batch 300, loss[loss=0.1978, ctc_loss=0.1298, cr_loss=0.3402, over 19872.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1435, cr_loss=0.3685, over 3184606.51 frames. ], batch size: 44, lr: 1.93e-03, grad_scale: 16.0 2024-09-18 12:33:58,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=761362.0, ans=0.07 2024-09-18 12:34:24,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=761418.6666666666, ans=0.1 2024-09-18 12:34:47,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=761475.3333333334, ans=0.125 2024-09-18 12:34:55,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=761475.3333333334, ans=0.2 2024-09-18 12:35:01,553 INFO [train.py:1198] (0/2) Epoch 43, batch 350, loss[loss=0.2133, ctc_loss=0.141, cr_loss=0.362, over 20881.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1425, cr_loss=0.3666, over 3383076.96 frames. 
], batch size: 54, lr: 1.93e-03, grad_scale: 16.0 2024-09-18 12:35:24,146 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.225e+02 2.365e+02 2.483e+02 3.085e+02, threshold=4.731e+02, percent-clipped=0.0 2024-09-18 12:35:26,091 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:35:43,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=761560.3333333334, ans=0.2 2024-09-18 12:35:48,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=761588.6666666666, ans=0.125 2024-09-18 12:36:06,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=761617.0, ans=0.125 2024-09-18 12:36:19,883 INFO [train.py:1198] (0/2) Epoch 43, batch 400, loss[loss=0.2307, ctc_loss=0.1561, cr_loss=0.3731, over 20371.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1429, cr_loss=0.368, over 3544146.09 frames. ], batch size: 74, lr: 1.93e-03, grad_scale: 32.0 2024-09-18 12:36:24,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=761645.3333333334, ans=0.0 2024-09-18 12:37:35,661 INFO [train.py:1198] (0/2) Epoch 43, batch 450, loss[loss=0.2464, ctc_loss=0.1711, cr_loss=0.3765, over 14969.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3673, over 3673939.56 frames. 
], batch size: 149, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:37:38,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=761787.0, ans=0.125 2024-09-18 12:38:02,186 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.276e+02 2.380e+02 2.495e+02 3.272e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-18 12:38:12,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=761843.6666666666, ans=0.125 2024-09-18 12:38:40,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=761900.3333333334, ans=0.07 2024-09-18 12:38:44,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=761900.3333333334, ans=0.125 2024-09-18 12:38:55,025 INFO [train.py:1198] (0/2) Epoch 43, batch 500, loss[loss=0.2397, ctc_loss=0.1605, cr_loss=0.3962, over 20124.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1414, cr_loss=0.3653, over 3768623.52 frames. ], batch size: 80, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 12:39:13,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=761957.0, ans=0.125 2024-09-18 12:39:27,816 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2024-09-18 12:39:40,180 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.97 vs. 
limit=12.0 2024-09-18 12:39:41,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762013.6666666666, ans=0.1 2024-09-18 12:39:49,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.82 vs. limit=10.0 2024-09-18 12:39:50,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2024-09-18 12:40:11,553 INFO [train.py:1198] (0/2) Epoch 43, batch 550, loss[loss=0.1818, ctc_loss=0.1192, cr_loss=0.3129, over 20977.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1415, cr_loss=0.3655, over 3839905.35 frames. ], batch size: 52, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 12:40:11,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=762070.3333333334, ans=0.0 2024-09-18 12:40:12,009 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:40:13,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=762070.3333333334, ans=0.025 2024-09-18 12:40:18,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=762070.3333333334, ans=0.125 2024-09-18 12:40:36,314 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.265e+02 2.358e+02 2.495e+02 4.134e+02, threshold=4.716e+02, percent-clipped=0.0 2024-09-18 12:40:48,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.03 vs. 
limit=15.0 2024-09-18 12:41:14,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762183.6666666666, ans=0.1 2024-09-18 12:41:16,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=762183.6666666666, ans=0.5 2024-09-18 12:41:28,291 INFO [train.py:1198] (0/2) Epoch 43, batch 600, loss[loss=0.2065, ctc_loss=0.1365, cr_loss=0.3497, over 20884.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1419, cr_loss=0.3665, over 3905695.82 frames. ], batch size: 57, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 12:41:55,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.95 vs. limit=12.0 2024-09-18 12:42:11,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=762268.6666666666, ans=0.025 2024-09-18 12:42:12,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=762268.6666666666, ans=0.125 2024-09-18 12:42:35,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=762325.3333333334, ans=0.125 2024-09-18 12:42:46,889 INFO [train.py:1198] (0/2) Epoch 43, batch 650, loss[loss=0.2342, ctc_loss=0.1575, cr_loss=0.3833, over 20677.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1427, cr_loss=0.3679, over 3951497.48 frames. 
], batch size: 66, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 12:42:48,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=762353.6666666666, ans=0.1 2024-09-18 12:43:10,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=762382.0, ans=0.07 2024-09-18 12:43:11,295 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.202e+02 2.371e+02 2.474e+02 3.120e+02, threshold=4.742e+02, percent-clipped=0.0 2024-09-18 12:43:13,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=762382.0, ans=0.125 2024-09-18 12:43:28,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=762410.3333333334, ans=0.125 2024-09-18 12:43:35,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762438.6666666666, ans=0.1 2024-09-18 12:43:35,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=762438.6666666666, ans=0.0 2024-09-18 12:43:40,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=762438.6666666666, ans=0.0 2024-09-18 12:43:43,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=762438.6666666666, ans=0.0 2024-09-18 12:44:06,131 INFO [train.py:1198] (0/2) Epoch 43, batch 700, loss[loss=0.1876, ctc_loss=0.1216, cr_loss=0.3298, over 20944.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.143, cr_loss=0.368, over 3992284.41 frames. 
], batch size: 50, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 12:44:32,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=762523.6666666666, ans=0.125 2024-09-18 12:44:59,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=762580.3333333334, ans=0.2 2024-09-18 12:45:12,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=762608.6666666666, ans=0.0 2024-09-18 12:45:17,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=762608.6666666666, ans=0.0 2024-09-18 12:45:19,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=762608.6666666666, ans=0.0 2024-09-18 12:45:21,823 INFO [train.py:1198] (0/2) Epoch 43, batch 750, loss[loss=0.1884, ctc_loss=0.1205, cr_loss=0.3396, over 20952.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3677, over 4023302.54 frames. ], batch size: 50, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 12:45:46,102 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.195e+02 2.345e+02 2.486e+02 3.872e+02, threshold=4.690e+02, percent-clipped=0.0 2024-09-18 12:45:57,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=762693.6666666666, ans=0.125 2024-09-18 12:46:38,167 INFO [train.py:1198] (0/2) Epoch 43, batch 800, loss[loss=0.2277, ctc_loss=0.1501, cr_loss=0.3879, over 20872.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.3671, over 4054420.73 frames. 
], batch size: 54, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:46:41,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=762778.6666666666, ans=0.125 2024-09-18 12:46:55,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=762807.0, ans=0.125 2024-09-18 12:47:01,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=762807.0, ans=0.125 2024-09-18 12:47:04,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762807.0, ans=0.1 2024-09-18 12:47:33,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.78 vs. limit=6.0 2024-09-18 12:47:34,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=12.0 2024-09-18 12:47:39,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=762892.0, ans=0.0 2024-09-18 12:47:51,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=762892.0, ans=0.0 2024-09-18 12:47:55,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=762920.3333333334, ans=0.0 2024-09-18 12:47:56,964 INFO [train.py:1198] (0/2) Epoch 43, batch 850, loss[loss=0.2521, ctc_loss=0.1725, cr_loss=0.3981, over 18246.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.368, over 4063778.17 frames. 
], batch size: 108, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:48:09,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=762920.3333333334, ans=0.2 2024-09-18 12:48:21,243 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.267e+02 2.379e+02 2.498e+02 4.743e+02, threshold=4.757e+02, percent-clipped=1.0 2024-09-18 12:49:09,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=763033.6666666666, ans=0.07 2024-09-18 12:49:13,308 INFO [train.py:1198] (0/2) Epoch 43, batch 900, loss[loss=0.2189, ctc_loss=0.1425, cr_loss=0.3822, over 21011.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1425, cr_loss=0.3679, over 4085742.63 frames. ], batch size: 61, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:49:13,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=763062.0, ans=0.05 2024-09-18 12:49:20,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.78 vs. limit=15.0 2024-09-18 12:50:00,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=763147.0, ans=0.2 2024-09-18 12:50:17,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. 
limit=15.0 2024-09-18 12:50:22,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=763175.3333333334, ans=0.0 2024-09-18 12:50:25,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=763175.3333333334, ans=0.2 2024-09-18 12:50:31,575 INFO [train.py:1198] (0/2) Epoch 43, batch 950, loss[loss=0.1911, ctc_loss=0.1237, cr_loss=0.337, over 20984.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1429, cr_loss=0.3681, over 4092175.63 frames. ], batch size: 52, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:50:32,293 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.92 vs. limit=22.5 2024-09-18 12:50:56,110 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.275e+02 2.427e+02 2.617e+02 7.443e+02, threshold=4.854e+02, percent-clipped=1.0 2024-09-18 12:51:47,511 INFO [train.py:1198] (0/2) Epoch 43, batch 1000, loss[loss=0.1939, ctc_loss=0.1279, cr_loss=0.3304, over 21058.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1435, cr_loss=0.3687, over 4088203.94 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:51:55,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=763345.3333333334, ans=0.0 2024-09-18 12:52:22,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=763402.0, ans=0.125 2024-09-18 12:52:33,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=763430.3333333334, ans=0.2 2024-09-18 12:52:59,049 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=7.95 vs. 
limit=22.5 2024-09-18 12:53:01,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763487.0, ans=0.1 2024-09-18 12:53:02,637 INFO [train.py:1198] (0/2) Epoch 43, batch 1050, loss[loss=0.1934, ctc_loss=0.1269, cr_loss=0.3328, over 20851.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3688, over 4098200.85 frames. ], batch size: 57, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:53:03,316 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0 2024-09-18 12:53:26,782 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.289e+02 2.431e+02 2.556e+02 3.579e+02, threshold=4.862e+02, percent-clipped=0.0 2024-09-18 12:53:46,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=763543.6666666666, ans=0.125 2024-09-18 12:54:19,780 INFO [train.py:1198] (0/2) Epoch 43, batch 1100, loss[loss=0.2466, ctc_loss=0.1637, cr_loss=0.4145, over 20311.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1447, cr_loss=0.371, over 4090747.37 frames. ], batch size: 74, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:54:20,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=763628.6666666666, ans=0.0 2024-09-18 12:54:55,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=763685.3333333334, ans=0.035 2024-09-18 12:54:56,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=763685.3333333334, ans=0.125 2024-09-18 12:55:18,002 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.23 vs. 
limit=22.5 2024-09-18 12:55:28,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=763742.0, ans=0.0 2024-09-18 12:55:28,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=763742.0, ans=0.2 2024-09-18 12:55:38,340 INFO [train.py:1198] (0/2) Epoch 43, batch 1150, loss[loss=0.2002, ctc_loss=0.128, cr_loss=0.361, over 20906.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1443, cr_loss=0.3705, over 4105917.25 frames. ], batch size: 54, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:55:44,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=763770.3333333334, ans=0.025 2024-09-18 12:55:55,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=763798.6666666666, ans=0.0 2024-09-18 12:55:58,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=763798.6666666666, ans=0.125 2024-09-18 12:56:02,125 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.226e+02 2.401e+02 2.541e+02 3.971e+02, threshold=4.803e+02, percent-clipped=0.0 2024-09-18 12:56:53,608 INFO [train.py:1198] (0/2) Epoch 43, batch 1200, loss[loss=0.21, ctc_loss=0.1379, cr_loss=0.3608, over 21019.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1442, cr_loss=0.3704, over 4108456.65 frames. 
], batch size: 61, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:56:56,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=763912.0, ans=0.125 2024-09-18 12:57:21,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=763940.3333333334, ans=0.2 2024-09-18 12:57:45,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=763997.0, ans=0.125 2024-09-18 12:58:09,441 INFO [train.py:1198] (0/2) Epoch 43, batch 1250, loss[loss=0.2456, ctc_loss=0.1647, cr_loss=0.4043, over 20986.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1438, cr_loss=0.3695, over 4121112.27 frames. ], batch size: 64, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:58:15,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=764053.6666666666, ans=0.125 2024-09-18 12:58:23,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=764082.0, ans=0.125 2024-09-18 12:58:28,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=764082.0, ans=0.125 2024-09-18 12:58:34,046 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.215e+02 2.350e+02 2.505e+02 4.533e+02, threshold=4.700e+02, percent-clipped=0.0 2024-09-18 12:58:34,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=764082.0, ans=0.2 2024-09-18 12:59:28,289 INFO [train.py:1198] (0/2) Epoch 43, batch 1300, loss[loss=0.2589, ctc_loss=0.1782, cr_loss=0.4038, over 14197.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3688, over 4118910.93 frames. 
], batch size: 149, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 12:59:31,882 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=15.0 2024-09-18 12:59:43,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=764223.6666666666, ans=0.0 2024-09-18 12:59:47,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=764223.6666666666, ans=0.125 2024-09-18 12:59:47,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=764223.6666666666, ans=0.07 2024-09-18 12:59:57,153 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0 2024-09-18 13:00:11,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=764280.3333333334, ans=15.0 2024-09-18 13:00:43,164 INFO [train.py:1198] (0/2) Epoch 43, batch 1350, loss[loss=0.2258, ctc_loss=0.1491, cr_loss=0.3831, over 20988.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1436, cr_loss=0.3693, over 4120202.08 frames. 
], batch size: 55, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:00:45,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=764337.0, ans=0.0 2024-09-18 13:00:52,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=764337.0, ans=0.04949747468305833 2024-09-18 13:01:10,555 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.271e+02 2.416e+02 2.564e+02 3.017e+02, threshold=4.832e+02, percent-clipped=0.0 2024-09-18 13:01:17,421 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=80.55 vs. limit=15.0 2024-09-18 13:01:32,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=764422.0, ans=0.2 2024-09-18 13:01:34,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=764422.0, ans=0.0 2024-09-18 13:01:49,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=764450.3333333334, ans=0.2 2024-09-18 13:01:56,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=764450.3333333334, ans=0.1 2024-09-18 13:02:02,022 INFO [train.py:1198] (0/2) Epoch 43, batch 1400, loss[loss=0.2235, ctc_loss=0.1472, cr_loss=0.3818, over 20862.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1431, cr_loss=0.3684, over 4125396.46 frames. 
], batch size: 57, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:02:02,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=764478.6666666666, ans=0.125 2024-09-18 13:02:13,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=764478.6666666666, ans=0.0 2024-09-18 13:02:50,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=764563.6666666666, ans=0.125 2024-09-18 13:02:50,718 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:03:18,081 INFO [train.py:1198] (0/2) Epoch 43, batch 1450, loss[loss=0.2465, ctc_loss=0.1639, cr_loss=0.4127, over 20341.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.369, over 4122701.19 frames. ], batch size: 74, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:03:41,936 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.233e+02 2.357e+02 2.504e+02 4.834e+02, threshold=4.713e+02, percent-clipped=1.0 2024-09-18 13:03:42,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=764648.6666666666, ans=0.125 2024-09-18 13:03:50,204 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-09-18 13:04:11,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=764705.3333333334, ans=0.125 2024-09-18 13:04:33,331 INFO [train.py:1198] (0/2) Epoch 43, batch 1500, loss[loss=0.2493, ctc_loss=0.1732, cr_loss=0.3803, over 14088.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3677, over 4113153.44 frames. 
], batch size: 149, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:04:44,196 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:04:56,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=764790.3333333334, ans=0.0 2024-09-18 13:05:38,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=764875.3333333334, ans=0.0 2024-09-18 13:05:44,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=764875.3333333334, ans=0.0 2024-09-18 13:05:50,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=764903.6666666666, ans=0.125 2024-09-18 13:05:51,702 INFO [train.py:1198] (0/2) Epoch 43, batch 1550, loss[loss=0.2532, ctc_loss=0.1733, cr_loss=0.3997, over 18625.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1441, cr_loss=0.3699, over 4106742.33 frames. ], batch size: 108, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:05:54,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. 
limit=6.0 2024-09-18 13:06:15,981 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.255e+02 2.371e+02 2.529e+02 3.069e+02, threshold=4.742e+02, percent-clipped=0.0 2024-09-18 13:06:17,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=764932.0, ans=0.0 2024-09-18 13:06:42,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=764988.6666666666, ans=0.2 2024-09-18 13:06:55,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=765017.0, ans=0.125 2024-09-18 13:07:10,823 INFO [train.py:1198] (0/2) Epoch 43, batch 1600, loss[loss=0.2262, ctc_loss=0.1497, cr_loss=0.3825, over 20931.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3692, over 4119256.66 frames. ], batch size: 60, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:07:15,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=765045.3333333334, ans=0.125 2024-09-18 13:07:50,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=765102.0, ans=0.125 2024-09-18 13:07:59,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=765130.3333333334, ans=0.025 2024-09-18 13:08:26,681 INFO [train.py:1198] (0/2) Epoch 43, batch 1650, loss[loss=0.2016, ctc_loss=0.1308, cr_loss=0.3536, over 20781.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3692, over 4120599.42 frames. 
], batch size: 53, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:08:45,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=765215.3333333334, ans=0.2 2024-09-18 13:08:50,931 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.206e+02 2.330e+02 2.465e+02 8.182e+02, threshold=4.660e+02, percent-clipped=1.0 2024-09-18 13:08:55,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=765243.6666666666, ans=0.125 2024-09-18 13:09:41,800 INFO [train.py:1198] (0/2) Epoch 43, batch 1700, loss[loss=0.1669, ctc_loss=0.1054, cr_loss=0.3076, over 20336.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1435, cr_loss=0.369, over 4123564.89 frames. ], batch size: 45, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:10:01,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=765357.0, ans=0.125 2024-09-18 13:10:24,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0 2024-09-18 13:10:29,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=765413.6666666666, ans=0.2 2024-09-18 13:11:00,969 INFO [train.py:1198] (0/2) Epoch 43, batch 1750, loss[loss=0.2025, ctc_loss=0.1321, cr_loss=0.3521, over 20912.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1439, cr_loss=0.3696, over 4121691.75 frames. 
], batch size: 54, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:11:08,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=765470.3333333334, ans=0.0 2024-09-18 13:11:26,562 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.225e+02 2.337e+02 2.490e+02 3.252e+02, threshold=4.674e+02, percent-clipped=0.0 2024-09-18 13:12:16,542 INFO [train.py:1198] (0/2) Epoch 43, batch 1800, loss[loss=0.1813, ctc_loss=0.117, cr_loss=0.3214, over 20963.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1437, cr_loss=0.3692, over 4124046.78 frames. ], batch size: 48, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:12:40,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765640.3333333334, ans=0.1 2024-09-18 13:12:45,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765640.3333333334, ans=0.1 2024-09-18 13:13:02,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=765668.6666666666, ans=0.125 2024-09-18 13:13:22,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=765725.3333333334, ans=0.125 2024-09-18 13:13:23,808 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2024-09-18 13:13:28,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=765725.3333333334, ans=0.0 2024-09-18 13:13:35,117 INFO [train.py:1198] (0/2) Epoch 43, batch 1850, loss[loss=0.225, ctc_loss=0.1487, cr_loss=0.3818, over 21066.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1437, cr_loss=0.369, over 4108124.95 frames. 
], batch size: 59, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:14:00,429 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 2.224e+02 2.325e+02 2.496e+02 3.243e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-18 13:14:04,040 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-09-18 13:14:17,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=765810.3333333334, ans=0.05 2024-09-18 13:14:25,393 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.74 vs. limit=6.0 2024-09-18 13:14:49,933 INFO [train.py:1198] (0/2) Epoch 43, batch 1900, loss[loss=0.2266, ctc_loss=0.1476, cr_loss=0.3955, over 21054.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1437, cr_loss=0.3694, over 4120287.09 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:15:31,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=765952.0, ans=0.0 2024-09-18 13:15:59,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=766008.6666666666, ans=0.1 2024-09-18 13:16:05,624 INFO [train.py:1198] (0/2) Epoch 43, batch 1950, loss[loss=0.2277, ctc_loss=0.152, cr_loss=0.378, over 20313.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1432, cr_loss=0.3685, over 4133973.78 frames. 
], batch size: 74, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:16:31,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=766065.3333333334, ans=0.0 2024-09-18 13:16:34,184 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.076e+02 2.258e+02 2.401e+02 2.571e+02 3.138e+02, threshold=4.801e+02, percent-clipped=0.0 2024-09-18 13:16:37,680 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:17:10,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=766150.3333333334, ans=0.0 2024-09-18 13:17:11,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.84 vs. limit=10.0 2024-09-18 13:17:24,135 INFO [train.py:1198] (0/2) Epoch 43, batch 2000, loss[loss=0.2155, ctc_loss=0.1406, cr_loss=0.3746, over 20879.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1431, cr_loss=0.3678, over 4103166.91 frames. ], batch size: 57, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:17:53,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766235.3333333334, ans=0.1 2024-09-18 13:18:05,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=766235.3333333334, ans=0.95 2024-09-18 13:18:06,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.12 vs. 
limit=22.5 2024-09-18 13:18:20,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=766263.6666666666, ans=10.0 2024-09-18 13:18:42,751 INFO [train.py:1198] (0/2) Epoch 43, batch 2050, loss[loss=0.2341, ctc_loss=0.1551, cr_loss=0.3952, over 21033.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1425, cr_loss=0.3669, over 4099387.79 frames. ], batch size: 62, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:18:49,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=766320.3333333334, ans=0.025 2024-09-18 13:19:01,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=766348.6666666666, ans=0.2 2024-09-18 13:19:07,411 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:19:09,890 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.229e+02 2.367e+02 2.513e+02 4.426e+02, threshold=4.735e+02, percent-clipped=0.0 2024-09-18 13:19:39,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=15.0 2024-09-18 13:19:40,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=766405.3333333334, ans=0.1 2024-09-18 13:19:58,244 INFO [train.py:1198] (0/2) Epoch 43, batch 2100, loss[loss=0.2266, ctc_loss=0.151, cr_loss=0.3781, over 20252.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1439, cr_loss=0.3694, over 4102341.26 frames. ], batch size: 74, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:21:14,250 INFO [train.py:1198] (0/2) Epoch 43, batch 2150, loss[loss=0.1946, ctc_loss=0.1245, cr_loss=0.3506, over 19792.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1431, cr_loss=0.3681, over 4109933.21 frames. 
], batch size: 44, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:21:41,424 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.228e+02 2.360e+02 2.498e+02 9.463e+02, threshold=4.720e+02, percent-clipped=1.0 2024-09-18 13:22:07,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=766688.6666666666, ans=0.125 2024-09-18 13:22:19,910 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-09-18 13:22:32,907 INFO [train.py:1198] (0/2) Epoch 43, batch 2200, loss[loss=0.2409, ctc_loss=0.1589, cr_loss=0.4098, over 20674.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.144, cr_loss=0.37, over 4117655.54 frames. ], batch size: 66, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:22:44,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=766745.3333333334, ans=0.125 2024-09-18 13:22:49,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.14 vs. limit=22.5 2024-09-18 13:23:12,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=766802.0, ans=0.125 2024-09-18 13:23:36,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=766858.6666666666, ans=0.125 2024-09-18 13:23:48,834 INFO [train.py:1198] (0/2) Epoch 43, batch 2250, loss[loss=0.2136, ctc_loss=0.1402, cr_loss=0.3673, over 21052.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3697, over 4120824.27 frames. 
], batch size: 63, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:24:19,056 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.235e+02 2.370e+02 2.558e+02 3.752e+02, threshold=4.740e+02, percent-clipped=0.0 2024-09-18 13:24:21,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=766943.6666666666, ans=0.0 2024-09-18 13:25:00,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=767000.3333333334, ans=0.125 2024-09-18 13:25:08,188 INFO [train.py:1198] (0/2) Epoch 43, batch 2300, loss[loss=0.2283, ctc_loss=0.1517, cr_loss=0.3829, over 20601.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1436, cr_loss=0.3691, over 4120403.86 frames. ], batch size: 75, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:26:24,626 INFO [train.py:1198] (0/2) Epoch 43, batch 2350, loss[loss=0.1991, ctc_loss=0.1281, cr_loss=0.3553, over 20992.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3684, over 4123550.79 frames. ], batch size: 55, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:26:51,361 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.207e+02 2.330e+02 2.481e+02 3.401e+02, threshold=4.661e+02, percent-clipped=0.0 2024-09-18 13:26:55,516 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-18 13:27:12,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=767255.3333333334, ans=0.125 2024-09-18 13:27:39,653 INFO [train.py:1198] (0/2) Epoch 43, batch 2400, loss[loss=0.2362, ctc_loss=0.1543, cr_loss=0.4097, over 20893.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1443, cr_loss=0.3703, over 4107935.03 frames. 
], batch size: 54, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:27:50,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=767312.0, ans=0.125 2024-09-18 13:28:55,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=22.5 2024-09-18 13:28:58,192 INFO [train.py:1198] (0/2) Epoch 43, batch 2450, loss[loss=0.2607, ctc_loss=0.1785, cr_loss=0.4112, over 14271.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1444, cr_loss=0.3708, over 4098558.58 frames. ], batch size: 151, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:29:00,443 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2024-09-18 13:29:24,806 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.874e+02 2.236e+02 2.402e+02 2.533e+02 4.948e+02, threshold=4.804e+02, percent-clipped=1.0 2024-09-18 13:29:37,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=8.0 2024-09-18 13:30:15,862 INFO [train.py:1198] (0/2) Epoch 43, batch 2500, loss[loss=0.2156, ctc_loss=0.1435, cr_loss=0.3608, over 20772.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1437, cr_loss=0.3697, over 4113719.91 frames. 
], batch size: 56, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:30:17,800 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:30:52,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=767652.0, ans=0.015 2024-09-18 13:30:55,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=767652.0, ans=0.125 2024-09-18 13:31:31,299 INFO [train.py:1198] (0/2) Epoch 43, batch 2550, loss[loss=0.195, ctc_loss=0.1275, cr_loss=0.3377, over 19764.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1442, cr_loss=0.3698, over 4108551.81 frames. ], batch size: 44, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:31:33,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=767737.0, ans=0.0 2024-09-18 13:31:58,435 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.072e+02 2.306e+02 2.404e+02 2.637e+02 4.725e+02, threshold=4.807e+02, percent-clipped=0.0 2024-09-18 13:32:21,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=767822.0, ans=0.0 2024-09-18 13:32:26,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=767822.0, ans=0.125 2024-09-18 13:32:46,781 INFO [train.py:1198] (0/2) Epoch 43, batch 2600, loss[loss=0.2164, ctc_loss=0.1442, cr_loss=0.3611, over 19326.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1442, cr_loss=0.3696, over 4106250.19 frames. 
], batch size: 90, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:33:23,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=767935.3333333334, ans=0.2 2024-09-18 13:34:03,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=767992.0, ans=0.0 2024-09-18 13:34:05,734 INFO [train.py:1198] (0/2) Epoch 43, batch 2650, loss[loss=0.1951, ctc_loss=0.1229, cr_loss=0.3613, over 21053.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1444, cr_loss=0.3705, over 4104239.02 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:34:19,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768048.6666666666, ans=0.1 2024-09-18 13:34:34,600 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.262e+02 2.370e+02 2.506e+02 3.502e+02, threshold=4.740e+02, percent-clipped=0.0 2024-09-18 13:34:35,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=768077.0, ans=0.0 2024-09-18 13:34:42,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=768077.0, ans=0.0 2024-09-18 13:35:21,842 INFO [train.py:1198] (0/2) Epoch 43, batch 2700, loss[loss=0.222, ctc_loss=0.1453, cr_loss=0.3838, over 20973.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.3703, over 4089189.05 frames. ], batch size: 58, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:36:14,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=768247.0, ans=0.125 2024-09-18 13:36:40,458 INFO [train.py:1198] (0/2) Epoch 43, batch 2750, loss[loss=0.2116, ctc_loss=0.138, cr_loss=0.3678, over 21012.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1447, cr_loss=0.3704, over 4090624.08 frames. 
], batch size: 61, lr: 1.92e-03, grad_scale: 16.0 2024-09-18 13:37:09,250 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.240e+02 2.347e+02 2.535e+02 3.931e+02, threshold=4.694e+02, percent-clipped=0.0 2024-09-18 13:37:30,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=768388.6666666666, ans=0.125 2024-09-18 13:37:56,208 INFO [train.py:1198] (0/2) Epoch 43, batch 2800, loss[loss=0.2312, ctc_loss=0.1524, cr_loss=0.3944, over 20860.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1449, cr_loss=0.3705, over 4084294.35 frames. ], batch size: 65, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:37:56,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=768445.3333333334, ans=0.2 2024-09-18 13:38:21,682 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:38:23,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=768473.6666666666, ans=0.0 2024-09-18 13:38:34,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2024-09-18 13:39:01,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=768558.6666666666, ans=0.0 2024-09-18 13:39:12,093 INFO [train.py:1198] (0/2) Epoch 43, batch 2850, loss[loss=0.2034, ctc_loss=0.131, cr_loss=0.3624, over 20994.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1457, cr_loss=0.3722, over 4087816.35 frames. 
], batch size: 55, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:39:28,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=768615.3333333334, ans=0.2 2024-09-18 13:39:43,053 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.235e+02 2.352e+02 2.505e+02 3.686e+02, threshold=4.704e+02, percent-clipped=0.0 2024-09-18 13:39:46,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=768643.6666666666, ans=10.0 2024-09-18 13:40:17,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=768700.3333333334, ans=10.0 2024-09-18 13:40:30,157 INFO [train.py:1198] (0/2) Epoch 43, batch 2900, loss[loss=0.2389, ctc_loss=0.1553, cr_loss=0.4181, over 20949.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1455, cr_loss=0.3723, over 4095970.73 frames. ], batch size: 64, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:41:05,773 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2024-09-18 13:41:37,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=768842.0, ans=0.0 2024-09-18 13:41:48,990 INFO [train.py:1198] (0/2) Epoch 43, batch 2950, loss[loss=0.2689, ctc_loss=0.1829, cr_loss=0.4297, over 18087.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1456, cr_loss=0.3724, over 4084270.37 frames. 
], batch size: 108, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:42:17,702 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.226e+02 2.369e+02 2.532e+02 3.433e+02, threshold=4.738e+02, percent-clipped=0.0 2024-09-18 13:42:19,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=768927.0, ans=0.125 2024-09-18 13:42:30,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.96 vs. limit=22.5 2024-09-18 13:42:35,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2024-09-18 13:42:40,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=768955.3333333334, ans=0.0 2024-09-18 13:42:43,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=768955.3333333334, ans=0.125 2024-09-18 13:43:04,468 INFO [train.py:1198] (0/2) Epoch 43, batch 3000, loss[loss=0.192, ctc_loss=0.1235, cr_loss=0.3424, over 20957.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1445, cr_loss=0.3706, over 4091935.70 frames. ], batch size: 50, lr: 1.92e-03, grad_scale: 32.0 2024-09-18 13:43:04,469 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 13:43:25,972 INFO [train.py:1230] (0/2) Epoch 43, validation: loss=0.03935, ctc_loss=0.03935, cr_loss=1.469e-14, over 944034.00 frames. 
2024-09-18 13:43:25,973 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-18 13:43:27,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=769012.0, ans=0.0
2024-09-18 13:43:48,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.59 vs. limit=22.5
2024-09-18 13:44:00,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=769068.6666666666, ans=0.2
2024-09-18 13:44:15,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769097.0, ans=0.1
2024-09-18 13:44:33,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=769125.3333333334, ans=0.2
2024-09-18 13:44:42,302 INFO [train.py:1198] (0/2) Epoch 43, batch 3050, loss[loss=0.2255, ctc_loss=0.1496, cr_loss=0.3793, over 20905.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1444, cr_loss=0.3704, over 4103171.56 frames. ], batch size: 54, lr: 1.92e-03, grad_scale: 32.0
2024-09-18 13:44:48,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=769153.6666666666, ans=0.0
2024-09-18 13:44:56,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=769182.0, ans=0.2
2024-09-18 13:44:57,960 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 13:45:08,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=769182.0, ans=0.125
2024-09-18 13:45:10,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=769182.0, ans=0.125
2024-09-18 13:45:14,538 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.197e+02 2.343e+02 2.553e+02 3.195e+02, threshold=4.687e+02, percent-clipped=0.0
2024-09-18 13:45:29,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=769238.6666666666, ans=0.1
2024-09-18 13:46:00,955 INFO [train.py:1198] (0/2) Epoch 43, batch 3100, loss[loss=0.2147, ctc_loss=0.1407, cr_loss=0.3702, over 20978.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1443, cr_loss=0.3698, over 4089656.51 frames. ], batch size: 55, lr: 1.92e-03, grad_scale: 32.0
2024-09-18 13:46:04,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=769295.3333333334, ans=0.125
2024-09-18 13:46:51,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=769380.3333333334, ans=0.2
2024-09-18 13:47:19,105 INFO [train.py:1198] (0/2) Epoch 43, batch 3150, loss[loss=0.2213, ctc_loss=0.1445, cr_loss=0.3842, over 20823.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1452, cr_loss=0.3709, over 4079674.20 frames. ], batch size: 59, lr: 1.92e-03, grad_scale: 32.0
2024-09-18 13:47:47,674 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.286e+02 2.429e+02 2.564e+02 3.660e+02, threshold=4.857e+02, percent-clipped=0.0
2024-09-18 13:48:35,304 INFO [train.py:1198] (0/2) Epoch 43, batch 3200, loss[loss=0.2068, ctc_loss=0.1357, cr_loss=0.3553, over 20966.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.144, cr_loss=0.3687, over 4072275.51 frames. ], batch size: 58, lr: 1.92e-03, grad_scale: 32.0
2024-09-18 13:48:41,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=769578.6666666666, ans=0.025
2024-09-18 13:49:51,029 INFO [train.py:1198] (0/2) Epoch 43, batch 3250, loss[loss=0.1843, ctc_loss=0.1198, cr_loss=0.3225, over 20958.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.143, cr_loss=0.3669, over 4082652.39 frames. ], batch size: 49, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:49:54,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=769720.3333333334, ans=0.2
2024-09-18 13:50:15,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=769748.6666666666, ans=0.125
2024-09-18 13:50:19,909 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.252e+02 2.376e+02 2.544e+02 3.528e+02, threshold=4.753e+02, percent-clipped=0.0
2024-09-18 13:50:28,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.76 vs. limit=15.0
2024-09-18 13:50:51,875 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 13:50:57,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=769833.6666666666, ans=0.125
2024-09-18 13:51:02,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=769833.6666666666, ans=0.125
2024-09-18 13:51:09,686 INFO [train.py:1198] (0/2) Epoch 43, batch 3300, loss[loss=0.2051, ctc_loss=0.1357, cr_loss=0.347, over 21015.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1426, cr_loss=0.3669, over 4089270.69 frames. ], batch size: 61, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:51:13,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=769862.0, ans=0.125
2024-09-18 13:51:25,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769890.3333333334, ans=0.1
2024-09-18 13:51:40,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=769918.6666666666, ans=0.025
2024-09-18 13:51:49,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=769918.6666666666, ans=0.125
2024-09-18 13:51:54,617 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.95 vs. limit=15.0
2024-09-18 13:52:25,241 INFO [train.py:1198] (0/2) Epoch 43, batch 3350, loss[loss=0.2227, ctc_loss=0.1439, cr_loss=0.394, over 20958.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1429, cr_loss=0.3679, over 4096191.28 frames. ], batch size: 51, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:52:36,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=770003.6666666666, ans=0.2
2024-09-18 13:52:56,922 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 2.168e+02 2.314e+02 2.443e+02 3.479e+02, threshold=4.628e+02, percent-clipped=0.0
2024-09-18 13:53:36,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=770117.0, ans=0.0
2024-09-18 13:53:38,796 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.21 vs. limit=10.0
2024-09-18 13:53:44,226 INFO [train.py:1198] (0/2) Epoch 43, batch 3400, loss[loss=0.2324, ctc_loss=0.1508, cr_loss=0.408, over 20990.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3681, over 4102478.97 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:53:47,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=770145.3333333334, ans=0.0
2024-09-18 13:53:47,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=770145.3333333334, ans=0.125
2024-09-18 13:53:56,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=770145.3333333334, ans=0.125
2024-09-18 13:54:20,187 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=12.0
2024-09-18 13:54:31,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=770230.3333333334, ans=0.0
2024-09-18 13:54:59,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0
2024-09-18 13:54:59,962 INFO [train.py:1198] (0/2) Epoch 43, batch 3450, loss[loss=0.2489, ctc_loss=0.1658, cr_loss=0.4157, over 20018.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1433, cr_loss=0.3685, over 4104432.51 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:55:00,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=770287.0, ans=0.025
2024-09-18 13:55:28,748 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 2.184e+02 2.337e+02 2.487e+02 5.895e+02, threshold=4.675e+02, percent-clipped=1.0
2024-09-18 13:55:35,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=770343.6666666666, ans=0.125
2024-09-18 13:55:39,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=770343.6666666666, ans=0.125
2024-09-18 13:56:15,442 INFO [train.py:1198] (0/2) Epoch 43, batch 3500, loss[loss=0.2468, ctc_loss=0.1676, cr_loss=0.3957, over 18261.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1434, cr_loss=0.3684, over 4092272.93 frames. ], batch size: 108, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:56:45,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=770457.0, ans=0.0
2024-09-18 13:56:51,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=770485.3333333334, ans=0.1
2024-09-18 13:56:54,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=770485.3333333334, ans=0.125
2024-09-18 13:56:57,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=770485.3333333334, ans=0.0
2024-09-18 13:57:34,523 INFO [train.py:1198] (0/2) Epoch 43, batch 3550, loss[loss=0.2421, ctc_loss=0.1628, cr_loss=0.3963, over 19894.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1438, cr_loss=0.37, over 4104292.11 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:57:41,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=22.5
2024-09-18 13:57:45,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0
2024-09-18 13:57:46,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=770570.3333333334, ans=0.0
2024-09-18 13:58:02,960 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.200e+02 2.321e+02 2.480e+02 3.083e+02, threshold=4.642e+02, percent-clipped=0.0
2024-09-18 13:58:24,596 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-272000.pt
2024-09-18 13:58:54,120 INFO [train.py:1198] (0/2) Epoch 43, batch 3600, loss[loss=0.2112, ctc_loss=0.1403, cr_loss=0.3545, over 20872.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1437, cr_loss=0.3694, over 4092050.43 frames. ], batch size: 57, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 13:58:58,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=770712.0, ans=0.125
2024-09-18 13:59:15,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=770740.3333333334, ans=0.125
2024-09-18 13:59:32,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=770768.6666666666, ans=10.0
2024-09-18 13:59:59,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=770825.3333333334, ans=0.0
2024-09-18 14:00:09,881 INFO [train.py:1198] (0/2) Epoch 43, batch 3650, loss[loss=0.1657, ctc_loss=0.1072, cr_loss=0.2923, over 20357.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1436, cr_loss=0.3694, over 4095210.95 frames. ], batch size: 45, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:00:29,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=770882.0, ans=0.125
2024-09-18 14:00:38,659 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.228e+02 2.379e+02 2.572e+02 3.473e+02, threshold=4.759e+02, percent-clipped=0.0
2024-09-18 14:01:09,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=770967.0, ans=0.125
2024-09-18 14:01:25,981 INFO [train.py:1198] (0/2) Epoch 43, batch 3700, loss[loss=0.2125, ctc_loss=0.1378, cr_loss=0.3736, over 20767.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1435, cr_loss=0.3695, over 4101044.00 frames. ], batch size: 56, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:02:44,078 INFO [train.py:1198] (0/2) Epoch 43, batch 3750, loss[loss=0.2646, ctc_loss=0.1814, cr_loss=0.4158, over 18117.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1435, cr_loss=0.3689, over 4100017.66 frames. ], batch size: 108, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:02:46,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=12.0
2024-09-18 14:03:10,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=771165.3333333334, ans=0.0
2024-09-18 14:03:13,050 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.973e+02 2.209e+02 2.315e+02 2.491e+02 3.607e+02, threshold=4.629e+02, percent-clipped=0.0
2024-09-18 14:03:36,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=771222.0, ans=0.0
2024-09-18 14:03:49,182 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=22.5
2024-09-18 14:03:49,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=771250.3333333334, ans=0.0
2024-09-18 14:03:59,952 INFO [train.py:1198] (0/2) Epoch 43, batch 3800, loss[loss=0.2455, ctc_loss=0.1638, cr_loss=0.4086, over 20681.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.3693, over 4106217.91 frames. ], batch size: 66, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:04:26,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=771307.0, ans=0.125
2024-09-18 14:04:48,701 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:04:56,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771363.6666666666, ans=0.1
2024-09-18 14:05:13,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0
2024-09-18 14:05:18,839 INFO [train.py:1198] (0/2) Epoch 43, batch 3850, loss[loss=0.2564, ctc_loss=0.1722, cr_loss=0.4209, over 20844.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3688, over 4101637.86 frames. ], batch size: 65, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:05:22,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=771420.3333333334, ans=0.125
2024-09-18 14:05:26,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=771420.3333333334, ans=0.0
2024-09-18 14:05:49,392 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.271e+02 2.397e+02 2.599e+02 4.820e+02, threshold=4.793e+02, percent-clipped=1.0
2024-09-18 14:06:30,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=771533.6666666666, ans=0.125
2024-09-18 14:06:34,755 INFO [train.py:1198] (0/2) Epoch 43, batch 3900, loss[loss=0.212, ctc_loss=0.1421, cr_loss=0.3494, over 20783.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1436, cr_loss=0.3697, over 4102025.26 frames. ], batch size: 56, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:06:35,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=771562.0, ans=0.2
2024-09-18 14:06:42,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=771562.0, ans=0.125
2024-09-18 14:06:58,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=771590.3333333334, ans=0.125
2024-09-18 14:07:06,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=771618.6666666666, ans=0.125
2024-09-18 14:07:50,646 INFO [train.py:1198] (0/2) Epoch 43, batch 3950, loss[loss=0.2068, ctc_loss=0.1322, cr_loss=0.373, over 21059.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.144, cr_loss=0.3706, over 4107757.85 frames. ], batch size: 53, lr: 1.91e-03, grad_scale: 16.0
2024-09-18 14:08:13,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=771732.0, ans=0.0
2024-09-18 14:08:23,932 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.259e+02 2.381e+02 2.565e+02 3.615e+02, threshold=4.762e+02, percent-clipped=0.0
2024-09-18 14:08:50,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=771788.6666666666, ans=0.5
2024-09-18 14:09:09,459 INFO [train.py:1198] (0/2) Epoch 43, batch 4000, loss[loss=0.1886, ctc_loss=0.1242, cr_loss=0.3219, over 21067.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1444, cr_loss=0.3708, over 4104751.27 frames. ], batch size: 53, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:09:11,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=771845.3333333334, ans=0.125
2024-09-18 14:09:27,233 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0
2024-09-18 14:09:34,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=771873.6666666666, ans=0.2
2024-09-18 14:09:54,368 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0
2024-09-18 14:10:28,026 INFO [train.py:1198] (0/2) Epoch 43, batch 4050, loss[loss=0.2117, ctc_loss=0.1387, cr_loss=0.3652, over 20977.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1441, cr_loss=0.3706, over 4103699.94 frames. ], batch size: 50, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:10:49,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=772015.3333333334, ans=0.125
2024-09-18 14:10:58,423 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.209e+02 2.348e+02 2.489e+02 3.597e+02, threshold=4.697e+02, percent-clipped=0.0
2024-09-18 14:11:02,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=772043.6666666666, ans=0.0
2024-09-18 14:11:33,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772100.3333333334, ans=0.1
2024-09-18 14:11:43,630 INFO [train.py:1198] (0/2) Epoch 43, batch 4100, loss[loss=0.2124, ctc_loss=0.1438, cr_loss=0.3434, over 20283.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1448, cr_loss=0.3714, over 4098543.68 frames. ], batch size: 74, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:11:44,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=772128.6666666666, ans=0.2
2024-09-18 14:11:51,773 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0
2024-09-18 14:12:00,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=772157.0, ans=0.125
2024-09-18 14:12:24,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=772185.3333333334, ans=0.125
2024-09-18 14:12:35,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=772213.6666666666, ans=0.125
2024-09-18 14:12:46,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=772242.0, ans=0.2
2024-09-18 14:12:59,575 INFO [train.py:1198] (0/2) Epoch 43, batch 4150, loss[loss=0.2248, ctc_loss=0.145, cr_loss=0.3989, over 20878.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1446, cr_loss=0.3718, over 4098233.02 frames. ], batch size: 54, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:13:01,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=772270.3333333334, ans=0.0
2024-09-18 14:13:08,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=772270.3333333334, ans=0.0
2024-09-18 14:13:14,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=772298.6666666666, ans=0.125
2024-09-18 14:13:15,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.87 vs. limit=5.0
2024-09-18 14:13:29,775 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.236e+02 2.358e+02 2.482e+02 3.933e+02, threshold=4.715e+02, percent-clipped=0.0
2024-09-18 14:13:49,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=772355.3333333334, ans=0.2
2024-09-18 14:13:57,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=772355.3333333334, ans=0.125
2024-09-18 14:14:18,342 INFO [train.py:1198] (0/2) Epoch 43, batch 4200, loss[loss=0.2422, ctc_loss=0.1626, cr_loss=0.3981, over 20660.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1441, cr_loss=0.3708, over 4102363.18 frames. ], batch size: 71, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:14:30,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=772412.0, ans=0.025
2024-09-18 14:14:34,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=772440.3333333334, ans=0.025
2024-09-18 14:14:38,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=772440.3333333334, ans=0.125
2024-09-18 14:15:25,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=772525.3333333334, ans=0.125
2024-09-18 14:15:33,999 INFO [train.py:1198] (0/2) Epoch 43, batch 4250, loss[loss=0.2117, ctc_loss=0.1371, cr_loss=0.3729, over 20971.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1441, cr_loss=0.3704, over 4095213.40 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:15:55,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=772582.0, ans=0.95
2024-09-18 14:16:07,471 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.245e+02 2.375e+02 2.541e+02 3.751e+02, threshold=4.749e+02, percent-clipped=0.0
2024-09-18 14:16:25,796 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:16:30,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=772638.6666666666, ans=0.125
2024-09-18 14:16:39,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=772667.0, ans=0.07
2024-09-18 14:16:41,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=12.0
2024-09-18 14:16:52,658 INFO [train.py:1198] (0/2) Epoch 43, batch 4300, loss[loss=0.2194, ctc_loss=0.146, cr_loss=0.3667, over 19506.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1446, cr_loss=0.3709, over 4091725.88 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:17:05,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=772695.3333333334, ans=0.1
2024-09-18 14:17:14,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=772723.6666666666, ans=0.5
2024-09-18 14:17:15,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=772723.6666666666, ans=0.0
2024-09-18 14:18:07,992 INFO [train.py:1198] (0/2) Epoch 43, batch 4350, loss[loss=0.2651, ctc_loss=0.1748, cr_loss=0.4519, over 19996.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1445, cr_loss=0.3708, over 4084088.36 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:18:09,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=772837.0, ans=0.2
2024-09-18 14:18:11,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772837.0, ans=0.1
2024-09-18 14:18:11,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=772837.0, ans=0.125
2024-09-18 14:18:20,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=772837.0, ans=0.0
2024-09-18 14:18:20,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=772837.0, ans=0.0
2024-09-18 14:18:22,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=772865.3333333334, ans=15.0
2024-09-18 14:18:38,361 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.255e+02 2.375e+02 2.544e+02 5.936e+02, threshold=4.750e+02, percent-clipped=1.0
2024-09-18 14:19:00,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=772922.0, ans=0.0
2024-09-18 14:19:09,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=772950.3333333334, ans=0.2
2024-09-18 14:19:19,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=772950.3333333334, ans=0.125
2024-09-18 14:19:27,215 INFO [train.py:1198] (0/2) Epoch 43, batch 4400, loss[loss=0.1724, ctc_loss=0.1111, cr_loss=0.3063, over 20949.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1439, cr_loss=0.37, over 4086979.48 frames. ], batch size: 48, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:19:53,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.03 vs. limit=10.0
2024-09-18 14:19:54,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=773007.0, ans=0.0
2024-09-18 14:20:10,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=773035.3333333334, ans=0.2
2024-09-18 14:20:16,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=773063.6666666666, ans=0.125
2024-09-18 14:20:26,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=773092.0, ans=0.0
2024-09-18 14:20:28,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=773092.0, ans=0.125
2024-09-18 14:20:43,124 INFO [train.py:1198] (0/2) Epoch 43, batch 4450, loss[loss=0.2044, ctc_loss=0.1321, cr_loss=0.3613, over 20951.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.144, cr_loss=0.37, over 4090564.65 frames. ], batch size: 50, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:21:13,096 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.242e+02 2.358e+02 2.562e+02 3.848e+02, threshold=4.715e+02, percent-clipped=0.0
2024-09-18 14:21:14,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=773177.0, ans=0.125
2024-09-18 14:21:37,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=773205.3333333334, ans=0.125
2024-09-18 14:21:40,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=773205.3333333334, ans=0.125
2024-09-18 14:21:52,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=773233.6666666666, ans=0.025
2024-09-18 14:22:01,235 INFO [train.py:1198] (0/2) Epoch 43, batch 4500, loss[loss=0.2548, ctc_loss=0.1742, cr_loss=0.403, over 18227.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1449, cr_loss=0.3715, over 4092883.79 frames. ], batch size: 108, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:22:15,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773290.3333333334, ans=0.1
2024-09-18 14:22:27,141 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:22:31,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=773318.6666666666, ans=0.125
2024-09-18 14:22:32,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=773318.6666666666, ans=0.125
2024-09-18 14:23:16,396 INFO [train.py:1198] (0/2) Epoch 43, batch 4550, loss[loss=0.2621, ctc_loss=0.1766, cr_loss=0.4272, over 18533.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1447, cr_loss=0.3713, over 4085475.37 frames. ], batch size: 108, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:23:25,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=773403.6666666666, ans=0.125
2024-09-18 14:23:36,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=773432.0, ans=0.07
2024-09-18 14:23:46,700 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.187e+02 2.341e+02 2.498e+02 6.645e+02, threshold=4.683e+02, percent-clipped=1.0
2024-09-18 14:23:59,314 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:24:32,259 INFO [train.py:1198] (0/2) Epoch 43, batch 4600, loss[loss=0.2417, ctc_loss=0.1593, cr_loss=0.4118, over 20978.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1447, cr_loss=0.3709, over 4085186.49 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:24:44,028 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=15.0
2024-09-18 14:24:58,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=773573.6666666666, ans=0.2
2024-09-18 14:25:04,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=773602.0, ans=0.125
2024-09-18 14:25:23,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=773630.3333333334, ans=0.125
2024-09-18 14:25:33,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=773630.3333333334, ans=0.125
2024-09-18 14:25:43,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=773658.6666666666, ans=0.125
2024-09-18 14:25:50,624 INFO [train.py:1198] (0/2) Epoch 43, batch 4650, loss[loss=0.2467, ctc_loss=0.1652, cr_loss=0.4078, over 20058.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1442, cr_loss=0.37, over 4088050.00 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:26:07,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=773715.3333333334, ans=0.0
2024-09-18 14:26:20,822 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.314e+02 2.429e+02 2.559e+02 3.308e+02, threshold=4.857e+02, percent-clipped=0.0
2024-09-18 14:26:33,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=773743.6666666666, ans=0.2
2024-09-18 14:26:45,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773772.0, ans=0.1
2024-09-18 14:26:47,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.99 vs. limit=10.0
2024-09-18 14:27:09,637 INFO [train.py:1198] (0/2) Epoch 43, batch 4700, loss[loss=0.2256, ctc_loss=0.1525, cr_loss=0.3659, over 21033.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1449, cr_loss=0.3715, over 4090245.24 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:28:03,352 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.99 vs. limit=22.5
2024-09-18 14:28:04,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=773913.6666666666, ans=0.125
2024-09-18 14:28:24,960 INFO [train.py:1198] (0/2) Epoch 43, batch 4750, loss[loss=0.2148, ctc_loss=0.1409, cr_loss=0.3694, over 20940.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3696, over 4100479.66 frames. ], batch size: 60, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:28:55,282 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.217e+02 2.357e+02 2.516e+02 3.017e+02, threshold=4.715e+02, percent-clipped=0.0
2024-09-18 14:29:40,666 INFO [train.py:1198] (0/2) Epoch 43, batch 4800, loss[loss=0.2276, ctc_loss=0.1514, cr_loss=0.3812, over 21015.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1431, cr_loss=0.3679, over 4104188.73 frames. ], batch size: 61, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:30:28,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0
2024-09-18 14:30:59,964 INFO [train.py:1198] (0/2) Epoch 43, batch 4850, loss[loss=0.1862, ctc_loss=0.1222, cr_loss=0.3202, over 20949.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1434, cr_loss=0.3679, over 4096334.14 frames. ], batch size: 51, lr: 1.91e-03, grad_scale: 32.0
2024-09-18 14:31:16,455 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=22.5
2024-09-18 14:31:30,528 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.252e+02 2.349e+02 2.491e+02 3.780e+02, threshold=4.697e+02, percent-clipped=0.0
2024-09-18 14:31:45,180 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.35 vs.
limit=15.0 2024-09-18 14:31:56,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=774338.6666666666, ans=0.125 2024-09-18 14:32:13,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=774367.0, ans=0.125 2024-09-18 14:32:16,081 INFO [train.py:1198] (0/2) Epoch 43, batch 4900, loss[loss=0.2248, ctc_loss=0.1482, cr_loss=0.3832, over 20659.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3683, over 4091419.55 frames. ], batch size: 66, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:32:27,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=774395.3333333334, ans=0.0 2024-09-18 14:32:30,629 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0 2024-09-18 14:32:55,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=774452.0, ans=15.0 2024-09-18 14:33:03,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=774480.3333333334, ans=0.125 2024-09-18 14:33:08,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=774480.3333333334, ans=0.125 2024-09-18 14:33:30,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=774508.6666666666, ans=0.2 2024-09-18 14:33:34,765 INFO [train.py:1198] (0/2) Epoch 43, batch 4950, loss[loss=0.2017, ctc_loss=0.1313, cr_loss=0.352, over 20800.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1429, cr_loss=0.367, over 4103150.40 frames. 
], batch size: 53, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:34:05,980 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.233e+02 2.344e+02 2.447e+02 3.997e+02, threshold=4.688e+02, percent-clipped=0.0 2024-09-18 14:34:26,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=774622.0, ans=0.2 2024-09-18 14:34:29,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=774622.0, ans=0.2 2024-09-18 14:34:43,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2024-09-18 14:34:50,431 INFO [train.py:1198] (0/2) Epoch 43, batch 5000, loss[loss=0.2302, ctc_loss=0.1534, cr_loss=0.384, over 18235.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1435, cr_loss=0.3683, over 4094617.55 frames. ], batch size: 108, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:35:05,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=774707.0, ans=0.0 2024-09-18 14:35:35,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=774763.6666666666, ans=0.125 2024-09-18 14:35:39,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=774763.6666666666, ans=0.2 2024-09-18 14:35:39,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=774763.6666666666, ans=0.125 2024-09-18 14:35:40,118 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. 
limit=6.0 2024-09-18 14:35:54,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=774792.0, ans=0.2 2024-09-18 14:36:04,296 INFO [train.py:1198] (0/2) Epoch 43, batch 5050, loss[loss=0.2512, ctc_loss=0.1718, cr_loss=0.3966, over 19511.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1433, cr_loss=0.3684, over 4086862.30 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:36:04,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.74 vs. limit=15.0 2024-09-18 14:36:07,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=774820.3333333334, ans=0.2 2024-09-18 14:36:21,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0 2024-09-18 14:36:34,574 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:36:35,660 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.244e+02 2.351e+02 2.522e+02 4.464e+02, threshold=4.701e+02, percent-clipped=0.0 2024-09-18 14:37:03,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=774933.6666666666, ans=0.0 2024-09-18 14:37:14,596 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-09-18 14:37:18,367 INFO [train.py:1198] (0/2) Epoch 43, batch 5100, loss[loss=0.1934, ctc_loss=0.1262, cr_loss=0.3356, over 20986.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1429, cr_loss=0.3674, over 4083140.46 frames. 
], batch size: 49, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:37:27,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=774962.0, ans=0.0 2024-09-18 14:37:39,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=774990.3333333334, ans=0.04949747468305833 2024-09-18 14:38:20,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=775075.3333333334, ans=0.125 2024-09-18 14:38:23,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=775075.3333333334, ans=0.125 2024-09-18 14:38:30,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775103.6666666666, ans=0.1 2024-09-18 14:38:31,723 INFO [train.py:1198] (0/2) Epoch 43, batch 5150, loss[loss=0.2141, ctc_loss=0.1393, cr_loss=0.3742, over 20947.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1433, cr_loss=0.3681, over 4082905.43 frames. ], batch size: 60, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:38:55,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=775132.0, ans=0.025 2024-09-18 14:39:02,365 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.237e+02 2.354e+02 2.501e+02 3.053e+02, threshold=4.709e+02, percent-clipped=0.0 2024-09-18 14:39:02,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=775160.3333333334, ans=0.125 2024-09-18 14:39:04,631 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. 
limit=15.0 2024-09-18 14:39:05,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=775160.3333333334, ans=0.125 2024-09-18 14:39:44,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=775217.0, ans=0.2 2024-09-18 14:39:48,118 INFO [train.py:1198] (0/2) Epoch 43, batch 5200, loss[loss=0.1955, ctc_loss=0.1249, cr_loss=0.3527, over 20920.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3686, over 4084699.82 frames. ], batch size: 48, lr: 1.91e-03, grad_scale: 32.0 2024-09-18 14:40:18,891 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.33 vs. limit=8.0 2024-09-18 14:40:25,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=775302.0, ans=0.125 2024-09-18 14:40:41,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=775330.3333333334, ans=0.125 2024-09-18 14:40:43,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=775330.3333333334, ans=0.0 2024-09-18 14:41:02,542 INFO [train.py:1198] (0/2) Epoch 43, batch 5250, loss[loss=0.2186, ctc_loss=0.1428, cr_loss=0.379, over 20972.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3683, over 4070077.85 frames. 
], batch size: 58, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:41:35,238 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.242e+02 2.378e+02 2.558e+02 5.799e+02, threshold=4.757e+02, percent-clipped=1.0 2024-09-18 14:41:37,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=775443.6666666666, ans=0.125 2024-09-18 14:41:52,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=775472.0, ans=0.0 2024-09-18 14:42:13,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0 2024-09-18 14:42:17,320 INFO [train.py:1198] (0/2) Epoch 43, batch 5300, loss[loss=0.2375, ctc_loss=0.1567, cr_loss=0.4043, over 20076.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1435, cr_loss=0.3683, over 4083230.82 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:42:32,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775557.0, ans=0.1 2024-09-18 14:42:44,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=775557.0, ans=0.2 2024-09-18 14:43:27,695 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-18 14:43:34,286 INFO [train.py:1198] (0/2) Epoch 43, batch 5350, loss[loss=0.1746, ctc_loss=0.1172, cr_loss=0.2872, over 20985.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.369, over 4095090.32 frames. 
], batch size: 52, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:43:37,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=775670.3333333334, ans=0.0 2024-09-18 14:43:55,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=15.0 2024-09-18 14:44:04,686 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:44:07,184 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.243e+02 2.344e+02 2.525e+02 3.319e+02, threshold=4.688e+02, percent-clipped=0.0 2024-09-18 14:44:49,420 INFO [train.py:1198] (0/2) Epoch 43, batch 5400, loss[loss=0.2367, ctc_loss=0.1586, cr_loss=0.3902, over 20961.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3695, over 4089974.44 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:45:04,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=775840.3333333334, ans=0.0 2024-09-18 14:45:06,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=775840.3333333334, ans=0.125 2024-09-18 14:45:12,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=775840.3333333334, ans=0.0 2024-09-18 14:45:42,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=775897.0, ans=0.125 2024-09-18 14:45:59,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=775925.3333333334, ans=0.0 2024-09-18 14:46:03,613 INFO [train.py:1198] (0/2) Epoch 43, batch 5450, loss[loss=0.2158, ctc_loss=0.1399, 
cr_loss=0.3797, over 21036.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.145, cr_loss=0.3715, over 4087095.18 frames. ], batch size: 56, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:46:03,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=775953.6666666666, ans=0.0 2024-09-18 14:46:35,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2024-09-18 14:46:36,520 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.278e+02 2.397e+02 2.570e+02 3.769e+02, threshold=4.794e+02, percent-clipped=0.0 2024-09-18 14:46:53,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=776038.6666666666, ans=0.125 2024-09-18 14:47:18,174 INFO [train.py:1198] (0/2) Epoch 43, batch 5500, loss[loss=0.2403, ctc_loss=0.1619, cr_loss=0.392, over 19990.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1449, cr_loss=0.3711, over 4098210.38 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:47:30,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=776095.3333333334, ans=0.0 2024-09-18 14:48:17,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=776208.6666666666, ans=0.125 2024-09-18 14:48:35,632 INFO [train.py:1198] (0/2) Epoch 43, batch 5550, loss[loss=0.2113, ctc_loss=0.1381, cr_loss=0.3663, over 20110.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1447, cr_loss=0.3704, over 4094946.11 frames. 
], batch size: 80, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:48:46,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=776237.0, ans=0.2 2024-09-18 14:48:50,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=776265.3333333334, ans=0.025 2024-09-18 14:49:08,153 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.237e+02 2.345e+02 2.506e+02 4.825e+02, threshold=4.689e+02, percent-clipped=1.0 2024-09-18 14:49:47,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=776350.3333333334, ans=0.0 2024-09-18 14:49:50,332 INFO [train.py:1198] (0/2) Epoch 43, batch 5600, loss[loss=0.2018, ctc_loss=0.1324, cr_loss=0.3469, over 21076.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.145, cr_loss=0.371, over 4097797.52 frames. ], batch size: 53, lr: 1.91e-03, grad_scale: 32.0 2024-09-18 14:50:13,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.87 vs. limit=22.5 2024-09-18 14:50:15,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=776407.0, ans=0.125 2024-09-18 14:50:51,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=776492.0, ans=0.125 2024-09-18 14:51:04,572 INFO [train.py:1198] (0/2) Epoch 43, batch 5650, loss[loss=0.2197, ctc_loss=0.1439, cr_loss=0.3788, over 21032.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.145, cr_loss=0.3709, over 4098107.99 frames. 
], batch size: 61, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:51:16,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=776520.3333333334, ans=0.0 2024-09-18 14:51:19,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=776548.6666666666, ans=0.0 2024-09-18 14:51:21,513 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-18 14:51:24,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=776548.6666666666, ans=0.1 2024-09-18 14:51:27,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=776548.6666666666, ans=0.125 2024-09-18 14:51:41,149 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.286e+02 2.408e+02 2.557e+02 4.624e+02, threshold=4.815e+02, percent-clipped=0.0 2024-09-18 14:51:50,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=776605.3333333334, ans=0.0 2024-09-18 14:51:59,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=776605.3333333334, ans=0.025 2024-09-18 14:52:09,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=776633.6666666666, ans=0.125 2024-09-18 14:52:14,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.17 vs. limit=10.0 2024-09-18 14:52:20,639 INFO [train.py:1198] (0/2) Epoch 43, batch 5700, loss[loss=0.2927, ctc_loss=0.206, cr_loss=0.4335, over 14533.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1458, cr_loss=0.3719, over 4084457.06 frames. 
], batch size: 149, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:52:25,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=776662.0, ans=0.0 2024-09-18 14:52:31,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.72 vs. limit=15.0 2024-09-18 14:52:43,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=776690.3333333334, ans=0.125 2024-09-18 14:53:02,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=776718.6666666666, ans=0.0 2024-09-18 14:53:35,237 INFO [train.py:1198] (0/2) Epoch 43, batch 5750, loss[loss=0.2066, ctc_loss=0.1357, cr_loss=0.3546, over 20904.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1452, cr_loss=0.3718, over 4098363.21 frames. ], batch size: 54, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:53:44,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=776803.6666666666, ans=0.035 2024-09-18 14:54:09,158 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.180e+02 2.325e+02 2.491e+02 3.054e+02, threshold=4.650e+02, percent-clipped=0.0 2024-09-18 14:54:20,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=776888.6666666666, ans=0.125 2024-09-18 14:54:42,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=776917.0, ans=0.125 2024-09-18 14:54:49,874 INFO [train.py:1198] (0/2) Epoch 43, batch 5800, loss[loss=0.1939, ctc_loss=0.1243, cr_loss=0.3478, over 20895.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1443, cr_loss=0.3696, over 4100089.54 frames. 
], batch size: 54, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:55:00,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=776945.3333333334, ans=0.1 2024-09-18 14:55:10,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0 2024-09-18 14:55:35,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=777030.3333333334, ans=0.05 2024-09-18 14:56:05,027 INFO [train.py:1198] (0/2) Epoch 43, batch 5850, loss[loss=0.2519, ctc_loss=0.1659, cr_loss=0.4296, over 20640.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1439, cr_loss=0.3699, over 4107012.02 frames. ], batch size: 66, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:56:05,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=777087.0, ans=0.125 2024-09-18 14:56:15,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777087.0, ans=0.1 2024-09-18 14:56:23,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=777115.3333333334, ans=0.0 2024-09-18 14:56:39,743 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.219e+02 2.331e+02 2.519e+02 5.601e+02, threshold=4.663e+02, percent-clipped=1.0 2024-09-18 14:56:53,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-09-18 14:57:14,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.79 vs. 
limit=15.0 2024-09-18 14:57:19,928 INFO [train.py:1198] (0/2) Epoch 43, batch 5900, loss[loss=0.2159, ctc_loss=0.1444, cr_loss=0.3578, over 21038.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1437, cr_loss=0.3698, over 4108037.75 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:57:21,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=777228.6666666666, ans=0.2 2024-09-18 14:57:23,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=777228.6666666666, ans=0.125 2024-09-18 14:57:35,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=777257.0, ans=0.0 2024-09-18 14:58:10,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2024-09-18 14:58:36,092 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.98 vs. limit=15.0 2024-09-18 14:58:36,424 INFO [train.py:1198] (0/2) Epoch 43, batch 5950, loss[loss=0.236, ctc_loss=0.159, cr_loss=0.3849, over 19351.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1439, cr_loss=0.3697, over 4105984.86 frames. 
], batch size: 90, lr: 1.91e-03, grad_scale: 16.0 2024-09-18 14:58:51,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=777398.6666666666, ans=0.125 2024-09-18 14:59:01,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=777398.6666666666, ans=0.125 2024-09-18 14:59:05,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=777427.0, ans=0.0 2024-09-18 14:59:09,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2024-09-18 14:59:10,483 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.295e+02 2.422e+02 2.594e+02 3.751e+02, threshold=4.844e+02, percent-clipped=0.0 2024-09-18 14:59:25,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=777455.3333333334, ans=0.125 2024-09-18 14:59:50,746 INFO [train.py:1198] (0/2) Epoch 43, batch 6000, loss[loss=0.1593, ctc_loss=0.1008, cr_loss=0.2927, over 20258.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1437, cr_loss=0.3698, over 4106099.82 frames. ], batch size: 45, lr: 1.91e-03, grad_scale: 32.0 2024-09-18 14:59:50,747 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 15:00:10,152 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.0392, 3.1757, 2.3514, 2.9637], device='cuda:0') 2024-09-18 15:00:11,064 INFO [train.py:1230] (0/2) Epoch 43, validation: loss=0.039, ctc_loss=0.039, cr_loss=1.514e-14, over 944034.00 frames. 
2024-09-18 15:00:11,064 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-18 15:00:12,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.92 vs. limit=15.0 2024-09-18 15:00:14,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777512.0, ans=0.1 2024-09-18 15:00:29,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-09-18 15:00:48,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=777568.6666666666, ans=0.125 2024-09-18 15:01:13,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=777625.3333333334, ans=0.2 2024-09-18 15:01:19,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=777625.3333333334, ans=0.025 2024-09-18 15:01:24,839 INFO [train.py:1198] (0/2) Epoch 43, batch 6050, loss[loss=0.2054, ctc_loss=0.1334, cr_loss=0.36, over 20980.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1449, cr_loss=0.3714, over 4088050.38 frames. ], batch size: 52, lr: 1.91e-03, grad_scale: 32.0 2024-09-18 15:01:36,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=777653.6666666666, ans=0.125 2024-09-18 15:01:59,007 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.298e+02 2.460e+02 2.615e+02 3.264e+02, threshold=4.919e+02, percent-clipped=0.0 2024-09-18 15:02:13,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.93 vs. 
limit=15.0 2024-09-18 15:02:23,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=777738.6666666666, ans=0.0 2024-09-18 15:02:27,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=777767.0, ans=0.125 2024-09-18 15:02:40,546 INFO [train.py:1198] (0/2) Epoch 43, batch 6100, loss[loss=0.2161, ctc_loss=0.1422, cr_loss=0.37, over 20975.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1441, cr_loss=0.37, over 4101801.44 frames. ], batch size: 55, lr: 1.90e-03, grad_scale: 32.0 2024-09-18 15:02:51,553 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=12.0 2024-09-18 15:02:56,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=777823.6666666666, ans=0.125 2024-09-18 15:02:57,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=777823.6666666666, ans=0.125 2024-09-18 15:03:08,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=777852.0, ans=0.0 2024-09-18 15:03:55,029 INFO [train.py:1198] (0/2) Epoch 43, batch 6150, loss[loss=0.2335, ctc_loss=0.1608, cr_loss=0.3632, over 14800.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.3691, over 4076144.98 frames. 
], batch size: 150, lr: 1.90e-03, grad_scale: 32.0 2024-09-18 15:04:27,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=777993.6666666666, ans=0.2 2024-09-18 15:04:28,652 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.235e+02 2.400e+02 2.551e+02 3.329e+02, threshold=4.800e+02, percent-clipped=0.0 2024-09-18 15:04:33,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=777993.6666666666, ans=0.0 2024-09-18 15:04:39,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0 2024-09-18 15:05:08,098 INFO [train.py:1198] (0/2) Epoch 43, batch 6200, loss[loss=0.2688, ctc_loss=0.1805, cr_loss=0.4413, over 18395.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1455, cr_loss=0.3715, over 4053540.51 frames. ], batch size: 108, lr: 1.90e-03, grad_scale: 32.0 2024-09-18 15:05:44,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=12.0 2024-09-18 15:05:51,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=778135.3333333334, ans=0.125 2024-09-18 15:05:55,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=778163.6666666666, ans=0.1 2024-09-18 15:06:17,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=778192.0, ans=0.2 2024-09-18 15:06:23,317 INFO [train.py:1198] (0/2) Epoch 43, batch 6250, loss[loss=0.255, ctc_loss=0.169, cr_loss=0.4296, over 20686.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1459, cr_loss=0.3718, over 4044612.42 frames. 
], batch size: 68, lr: 1.90e-03, grad_scale: 32.0 2024-09-18 15:06:36,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-09-18 15:06:56,471 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=22.5 2024-09-18 15:06:57,238 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.292e+02 2.418e+02 2.579e+02 4.063e+02, threshold=4.837e+02, percent-clipped=0.0 2024-09-18 15:07:11,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=778305.3333333334, ans=0.125 2024-09-18 15:07:16,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=778305.3333333334, ans=0.1 2024-09-18 15:07:22,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=778333.6666666666, ans=0.035 2024-09-18 15:07:35,717 INFO [train.py:1198] (0/2) Epoch 43, batch 6300, loss[loss=0.2783, ctc_loss=0.1973, cr_loss=0.4052, over 14161.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1487, cr_loss=0.3743, over 3943525.95 frames. ], batch size: 149, lr: 1.90e-03, grad_scale: 32.0 2024-09-18 15:07:52,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=778390.3333333334, ans=15.0 2024-09-18 15:08:04,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2024-09-18 15:08:11,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.51 vs. 
limit=15.0 2024-09-18 15:08:42,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=778475.3333333334, ans=0.0 2024-09-18 15:08:48,052 INFO [train.py:1198] (0/2) Epoch 43, batch 6350, loss[loss=0.2552, ctc_loss=0.1791, cr_loss=0.3802, over 14654.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1538, cr_loss=0.3772, over 3716577.05 frames. ], batch size: 149, lr: 1.90e-03, grad_scale: 32.0 2024-09-18 15:09:21,174 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.095e+02 2.681e+02 2.898e+02 3.118e+02 4.424e+02, threshold=5.797e+02, percent-clipped=0.0 2024-09-18 15:09:47,227 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-43.pt 2024-09-18 15:10:38,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.47 vs. limit=10.0 2024-09-18 15:10:38,529 INFO [train.py:1198] (0/2) Epoch 44, batch 0, loss[loss=0.2561, ctc_loss=0.1745, cr_loss=0.4078, over 18255.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1745, cr_loss=0.4078, over 18255.00 frames. ], batch size: 108, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:10:38,529 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 15:10:56,828 INFO [train.py:1230] (0/2) Epoch 44, validation: loss=0.03933, ctc_loss=0.03933, cr_loss=1.444e-14, over 944034.00 frames. 2024-09-18 15:10:56,828 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-18 15:11:03,420 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:11:42,223 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. 
limit=6.0 2024-09-18 15:11:49,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=778704.8333333334, ans=0.125 2024-09-18 15:12:05,546 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.47 vs. limit=10.0 2024-09-18 15:12:11,110 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0 2024-09-18 15:12:14,858 INFO [train.py:1198] (0/2) Epoch 44, batch 50, loss[loss=0.227, ctc_loss=0.1511, cr_loss=0.3797, over 21049.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1469, cr_loss=0.3756, over 922640.40 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:12:16,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=778761.5, ans=0.2 2024-09-18 15:12:52,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=778818.1666666666, ans=0.0 2024-09-18 15:13:05,000 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.200e+02 2.332e+02 2.528e+02 3.841e+02, threshold=4.664e+02, percent-clipped=0.0 2024-09-18 15:13:11,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=778846.5, ans=0.1 2024-09-18 15:13:30,712 INFO [train.py:1198] (0/2) Epoch 44, batch 100, loss[loss=0.222, ctc_loss=0.1438, cr_loss=0.391, over 20907.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1448, cr_loss=0.3709, over 1620585.74 frames. ], batch size: 54, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:13:49,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.38 vs. 
limit=15.0 2024-09-18 15:13:52,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=778931.5, ans=0.0 2024-09-18 15:14:14,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778988.1666666666, ans=0.1 2024-09-18 15:14:17,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=778988.1666666666, ans=0.0 2024-09-18 15:14:21,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.19 vs. limit=15.0 2024-09-18 15:14:26,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=778988.1666666666, ans=0.125 2024-09-18 15:14:32,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779016.5, ans=0.1 2024-09-18 15:14:35,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=779016.5, ans=0.2 2024-09-18 15:14:38,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=22.5 2024-09-18 15:14:40,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=779016.5, ans=0.025 2024-09-18 15:14:45,750 INFO [train.py:1198] (0/2) Epoch 44, batch 150, loss[loss=0.1923, ctc_loss=0.1246, cr_loss=0.3382, over 19960.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1452, cr_loss=0.3708, over 2159720.81 frames. ], batch size: 44, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:14:48,269 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=3.92 vs. 
limit=15.0 2024-09-18 15:15:04,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=779073.1666666666, ans=0.125 2024-09-18 15:15:20,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=779101.5, ans=0.1 2024-09-18 15:15:20,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=779101.5, ans=0.125 2024-09-18 15:15:35,933 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.231e+02 2.391e+02 2.493e+02 5.095e+02, threshold=4.781e+02, percent-clipped=2.0 2024-09-18 15:16:04,851 INFO [train.py:1198] (0/2) Epoch 44, batch 200, loss[loss=0.1969, ctc_loss=0.1275, cr_loss=0.3472, over 20987.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1438, cr_loss=0.368, over 2602900.11 frames. ], batch size: 55, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:17:08,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=779299.8333333334, ans=0.0 2024-09-18 15:17:23,327 INFO [train.py:1198] (0/2) Epoch 44, batch 250, loss[loss=0.2251, ctc_loss=0.1519, cr_loss=0.3663, over 20617.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1438, cr_loss=0.3685, over 2925755.26 frames. ], batch size: 68, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:18:12,900 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.204e+02 2.348e+02 2.462e+02 5.022e+02, threshold=4.695e+02, percent-clipped=1.0 2024-09-18 15:18:38,352 INFO [train.py:1198] (0/2) Epoch 44, batch 300, loss[loss=0.2405, ctc_loss=0.1587, cr_loss=0.4087, over 19278.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3707, over 3169312.26 frames. 
], batch size: 90, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:18:50,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=779469.8333333334, ans=0.025 2024-09-18 15:18:56,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=779498.1666666666, ans=0.125 2024-09-18 15:19:14,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=779526.5, ans=0.0 2024-09-18 15:19:17,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=779526.5, ans=0.0 2024-09-18 15:19:54,139 INFO [train.py:1198] (0/2) Epoch 44, batch 350, loss[loss=0.224, ctc_loss=0.1446, cr_loss=0.3967, over 21022.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1456, cr_loss=0.3718, over 3364028.53 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:19:58,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=779611.5, ans=0.0 2024-09-18 15:20:10,110 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.91 vs. 
limit=10.0 2024-09-18 15:20:17,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=779639.8333333334, ans=0.125 2024-09-18 15:20:26,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=779668.1666666666, ans=0.125 2024-09-18 15:20:29,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779668.1666666666, ans=0.1 2024-09-18 15:20:43,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=779696.5, ans=0.0 2024-09-18 15:20:44,403 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.249e+02 2.382e+02 2.585e+02 6.896e+02, threshold=4.763e+02, percent-clipped=1.0 2024-09-18 15:21:10,216 INFO [train.py:1198] (0/2) Epoch 44, batch 400, loss[loss=0.2494, ctc_loss=0.1669, cr_loss=0.4127, over 19495.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1453, cr_loss=0.3712, over 3526050.05 frames. ], batch size: 90, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:21:10,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=779753.1666666666, ans=0.0 2024-09-18 15:21:16,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779753.1666666666, ans=0.1 2024-09-18 15:21:51,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=779809.8333333334, ans=0.1 2024-09-18 15:22:03,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=779838.1666666666, ans=0.125 2024-09-18 15:22:28,640 INFO [train.py:1198] (0/2) Epoch 44, batch 450, loss[loss=0.1937, ctc_loss=0.1257, cr_loss=0.3397, over 20969.00 frames. 
], tot_loss[loss=0.219, ctc_loss=0.145, cr_loss=0.3704, over 3655511.96 frames. ], batch size: 49, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:22:33,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=779894.8333333334, ans=0.125 2024-09-18 15:22:35,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779894.8333333334, ans=0.1 2024-09-18 15:22:36,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=779894.8333333334, ans=0.125 2024-09-18 15:22:50,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=779923.1666666666, ans=0.0 2024-09-18 15:22:51,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=779923.1666666666, ans=0.0 2024-09-18 15:23:14,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0 2024-09-18 15:23:22,936 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.261e+02 2.364e+02 2.599e+02 3.244e+02, threshold=4.729e+02, percent-clipped=0.0 2024-09-18 15:23:33,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=780008.1666666666, ans=0.0 2024-09-18 15:23:47,006 INFO [train.py:1198] (0/2) Epoch 44, batch 500, loss[loss=0.2119, ctc_loss=0.1367, cr_loss=0.3759, over 20894.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1448, cr_loss=0.3706, over 3754057.77 frames. 
], batch size: 57, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:23:53,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=780036.5, ans=0.0 2024-09-18 15:24:08,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780064.8333333334, ans=0.1 2024-09-18 15:24:13,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=780064.8333333334, ans=0.2 2024-09-18 15:24:31,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=780121.5, ans=0.125 2024-09-18 15:24:34,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780121.5, ans=0.1 2024-09-18 15:25:02,022 INFO [train.py:1198] (0/2) Epoch 44, batch 550, loss[loss=0.2334, ctc_loss=0.1553, cr_loss=0.3906, over 19444.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1448, cr_loss=0.3713, over 3828491.99 frames. ], batch size: 90, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:25:40,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=780234.8333333334, ans=0.0 2024-09-18 15:25:53,769 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.294e+02 2.377e+02 2.572e+02 1.494e+03, threshold=4.754e+02, percent-clipped=2.0 2024-09-18 15:26:17,803 INFO [train.py:1198] (0/2) Epoch 44, batch 600, loss[loss=0.171, ctc_loss=0.1109, cr_loss=0.3006, over 20983.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.145, cr_loss=0.3711, over 3876036.03 frames. ], batch size: 51, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:26:33,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.36 vs. 
limit=15.0 2024-09-18 15:27:26,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=780433.1666666666, ans=0.125 2024-09-18 15:27:36,975 INFO [train.py:1198] (0/2) Epoch 44, batch 650, loss[loss=0.2144, ctc_loss=0.1393, cr_loss=0.3755, over 21047.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.145, cr_loss=0.3705, over 3912340.36 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:27:48,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780461.5, ans=0.1 2024-09-18 15:27:55,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780489.8333333334, ans=0.1 2024-09-18 15:28:04,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=780489.8333333334, ans=0.125 2024-09-18 15:28:10,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780518.1666666666, ans=0.1 2024-09-18 15:28:26,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=780546.5, ans=0.0 2024-09-18 15:28:26,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=780546.5, ans=0.0 2024-09-18 15:28:31,084 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.233e+02 2.345e+02 2.494e+02 3.601e+02, threshold=4.689e+02, percent-clipped=0.0 2024-09-18 15:28:52,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=780574.8333333334, ans=0.125 2024-09-18 15:28:55,557 INFO [train.py:1198] (0/2) Epoch 44, batch 700, loss[loss=0.1924, ctc_loss=0.125, 
cr_loss=0.3371, over 20958.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1442, cr_loss=0.3687, over 3938877.61 frames. ], batch size: 50, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:30:10,965 INFO [train.py:1198] (0/2) Epoch 44, batch 750, loss[loss=0.2285, ctc_loss=0.1504, cr_loss=0.3903, over 21062.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1455, cr_loss=0.3717, over 3969226.46 frames. ], batch size: 59, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:30:18,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=780744.8333333334, ans=0.125 2024-09-18 15:30:22,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=780744.8333333334, ans=0.125 2024-09-18 15:30:38,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=780773.1666666666, ans=0.1 2024-09-18 15:31:02,431 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.229e+02 2.376e+02 2.533e+02 3.347e+02, threshold=4.753e+02, percent-clipped=0.0 2024-09-18 15:31:26,641 INFO [train.py:1198] (0/2) Epoch 44, batch 800, loss[loss=0.1916, ctc_loss=0.1249, cr_loss=0.3336, over 20954.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1457, cr_loss=0.3715, over 3970464.19 frames. ], batch size: 49, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:31:46,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-09-18 15:31:48,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=780914.8333333334, ans=0.07 2024-09-18 15:31:51,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. 
limit=6.0 2024-09-18 15:32:04,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=780943.1666666666, ans=0.125 2024-09-18 15:32:07,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=780943.1666666666, ans=0.125 2024-09-18 15:32:07,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=780943.1666666666, ans=0.2 2024-09-18 15:32:18,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=780971.5, ans=0.125 2024-09-18 15:32:26,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0 2024-09-18 15:32:42,231 INFO [train.py:1198] (0/2) Epoch 44, batch 850, loss[loss=0.2015, ctc_loss=0.1305, cr_loss=0.3552, over 21062.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.3699, over 4007226.82 frames. ], batch size: 59, lr: 1.88e-03, grad_scale: 32.0 2024-09-18 15:32:46,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781028.1666666666, ans=0.1 2024-09-18 15:33:38,118 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.016e+02 2.222e+02 2.405e+02 2.536e+02 3.684e+02, threshold=4.811e+02, percent-clipped=0.0 2024-09-18 15:34:03,775 INFO [train.py:1198] (0/2) Epoch 44, batch 900, loss[loss=0.2253, ctc_loss=0.1524, cr_loss=0.3643, over 20642.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1441, cr_loss=0.3687, over 4034314.84 frames. 
], batch size: 68, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:34:11,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=781169.8333333334, ans=0.2 2024-09-18 15:34:30,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781198.1666666666, ans=0.1 2024-09-18 15:34:32,058 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.61 vs. limit=10.0 2024-09-18 15:35:20,465 INFO [train.py:1198] (0/2) Epoch 44, batch 950, loss[loss=0.1729, ctc_loss=0.1132, cr_loss=0.2987, over 20976.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1433, cr_loss=0.3676, over 4049519.66 frames. ], batch size: 50, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:36:04,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=781396.5, ans=0.125 2024-09-18 15:36:09,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=781396.5, ans=0.025 2024-09-18 15:36:13,302 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.223e+02 2.386e+02 2.536e+02 3.288e+02, threshold=4.771e+02, percent-clipped=0.0 2024-09-18 15:36:15,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781396.5, ans=0.1 2024-09-18 15:36:36,711 INFO [train.py:1198] (0/2) Epoch 44, batch 1000, loss[loss=0.2289, ctc_loss=0.1492, cr_loss=0.3984, over 20877.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1425, cr_loss=0.3671, over 4069154.52 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:36:39,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.09 vs. 
limit=15.0 2024-09-18 15:36:47,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=781453.1666666666, ans=0.1 2024-09-18 15:36:47,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=781453.1666666666, ans=0.125 2024-09-18 15:36:59,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=781481.5, ans=0.0 2024-09-18 15:37:05,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=781509.8333333334, ans=0.0 2024-09-18 15:37:17,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781509.8333333334, ans=0.1 2024-09-18 15:37:25,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=781538.1666666666, ans=0.125 2024-09-18 15:37:44,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=781566.5, ans=0.0 2024-09-18 15:37:52,025 INFO [train.py:1198] (0/2) Epoch 44, batch 1050, loss[loss=0.2011, ctc_loss=0.1308, cr_loss=0.3514, over 20763.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1435, cr_loss=0.369, over 4078202.81 frames. ], batch size: 56, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:38:28,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. 
limit=6.0 2024-09-18 15:38:45,558 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.240e+02 2.329e+02 2.481e+02 2.734e+02, threshold=4.658e+02, percent-clipped=0.0 2024-09-18 15:39:11,763 INFO [train.py:1198] (0/2) Epoch 44, batch 1100, loss[loss=0.1994, ctc_loss=0.129, cr_loss=0.3523, over 21063.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3683, over 4082555.13 frames. ], batch size: 53, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:39:36,765 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-09-18 15:39:39,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=781764.8333333334, ans=0.0 2024-09-18 15:40:14,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=781849.8333333334, ans=0.2 2024-09-18 15:40:27,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=781849.8333333334, ans=0.0 2024-09-18 15:40:30,679 INFO [train.py:1198] (0/2) Epoch 44, batch 1150, loss[loss=0.2207, ctc_loss=0.1421, cr_loss=0.3934, over 21020.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1423, cr_loss=0.3673, over 4099641.17 frames. ], batch size: 61, lr: 1.88e-03, grad_scale: 16.0 2024-09-18 15:40:43,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=781878.1666666666, ans=0.1 2024-09-18 15:40:55,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.36 vs. 
limit=15.0
2024-09-18 15:41:24,241 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.278e+02 2.383e+02 2.503e+02 4.330e+02, threshold=4.766e+02, percent-clipped=0.0
2024-09-18 15:41:35,024 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-276000.pt
2024-09-18 15:41:36,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=781991.5, ans=0.0
2024-09-18 15:41:48,009 INFO [train.py:1198] (0/2) Epoch 44, batch 1200, loss[loss=0.2189, ctc_loss=0.1455, cr_loss=0.3669, over 21082.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1422, cr_loss=0.3674, over 4112613.43 frames. ], batch size: 59, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 15:42:09,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=782048.1666666666, ans=0.125
2024-09-18 15:42:12,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782048.1666666666, ans=0.1
2024-09-18 15:42:14,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=782048.1666666666, ans=0.0
2024-09-18 15:42:18,698 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:42:20,298 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:42:35,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782104.8333333334, ans=0.1
2024-09-18 15:42:54,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=12.0
2024-09-18 15:43:03,648 INFO [train.py:1198] (0/2) Epoch 44, batch 1250, loss[loss=0.2346, ctc_loss=0.1575, cr_loss=0.3856, over 20874.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1421, cr_loss=0.3669, over 4107392.43 frames. ], batch size: 57, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 15:43:14,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=782161.5, ans=0.125
2024-09-18 15:43:25,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=782189.8333333334, ans=0.125
2024-09-18 15:43:25,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=12.0
2024-09-18 15:43:31,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=782189.8333333334, ans=0.125
2024-09-18 15:43:54,681 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0
2024-09-18 15:43:56,981 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.222e+02 2.361e+02 2.534e+02 5.802e+02, threshold=4.722e+02, percent-clipped=2.0
2024-09-18 15:44:19,790 INFO [train.py:1198] (0/2) Epoch 44, batch 1300, loss[loss=0.2442, ctc_loss=0.1651, cr_loss=0.3951, over 18252.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1422, cr_loss=0.3673, over 4098454.12 frames. ], batch size: 108, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 15:45:00,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=782359.8333333334, ans=0.125
2024-09-18 15:45:00,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=782359.8333333334, ans=0.2
2024-09-18 15:45:14,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=782388.1666666666, ans=0.125
2024-09-18 15:45:38,307 INFO [train.py:1198] (0/2) Epoch 44, batch 1350, loss[loss=0.1843, ctc_loss=0.121, cr_loss=0.3165, over 20937.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1422, cr_loss=0.3673, over 4111691.55 frames. ], batch size: 48, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 15:45:45,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=782444.8333333334, ans=0.125
2024-09-18 15:45:46,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=782444.8333333334, ans=0.2
2024-09-18 15:46:16,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=782501.5, ans=0.125
2024-09-18 15:46:26,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=782529.8333333334, ans=0.125
2024-09-18 15:46:34,053 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 2.263e+02 2.364e+02 2.570e+02 4.080e+02, threshold=4.728e+02, percent-clipped=0.0
2024-09-18 15:46:56,899 INFO [train.py:1198] (0/2) Epoch 44, batch 1400, loss[loss=0.226, ctc_loss=0.1485, cr_loss=0.3876, over 21008.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1425, cr_loss=0.3676, over 4117766.38 frames. ], batch size: 63, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 15:47:08,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=782586.5, ans=0.125
2024-09-18 15:47:39,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=782643.1666666666, ans=0.1
2024-09-18 15:47:47,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=782671.5, ans=0.125
2024-09-18 15:48:05,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=782699.8333333334, ans=0.125
2024-09-18 15:48:08,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=782699.8333333334, ans=0.0
2024-09-18 15:48:12,454 INFO [train.py:1198] (0/2) Epoch 44, batch 1450, loss[loss=0.1993, ctc_loss=0.1296, cr_loss=0.3485, over 20886.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3686, over 4113410.34 frames. ], batch size: 54, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 15:48:43,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=782784.8333333334, ans=0.125
2024-09-18 15:49:05,698 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.211e+02 2.362e+02 2.505e+02 3.838e+02, threshold=4.723e+02, percent-clipped=0.0
2024-09-18 15:49:28,430 INFO [train.py:1198] (0/2) Epoch 44, batch 1500, loss[loss=0.2427, ctc_loss=0.1641, cr_loss=0.393, over 19454.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3684, over 4108231.93 frames. ], batch size: 90, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 15:49:37,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=782869.8333333334, ans=0.04949747468305833
2024-09-18 15:49:48,411 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:49:55,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782898.1666666666, ans=0.1
2024-09-18 15:50:31,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=782983.1666666666, ans=0.1
2024-09-18 15:50:46,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=783011.5, ans=0.125
2024-09-18 15:50:46,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783011.5, ans=0.1
2024-09-18 15:50:47,737 INFO [train.py:1198] (0/2) Epoch 44, batch 1550, loss[loss=0.2225, ctc_loss=0.1467, cr_loss=0.379, over 20705.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1438, cr_loss=0.3703, over 4111399.87 frames. ], batch size: 71, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 15:51:39,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=783096.5, ans=0.0
2024-09-18 15:51:39,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=783096.5, ans=0.0
2024-09-18 15:51:43,735 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.221e+02 2.341e+02 2.547e+02 3.742e+02, threshold=4.682e+02, percent-clipped=0.0
2024-09-18 15:52:06,567 INFO [train.py:1198] (0/2) Epoch 44, batch 1600, loss[loss=0.2525, ctc_loss=0.1696, cr_loss=0.4146, over 20963.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1445, cr_loss=0.3713, over 4106662.82 frames. ], batch size: 64, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 15:52:34,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=783181.5, ans=0.1
2024-09-18 15:53:02,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=783238.1666666666, ans=0.125
2024-09-18 15:53:05,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=783266.5, ans=0.015
2024-09-18 15:53:22,061 INFO [train.py:1198] (0/2) Epoch 44, batch 1650, loss[loss=0.2152, ctc_loss=0.1401, cr_loss=0.3754, over 20991.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.145, cr_loss=0.372, over 4109377.74 frames. ], batch size: 55, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 15:54:12,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783379.8333333334, ans=0.1
2024-09-18 15:54:15,587 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.298e+02 2.409e+02 2.609e+02 3.009e+02, threshold=4.818e+02, percent-clipped=0.0
2024-09-18 15:54:18,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=783379.8333333334, ans=0.125
2024-09-18 15:54:26,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=783408.1666666666, ans=0.125
2024-09-18 15:54:34,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=783408.1666666666, ans=0.95
2024-09-18 15:54:38,531 INFO [train.py:1198] (0/2) Epoch 44, batch 1700, loss[loss=0.1978, ctc_loss=0.128, cr_loss=0.349, over 20826.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1444, cr_loss=0.3712, over 4118812.52 frames. ], batch size: 59, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 15:54:54,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.76 vs. limit=10.0
2024-09-18 15:55:07,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=783493.1666666666, ans=0.0
2024-09-18 15:55:21,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=783493.1666666666, ans=0.0
2024-09-18 15:55:27,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=783521.5, ans=0.125
2024-09-18 15:55:35,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=783521.5, ans=0.125
2024-09-18 15:55:39,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=783549.8333333334, ans=0.125
2024-09-18 15:55:54,319 INFO [train.py:1198] (0/2) Epoch 44, batch 1750, loss[loss=0.2342, ctc_loss=0.1548, cr_loss=0.3973, over 20691.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1445, cr_loss=0.3714, over 4117188.96 frames. ], batch size: 71, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 15:56:40,217 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:56:50,359 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.237e+02 2.340e+02 2.460e+02 5.468e+02, threshold=4.679e+02, percent-clipped=1.0
2024-09-18 15:57:08,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0
2024-09-18 15:57:16,307 INFO [train.py:1198] (0/2) Epoch 44, batch 1800, loss[loss=0.1974, ctc_loss=0.1293, cr_loss=0.3404, over 20776.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1437, cr_loss=0.3698, over 4112766.89 frames. ], batch size: 53, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 15:57:21,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=783719.8333333334, ans=0.125
2024-09-18 15:57:41,082 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=22.5
2024-09-18 15:57:54,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=783776.5, ans=0.0
2024-09-18 15:58:09,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=783804.8333333334, ans=0.0
2024-09-18 15:58:31,830 INFO [train.py:1198] (0/2) Epoch 44, batch 1850, loss[loss=0.1962, ctc_loss=0.1254, cr_loss=0.3539, over 21001.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3688, over 4111554.98 frames. ], batch size: 52, lr: 1.88e-03, grad_scale: 16.0
2024-09-18 15:58:34,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=783861.5, ans=0.0
2024-09-18 15:58:59,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=783889.8333333334, ans=0.0
2024-09-18 15:59:13,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=783918.1666666666, ans=0.125
2024-09-18 15:59:17,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=783946.5, ans=0.0
2024-09-18 15:59:26,404 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.230e+02 2.364e+02 2.564e+02 3.958e+02, threshold=4.728e+02, percent-clipped=0.0
2024-09-18 15:59:47,674 INFO [train.py:1198] (0/2) Epoch 44, batch 1900, loss[loss=0.2164, ctc_loss=0.1415, cr_loss=0.3748, over 20959.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1422, cr_loss=0.3668, over 4110144.16 frames. ], batch size: 58, lr: 1.88e-03, grad_scale: 16.0
2024-09-18 16:00:10,673 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0
2024-09-18 16:00:46,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=784116.5, ans=0.0
2024-09-18 16:01:02,716 INFO [train.py:1198] (0/2) Epoch 44, batch 1950, loss[loss=0.2472, ctc_loss=0.1607, cr_loss=0.4326, over 20962.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1423, cr_loss=0.3671, over 4113371.53 frames. ], batch size: 58, lr: 1.88e-03, grad_scale: 16.0
2024-09-18 16:01:15,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0
2024-09-18 16:01:16,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=784173.1666666666, ans=0.1
2024-09-18 16:01:33,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=784201.5, ans=0.2
2024-09-18 16:01:57,749 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.858e+02 2.222e+02 2.340e+02 2.472e+02 5.398e+02, threshold=4.681e+02, percent-clipped=1.0
2024-09-18 16:02:07,895 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.75 vs. limit=15.0
2024-09-18 16:02:18,807 INFO [train.py:1198] (0/2) Epoch 44, batch 2000, loss[loss=0.2367, ctc_loss=0.1575, cr_loss=0.3961, over 20046.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1431, cr_loss=0.3692, over 4115618.96 frames. ], batch size: 80, lr: 1.88e-03, grad_scale: 32.0
2024-09-18 16:02:26,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=784286.5, ans=0.125
2024-09-18 16:02:32,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=784286.5, ans=0.125
2024-09-18 16:02:38,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=784314.8333333334, ans=0.1
2024-09-18 16:02:53,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=784343.1666666666, ans=0.0
2024-09-18 16:03:39,697 INFO [train.py:1198] (0/2) Epoch 44, batch 2050, loss[loss=0.2338, ctc_loss=0.1555, cr_loss=0.3914, over 20326.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1442, cr_loss=0.3703, over 4097423.50 frames. ], batch size: 74, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:04:06,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=784456.5, ans=0.125
2024-09-18 16:04:08,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.31 vs. limit=15.0
2024-09-18 16:04:09,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=784484.8333333334, ans=0.125
2024-09-18 16:04:26,022 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0
2024-09-18 16:04:34,351 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.235e+02 2.350e+02 2.509e+02 4.746e+02, threshold=4.699e+02, percent-clipped=1.0
2024-09-18 16:04:44,854 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.56 vs. limit=15.0
2024-09-18 16:04:53,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=784541.5, ans=0.025
2024-09-18 16:04:55,778 INFO [train.py:1198] (0/2) Epoch 44, batch 2100, loss[loss=0.2214, ctc_loss=0.1475, cr_loss=0.3695, over 20287.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1438, cr_loss=0.3702, over 4113182.22 frames. ], batch size: 74, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:05:12,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=784598.1666666666, ans=0.025
2024-09-18 16:05:40,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=784654.8333333334, ans=0.2
2024-09-18 16:05:40,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0
2024-09-18 16:06:03,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=784683.1666666666, ans=0.125
2024-09-18 16:06:03,306 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 16:06:06,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=784683.1666666666, ans=0.025
2024-09-18 16:06:12,044 INFO [train.py:1198] (0/2) Epoch 44, batch 2150, loss[loss=0.182, ctc_loss=0.1173, cr_loss=0.3237, over 20955.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1444, cr_loss=0.3715, over 4109058.32 frames. ], batch size: 49, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:06:41,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0
2024-09-18 16:07:06,394 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.326e+02 2.412e+02 2.683e+02 9.743e+02, threshold=4.825e+02, percent-clipped=1.0
2024-09-18 16:07:23,641 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 16:07:27,782 INFO [train.py:1198] (0/2) Epoch 44, batch 2200, loss[loss=0.1927, ctc_loss=0.1253, cr_loss=0.3367, over 21082.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1441, cr_loss=0.3714, over 4111227.88 frames. ], batch size: 59, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:07:28,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=784853.1666666666, ans=0.0
2024-09-18 16:07:49,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=784881.5, ans=0.04949747468305833
2024-09-18 16:07:54,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=784881.5, ans=0.0
2024-09-18 16:08:07,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=784909.8333333334, ans=0.025
2024-09-18 16:08:12,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=784909.8333333334, ans=0.125
2024-09-18 16:08:27,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=12.0
2024-09-18 16:08:34,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=784966.5, ans=0.025
2024-09-18 16:08:49,720 INFO [train.py:1198] (0/2) Epoch 44, batch 2250, loss[loss=0.2041, ctc_loss=0.1332, cr_loss=0.3549, over 20767.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1439, cr_loss=0.3712, over 4112866.26 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:08:50,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=784994.8333333334, ans=0.125
2024-09-18 16:08:55,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=784994.8333333334, ans=0.0
2024-09-18 16:08:59,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=784994.8333333334, ans=0.1
2024-09-18 16:09:44,057 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.232e+02 2.356e+02 2.516e+02 5.738e+02, threshold=4.713e+02, percent-clipped=1.0
2024-09-18 16:09:50,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=785108.1666666666, ans=0.125
2024-09-18 16:10:05,570 INFO [train.py:1198] (0/2) Epoch 44, batch 2300, loss[loss=0.2143, ctc_loss=0.1384, cr_loss=0.3793, over 20821.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1439, cr_loss=0.3708, over 4108359.09 frames. ], batch size: 59, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:10:06,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=785136.5, ans=15.0
2024-09-18 16:10:12,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=785136.5, ans=0.125
2024-09-18 16:10:36,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.78 vs. limit=22.5
2024-09-18 16:10:43,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=785193.1666666666, ans=0.2
2024-09-18 16:10:46,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=785193.1666666666, ans=0.2
2024-09-18 16:10:49,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=785221.5, ans=0.125
2024-09-18 16:10:52,634 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 16:10:58,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=785221.5, ans=0.125
2024-09-18 16:11:21,192 INFO [train.py:1198] (0/2) Epoch 44, batch 2350, loss[loss=0.2167, ctc_loss=0.1418, cr_loss=0.3743, over 21022.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1449, cr_loss=0.3721, over 4098446.26 frames. ], batch size: 62, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:11:34,021 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0
2024-09-18 16:11:52,444 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0
2024-09-18 16:12:07,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=785363.1666666666, ans=0.1
2024-09-18 16:12:07,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=785363.1666666666, ans=0.025
2024-09-18 16:12:15,889 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.230e+02 2.373e+02 2.587e+02 6.933e+02, threshold=4.747e+02, percent-clipped=1.0
2024-09-18 16:12:16,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=785363.1666666666, ans=0.2
2024-09-18 16:12:37,193 INFO [train.py:1198] (0/2) Epoch 44, batch 2400, loss[loss=0.2049, ctc_loss=0.1321, cr_loss=0.3637, over 20788.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1445, cr_loss=0.3715, over 4107598.52 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:13:56,662 INFO [train.py:1198] (0/2) Epoch 44, batch 2450, loss[loss=0.2149, ctc_loss=0.1442, cr_loss=0.3531, over 21068.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1438, cr_loss=0.3701, over 4116572.41 frames. ], batch size: 59, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:14:13,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=785589.8333333334, ans=0.1
2024-09-18 16:14:53,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=785646.5, ans=10.0
2024-09-18 16:14:54,507 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.191e+02 2.329e+02 2.465e+02 4.448e+02, threshold=4.657e+02, percent-clipped=0.0
2024-09-18 16:15:11,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=785674.8333333334, ans=0.125
2024-09-18 16:15:15,308 INFO [train.py:1198] (0/2) Epoch 44, batch 2500, loss[loss=0.2315, ctc_loss=0.1524, cr_loss=0.3956, over 21024.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1438, cr_loss=0.3697, over 4097324.39 frames. ], batch size: 61, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:15:24,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=785703.1666666666, ans=0.1
2024-09-18 16:15:27,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=785703.1666666666, ans=0.125
2024-09-18 16:16:02,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=785788.1666666666, ans=0.1
2024-09-18 16:16:31,330 INFO [train.py:1198] (0/2) Epoch 44, batch 2550, loss[loss=0.2323, ctc_loss=0.1502, cr_loss=0.4103, over 20977.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1437, cr_loss=0.3696, over 4086853.62 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:17:02,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=785901.5, ans=0.95
2024-09-18 16:17:26,276 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.269e+02 2.356e+02 2.486e+02 3.153e+02, threshold=4.712e+02, percent-clipped=0.0
2024-09-18 16:17:47,675 INFO [train.py:1198] (0/2) Epoch 44, batch 2600, loss[loss=0.2037, ctc_loss=0.135, cr_loss=0.3435, over 20964.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1443, cr_loss=0.3697, over 4081914.99 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:18:12,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=786014.8333333334, ans=0.125
2024-09-18 16:18:18,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=786043.1666666666, ans=0.0
2024-09-18 16:18:41,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=786071.5, ans=0.0
2024-09-18 16:19:03,602 INFO [train.py:1198] (0/2) Epoch 44, batch 2650, loss[loss=0.1973, ctc_loss=0.131, cr_loss=0.3314, over 20978.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1441, cr_loss=0.3681, over 4079592.53 frames. ], batch size: 55, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:19:10,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=15.0
2024-09-18 16:20:01,380 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.215e+02 2.398e+02 2.540e+02 4.601e+02, threshold=4.795e+02, percent-clipped=0.0
2024-09-18 16:20:22,470 INFO [train.py:1198] (0/2) Epoch 44, batch 2700, loss[loss=0.1955, ctc_loss=0.1272, cr_loss=0.3415, over 20958.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1439, cr_loss=0.3681, over 4091566.35 frames. ], batch size: 48, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:20:22,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=786269.8333333334, ans=0.0
2024-09-18 16:21:11,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=12.0
2024-09-18 16:21:41,605 INFO [train.py:1198] (0/2) Epoch 44, batch 2750, loss[loss=0.2499, ctc_loss=0.1666, cr_loss=0.4162, over 20112.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1431, cr_loss=0.3672, over 4100280.23 frames. ], batch size: 80, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:22:35,915 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.277e+02 2.396e+02 2.598e+02 3.085e+02, threshold=4.792e+02, percent-clipped=0.0
2024-09-18 16:22:57,388 INFO [train.py:1198] (0/2) Epoch 44, batch 2800, loss[loss=0.1998, ctc_loss=0.1297, cr_loss=0.3501, over 21052.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1427, cr_loss=0.3673, over 4102959.54 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:23:02,814 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0
2024-09-18 16:23:06,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=786553.1666666666, ans=0.125
2024-09-18 16:23:11,854 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0
2024-09-18 16:23:16,179 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 16:23:46,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=786638.1666666666, ans=0.0
2024-09-18 16:24:13,097 INFO [train.py:1198] (0/2) Epoch 44, batch 2850, loss[loss=0.1959, ctc_loss=0.1263, cr_loss=0.3477, over 20960.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1429, cr_loss=0.3671, over 4109804.92 frames. ], batch size: 51, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:24:15,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=786694.8333333334, ans=0.125
2024-09-18 16:25:06,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786779.8333333334, ans=0.1
2024-09-18 16:25:07,080 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.973e+02 2.256e+02 2.398e+02 2.542e+02 7.526e+02, threshold=4.796e+02, percent-clipped=1.0
2024-09-18 16:25:31,635 INFO [train.py:1198] (0/2) Epoch 44, batch 2900, loss[loss=0.2201, ctc_loss=0.1461, cr_loss=0.3701, over 20966.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1435, cr_loss=0.3677, over 4091800.74 frames. ], batch size: 64, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:25:34,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=786836.5, ans=0.125
2024-09-18 16:25:37,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=786836.5, ans=0.125
2024-09-18 16:26:35,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=786949.8333333334, ans=0.2
2024-09-18 16:26:50,153 INFO [train.py:1198] (0/2) Epoch 44, batch 2950, loss[loss=0.2392, ctc_loss=0.1573, cr_loss=0.4095, over 20783.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1435, cr_loss=0.368, over 4095472.10 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:26:53,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=786978.1666666666, ans=0.025
2024-09-18 16:27:17,965 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 16:27:44,737 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.241e+02 2.367e+02 2.490e+02 3.421e+02, threshold=4.734e+02, percent-clipped=0.0
2024-09-18 16:27:52,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=787091.5, ans=0.1
2024-09-18 16:28:05,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=787119.8333333334, ans=0.0
2024-09-18 16:28:05,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=787119.8333333334, ans=0.2
2024-09-18 16:28:06,203 INFO [train.py:1198] (0/2) Epoch 44, batch 3000, loss[loss=0.2081, ctc_loss=0.1372, cr_loss=0.3544, over 20796.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1438, cr_loss=0.3681, over 4088178.76 frames. ], batch size: 53, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:28:06,204 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 16:28:28,047 INFO [train.py:1230] (0/2) Epoch 44, validation: loss=0.03963, ctc_loss=0.03963, cr_loss=1.646e-14, over 944034.00 frames.
2024-09-18 16:28:28,048 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-18 16:28:57,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0
2024-09-18 16:29:08,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=787176.5, ans=0.1
2024-09-18 16:29:23,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=787204.8333333334, ans=0.0
2024-09-18 16:29:34,116 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0
2024-09-18 16:29:36,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=787233.1666666666, ans=0.0
2024-09-18 16:29:44,037 INFO [train.py:1198] (0/2) Epoch 44, batch 3050, loss[loss=0.2002, ctc_loss=0.1318, cr_loss=0.3417, over 21014.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1435, cr_loss=0.3679, over 4102887.99 frames. ], batch size: 61, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:29:59,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=787289.8333333334, ans=0.02
2024-09-18 16:30:16,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=787318.1666666666, ans=0.025
2024-09-18 16:30:38,884 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.194e+02 2.360e+02 2.503e+02 3.910e+02, threshold=4.720e+02, percent-clipped=0.0
2024-09-18 16:31:03,255 INFO [train.py:1198] (0/2) Epoch 44, batch 3100, loss[loss=0.1753, ctc_loss=0.1141, cr_loss=0.3057, over 19962.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1428, cr_loss=0.3674, over 4106720.53 frames. ], batch size: 44, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 16:31:07,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=787403.1666666666, ans=0.125
2024-09-18 16:31:35,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=787459.8333333334, ans=0.025
2024-09-18 16:31:40,506 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0
2024-09-18 16:32:08,813 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.31 vs. limit=15.0
2024-09-18 16:32:22,861 INFO [train.py:1198] (0/2) Epoch 44, batch 3150, loss[loss=0.2282, ctc_loss=0.156, cr_loss=0.3608, over 19488.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1434, cr_loss=0.3683, over 4105006.66 frames.
], batch size: 90, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:32:26,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=787544.8333333334, ans=0.0 2024-09-18 16:33:17,346 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.239e+02 2.339e+02 2.498e+02 3.418e+02, threshold=4.677e+02, percent-clipped=0.0 2024-09-18 16:33:38,833 INFO [train.py:1198] (0/2) Epoch 44, batch 3200, loss[loss=0.2303, ctc_loss=0.1498, cr_loss=0.4024, over 20991.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1429, cr_loss=0.3677, over 4108553.04 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:33:40,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=787686.5, ans=0.125 2024-09-18 16:33:52,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=787714.8333333334, ans=0.125 2024-09-18 16:34:09,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=787743.1666666666, ans=0.125 2024-09-18 16:34:16,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=12.0 2024-09-18 16:34:30,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=787771.5, ans=0.0 2024-09-18 16:34:54,565 INFO [train.py:1198] (0/2) Epoch 44, batch 3250, loss[loss=0.2067, ctc_loss=0.1353, cr_loss=0.357, over 20878.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3683, over 4092378.18 frames. 
], batch size: 54, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:35:26,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=787884.8333333334, ans=0.125 2024-09-18 16:35:49,021 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.261e+02 2.346e+02 2.501e+02 3.187e+02, threshold=4.692e+02, percent-clipped=0.0 2024-09-18 16:36:10,335 INFO [train.py:1198] (0/2) Epoch 44, batch 3300, loss[loss=0.2851, ctc_loss=0.2015, cr_loss=0.4176, over 14142.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.369, over 4084625.34 frames. ], batch size: 149, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:37:03,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=788054.8333333334, ans=0.125 2024-09-18 16:37:28,604 INFO [train.py:1198] (0/2) Epoch 44, batch 3350, loss[loss=0.2477, ctc_loss=0.1653, cr_loss=0.4121, over 20967.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1439, cr_loss=0.3696, over 4087325.46 frames. 
], batch size: 67, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:37:48,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=788139.8333333334, ans=0.125 2024-09-18 16:37:58,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=788139.8333333334, ans=0.125 2024-09-18 16:38:25,592 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.223e+02 2.376e+02 2.583e+02 6.569e+02, threshold=4.752e+02, percent-clipped=1.0 2024-09-18 16:38:33,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=788224.8333333334, ans=0.1 2024-09-18 16:38:33,917 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0 2024-09-18 16:38:43,387 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-09-18 16:38:44,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=788224.8333333334, ans=0.125 2024-09-18 16:38:46,717 INFO [train.py:1198] (0/2) Epoch 44, batch 3400, loss[loss=0.2052, ctc_loss=0.1346, cr_loss=0.3526, over 20964.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1435, cr_loss=0.3692, over 4085180.46 frames. 
], batch size: 58, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:39:02,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=788281.5, ans=0.125 2024-09-18 16:39:08,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=788281.5, ans=0.0 2024-09-18 16:39:40,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=788338.1666666666, ans=0.1 2024-09-18 16:39:56,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=788366.5, ans=0.125 2024-09-18 16:40:02,483 INFO [train.py:1198] (0/2) Epoch 44, batch 3450, loss[loss=0.2192, ctc_loss=0.1444, cr_loss=0.3742, over 21042.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1433, cr_loss=0.3685, over 4096064.00 frames. ], batch size: 62, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:40:11,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=788394.8333333334, ans=0.125 2024-09-18 16:40:27,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=788423.1666666666, ans=0.025 2024-09-18 16:40:33,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=788451.5, ans=0.035 2024-09-18 16:40:57,123 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.290e+02 2.415e+02 2.549e+02 3.500e+02, threshold=4.830e+02, percent-clipped=0.0 2024-09-18 16:41:00,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. 
limit=15.0 2024-09-18 16:41:18,287 INFO [train.py:1198] (0/2) Epoch 44, batch 3500, loss[loss=0.2112, ctc_loss=0.1433, cr_loss=0.3396, over 20322.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.144, cr_loss=0.3696, over 4099817.98 frames. ], batch size: 74, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:41:20,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=788536.5, ans=0.125 2024-09-18 16:42:05,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=788621.5, ans=15.0 2024-09-18 16:42:37,174 INFO [train.py:1198] (0/2) Epoch 44, batch 3550, loss[loss=0.1958, ctc_loss=0.1296, cr_loss=0.3309, over 20952.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1443, cr_loss=0.3701, over 4098197.57 frames. ], batch size: 49, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:42:49,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=788678.1666666666, ans=0.0 2024-09-18 16:42:58,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=788706.5, ans=0.125 2024-09-18 16:42:58,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=788706.5, ans=0.125 2024-09-18 16:43:34,807 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.266e+02 2.401e+02 2.547e+02 4.043e+02, threshold=4.801e+02, percent-clipped=0.0 2024-09-18 16:43:47,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=788791.5, ans=0.0 2024-09-18 16:43:55,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=788819.8333333334, ans=0.125 2024-09-18 16:43:56,143 INFO [train.py:1198] (0/2) Epoch 44, 
batch 3600, loss[loss=0.1811, ctc_loss=0.1179, cr_loss=0.316, over 20991.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.369, over 4092755.66 frames. ], batch size: 52, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:43:56,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.13 vs. limit=10.0 2024-09-18 16:44:28,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=788876.5, ans=0.125 2024-09-18 16:44:31,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=788876.5, ans=0.0 2024-09-18 16:44:41,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.20 vs. limit=15.0 2024-09-18 16:44:48,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=788904.8333333334, ans=0.95 2024-09-18 16:45:12,099 INFO [train.py:1198] (0/2) Epoch 44, batch 3650, loss[loss=0.2294, ctc_loss=0.153, cr_loss=0.3818, over 20663.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1438, cr_loss=0.3693, over 4092775.99 frames. ], batch size: 71, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:45:26,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=788989.8333333334, ans=0.0 2024-09-18 16:46:07,400 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.289e+02 2.448e+02 2.618e+02 3.925e+02, threshold=4.895e+02, percent-clipped=0.0 2024-09-18 16:46:28,864 INFO [train.py:1198] (0/2) Epoch 44, batch 3700, loss[loss=0.187, ctc_loss=0.1227, cr_loss=0.3215, over 20986.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1431, cr_loss=0.3683, over 4099939.65 frames. 
], batch size: 52, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:47:00,352 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=15.0 2024-09-18 16:47:34,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=789216.5, ans=0.05 2024-09-18 16:47:44,150 INFO [train.py:1198] (0/2) Epoch 44, batch 3750, loss[loss=0.2488, ctc_loss=0.1648, cr_loss=0.4201, over 20079.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1435, cr_loss=0.369, over 4093791.93 frames. ], batch size: 80, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:48:41,450 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.254e+02 2.407e+02 2.587e+02 3.240e+02, threshold=4.814e+02, percent-clipped=0.0 2024-09-18 16:48:49,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=789358.1666666666, ans=0.0 2024-09-18 16:49:05,622 INFO [train.py:1198] (0/2) Epoch 44, batch 3800, loss[loss=0.2046, ctc_loss=0.1371, cr_loss=0.3374, over 21007.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1441, cr_loss=0.3697, over 4080531.74 frames. ], batch size: 61, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:49:37,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=789443.1666666666, ans=0.125 2024-09-18 16:49:41,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2024-09-18 16:50:03,865 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.81 vs. 
limit=22.5 2024-09-18 16:50:21,117 INFO [train.py:1198] (0/2) Epoch 44, batch 3850, loss[loss=0.1946, ctc_loss=0.1272, cr_loss=0.3374, over 19939.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1449, cr_loss=0.3711, over 4073587.67 frames. ], batch size: 44, lr: 1.87e-03, grad_scale: 64.0 2024-09-18 16:50:32,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=789528.1666666666, ans=0.125 2024-09-18 16:50:41,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=22.5 2024-09-18 16:51:15,862 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 2.170e+02 2.352e+02 2.495e+02 3.572e+02, threshold=4.704e+02, percent-clipped=0.0 2024-09-18 16:51:35,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=789669.8333333334, ans=0.125 2024-09-18 16:51:37,027 INFO [train.py:1198] (0/2) Epoch 44, batch 3900, loss[loss=0.2407, ctc_loss=0.162, cr_loss=0.3932, over 20959.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.145, cr_loss=0.3709, over 4070097.01 frames. ], batch size: 67, lr: 1.87e-03, grad_scale: 64.0 2024-09-18 16:51:40,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=789669.8333333334, ans=0.125 2024-09-18 16:51:52,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=789698.1666666666, ans=0.125 2024-09-18 16:52:01,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=789698.1666666666, ans=0.0 2024-09-18 16:52:20,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.46 vs. 
limit=15.0 2024-09-18 16:52:47,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=15.0 2024-09-18 16:52:51,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=789811.5, ans=22.5 2024-09-18 16:52:52,658 INFO [train.py:1198] (0/2) Epoch 44, batch 3950, loss[loss=0.2445, ctc_loss=0.1614, cr_loss=0.4155, over 20634.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.145, cr_loss=0.3712, over 4070587.30 frames. ], batch size: 66, lr: 1.87e-03, grad_scale: 64.0 2024-09-18 16:53:01,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=789811.5, ans=0.125 2024-09-18 16:53:05,342 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-09-18 16:53:42,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=22.5 2024-09-18 16:53:49,585 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.272e+02 2.415e+02 2.674e+02 3.646e+02, threshold=4.829e+02, percent-clipped=0.0 2024-09-18 16:53:49,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=789896.5, ans=0.1 2024-09-18 16:53:51,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=789896.5, ans=0.0 2024-09-18 16:54:10,904 INFO [train.py:1198] (0/2) Epoch 44, batch 4000, loss[loss=0.1942, ctc_loss=0.1256, cr_loss=0.3427, over 20988.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1452, cr_loss=0.3717, over 4073700.01 frames. 
], batch size: 52, lr: 1.87e-03, grad_scale: 64.0 2024-09-18 16:54:12,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=789953.1666666666, ans=0.1 2024-09-18 16:54:51,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=790009.8333333334, ans=0.2 2024-09-18 16:55:02,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=790038.1666666666, ans=0.0 2024-09-18 16:55:05,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=790038.1666666666, ans=0.125 2024-09-18 16:55:30,406 INFO [train.py:1198] (0/2) Epoch 44, batch 4050, loss[loss=0.1942, ctc_loss=0.1267, cr_loss=0.3373, over 20898.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1448, cr_loss=0.3709, over 4074018.84 frames. ], batch size: 54, lr: 1.87e-03, grad_scale: 64.0 2024-09-18 16:55:47,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=790123.1666666666, ans=0.125 2024-09-18 16:56:22,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=790179.8333333334, ans=0.125 2024-09-18 16:56:26,501 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.223e+02 2.372e+02 2.511e+02 3.889e+02, threshold=4.744e+02, percent-clipped=0.0 2024-09-18 16:56:43,794 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.161e-03 2024-09-18 16:56:46,372 INFO [train.py:1198] (0/2) Epoch 44, batch 4100, loss[loss=0.2254, ctc_loss=0.1495, cr_loss=0.3795, over 21067.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1447, cr_loss=0.3711, over 4091141.37 frames. 
], batch size: 59, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:57:21,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=790293.1666666666, ans=0.125 2024-09-18 16:57:44,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=790321.5, ans=0.0 2024-09-18 16:57:46,286 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-18 16:57:58,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2024-09-18 16:58:02,129 INFO [train.py:1198] (0/2) Epoch 44, batch 4150, loss[loss=0.2196, ctc_loss=0.1447, cr_loss=0.3743, over 19471.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1446, cr_loss=0.3714, over 4098362.08 frames. ], batch size: 90, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 16:58:09,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=790378.1666666666, ans=0.125 2024-09-18 16:58:17,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=790406.5, ans=0.0 2024-09-18 16:58:34,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=790434.8333333334, ans=0.125 2024-09-18 16:58:38,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=790434.8333333334, ans=0.125 2024-09-18 16:58:53,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=790463.1666666666, ans=0.1 2024-09-18 16:58:53,681 INFO [scaling.py:1024] (0/2) Whitening: 
name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.99 vs. limit=15.0 2024-09-18 16:58:59,021 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.251e+02 2.338e+02 2.466e+02 3.795e+02, threshold=4.675e+02, percent-clipped=0.0 2024-09-18 16:59:17,552 INFO [train.py:1198] (0/2) Epoch 44, batch 4200, loss[loss=0.2261, ctc_loss=0.1511, cr_loss=0.3752, over 20362.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1444, cr_loss=0.3708, over 4098005.15 frames. ], batch size: 74, lr: 1.87e-03, grad_scale: 16.0 2024-09-18 16:59:28,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=790519.8333333334, ans=0.125 2024-09-18 16:59:36,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0 2024-09-18 16:59:43,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=790548.1666666666, ans=0.125 2024-09-18 16:59:53,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=790576.5, ans=0.0 2024-09-18 17:00:07,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=790604.8333333334, ans=0.0 2024-09-18 17:00:39,887 INFO [train.py:1198] (0/2) Epoch 44, batch 4250, loss[loss=0.2395, ctc_loss=0.1605, cr_loss=0.395, over 20965.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.369, over 4107143.28 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 16.0 2024-09-18 17:00:46,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.08 vs. 
limit=12.0 2024-09-18 17:01:37,647 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.946e+02 2.223e+02 2.363e+02 2.512e+02 3.647e+02, threshold=4.727e+02, percent-clipped=0.0 2024-09-18 17:01:43,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0 2024-09-18 17:01:56,174 INFO [train.py:1198] (0/2) Epoch 44, batch 4300, loss[loss=0.2139, ctc_loss=0.14, cr_loss=0.3694, over 20879.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3689, over 4102429.43 frames. ], batch size: 57, lr: 1.87e-03, grad_scale: 16.0 2024-09-18 17:02:17,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=790831.5, ans=0.0 2024-09-18 17:03:11,810 INFO [train.py:1198] (0/2) Epoch 44, batch 4350, loss[loss=0.1971, ctc_loss=0.1279, cr_loss=0.3462, over 20969.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.3688, over 4115240.30 frames. 
], batch size: 51, lr: 1.87e-03, grad_scale: 16.0 2024-09-18 17:03:13,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=790944.8333333334, ans=0.0 2024-09-18 17:03:30,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=790973.1666666666, ans=0.125 2024-09-18 17:03:37,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=790973.1666666666, ans=0.0 2024-09-18 17:03:40,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=791001.5, ans=0.04949747468305833 2024-09-18 17:04:09,370 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.240e+02 2.398e+02 2.515e+02 3.522e+02, threshold=4.796e+02, percent-clipped=0.0 2024-09-18 17:04:27,604 INFO [train.py:1198] (0/2) Epoch 44, batch 4400, loss[loss=0.2358, ctc_loss=0.1575, cr_loss=0.3919, over 20959.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1433, cr_loss=0.3685, over 4101128.87 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 17:04:40,674 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.12 vs. 
limit=15.0 2024-09-18 17:04:49,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=791114.8333333334, ans=0.125 2024-09-18 17:05:22,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=791171.5, ans=0.125 2024-09-18 17:05:36,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=791199.8333333334, ans=0.025 2024-09-18 17:05:46,682 INFO [train.py:1198] (0/2) Epoch 44, batch 4450, loss[loss=0.2571, ctc_loss=0.1779, cr_loss=0.3964, over 14118.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1428, cr_loss=0.3678, over 4091077.21 frames. ], batch size: 150, lr: 1.87e-03, grad_scale: 32.0 2024-09-18 17:05:57,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=791228.1666666666, ans=0.1 2024-09-18 17:06:09,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=791256.5, ans=0.0 2024-09-18 17:06:32,841 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0 2024-09-18 17:06:36,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.08 vs. limit=6.0 2024-09-18 17:06:36,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=791313.1666666666, ans=0.1 2024-09-18 17:06:47,420 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.237e+02 2.340e+02 2.513e+02 5.679e+02, threshold=4.681e+02, percent-clipped=1.0 2024-09-18 17:07:05,528 INFO [train.py:1198] (0/2) Epoch 44, batch 4500, loss[loss=0.236, ctc_loss=0.1551, cr_loss=0.4047, over 20966.00 frames. 
], tot_loss[loss=0.2171, ctc_loss=0.1432, cr_loss=0.3696, over 4094598.18 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 17:07:32,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=791398.1666666666, ans=0.1
2024-09-18 17:08:21,508 INFO [train.py:1198] (0/2) Epoch 44, batch 4550, loss[loss=0.1875, ctc_loss=0.1214, cr_loss=0.3303, over 20971.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1435, cr_loss=0.3702, over 4094850.45 frames. ], batch size: 51, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 17:09:02,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=791568.1666666666, ans=0.125
2024-09-18 17:09:18,892 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.279e+02 2.386e+02 2.531e+02 3.881e+02, threshold=4.771e+02, percent-clipped=0.0
2024-09-18 17:09:37,396 INFO [train.py:1198] (0/2) Epoch 44, batch 4600, loss[loss=0.1864, ctc_loss=0.1204, cr_loss=0.3298, over 20965.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3695, over 4086543.23 frames. ], batch size: 51, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 17:09:37,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=791653.1666666666, ans=0.125
2024-09-18 17:09:48,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.53 vs. limit=12.0
2024-09-18 17:10:11,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0
2024-09-18 17:10:16,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=12.0
2024-09-18 17:10:23,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=791738.1666666666, ans=0.1
2024-09-18 17:10:52,917 INFO [train.py:1198] (0/2) Epoch 44, batch 4650, loss[loss=0.2513, ctc_loss=0.1659, cr_loss=0.4267, over 20841.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3687, over 4093942.14 frames. ], batch size: 65, lr: 1.87e-03, grad_scale: 16.0
2024-09-18 17:10:57,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0
2024-09-18 17:11:22,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=791823.1666666666, ans=0.125
2024-09-18 17:11:31,538 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0
2024-09-18 17:11:55,194 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.226e+02 2.335e+02 2.506e+02 4.223e+02, threshold=4.670e+02, percent-clipped=0.0
2024-09-18 17:12:14,829 INFO [train.py:1198] (0/2) Epoch 44, batch 4700, loss[loss=0.1743, ctc_loss=0.1118, cr_loss=0.3123, over 20939.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3675, over 4108971.00 frames. ], batch size: 50, lr: 1.87e-03, grad_scale: 16.0
2024-09-18 17:13:15,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=792049.8333333334, ans=0.025
2024-09-18 17:13:30,238 INFO [train.py:1198] (0/2) Epoch 44, batch 4750, loss[loss=0.2014, ctc_loss=0.1329, cr_loss=0.3425, over 20972.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1427, cr_loss=0.3695, over 4116831.10 frames. ], batch size: 55, lr: 1.87e-03, grad_scale: 16.0
2024-09-18 17:13:43,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.02 vs. limit=10.0
2024-09-18 17:13:57,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=792106.5, ans=0.5
2024-09-18 17:13:57,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=792106.5, ans=0.1
2024-09-18 17:14:28,759 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.260e+02 2.389e+02 2.525e+02 3.580e+02, threshold=4.778e+02, percent-clipped=0.0
2024-09-18 17:14:45,713 INFO [train.py:1198] (0/2) Epoch 44, batch 4800, loss[loss=0.2185, ctc_loss=0.1466, cr_loss=0.3594, over 21020.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1422, cr_loss=0.3679, over 4118919.44 frames. ], batch size: 61, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 17:14:55,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=792219.8333333334, ans=0.5
2024-09-18 17:15:17,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=792276.5, ans=0.125
2024-09-18 17:15:52,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=792333.1666666666, ans=0.125
2024-09-18 17:16:01,552 INFO [train.py:1198] (0/2) Epoch 44, batch 4850, loss[loss=0.2015, ctc_loss=0.1342, cr_loss=0.3361, over 20967.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1417, cr_loss=0.3665, over 4117771.74 frames. ], batch size: 50, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 17:16:01,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=792361.5, ans=0.0
2024-09-18 17:16:16,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=792389.8333333334, ans=0.0
2024-09-18 17:17:04,134 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.276e+02 2.403e+02 2.587e+02 5.923e+02, threshold=4.806e+02, percent-clipped=1.0
2024-09-18 17:17:13,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=792474.8333333334, ans=0.1
2024-09-18 17:17:20,689 INFO [train.py:1198] (0/2) Epoch 44, batch 4900, loss[loss=0.1638, ctc_loss=0.1036, cr_loss=0.3008, over 20959.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1415, cr_loss=0.3662, over 4108973.05 frames. ], batch size: 49, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 17:17:45,364 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.85 vs. limit=15.0
2024-09-18 17:18:01,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=792559.8333333334, ans=0.0
2024-09-18 17:18:38,359 INFO [train.py:1198] (0/2) Epoch 44, batch 4950, loss[loss=0.2128, ctc_loss=0.1407, cr_loss=0.3602, over 20975.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1418, cr_loss=0.3668, over 4115734.09 frames. ], batch size: 58, lr: 1.87e-03, grad_scale: 32.0
2024-09-18 17:18:44,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0
2024-09-18 17:19:05,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=792673.1666666666, ans=0.09899494936611666
2024-09-18 17:19:18,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=792701.5, ans=0.1
2024-09-18 17:19:36,706 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.237e+02 2.369e+02 2.526e+02 3.939e+02, threshold=4.738e+02, percent-clipped=0.0
2024-09-18 17:19:53,577 INFO [train.py:1198] (0/2) Epoch 44, batch 5000, loss[loss=0.2052, ctc_loss=0.1344, cr_loss=0.3539, over 20970.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1423, cr_loss=0.3674, over 4101744.18 frames. ], batch size: 55, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:20:02,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=792786.5, ans=0.0
2024-09-18 17:20:04,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=792786.5, ans=0.1
2024-09-18 17:20:25,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=792843.1666666666, ans=0.1
2024-09-18 17:21:08,206 INFO [train.py:1198] (0/2) Epoch 44, batch 5050, loss[loss=0.2051, ctc_loss=0.1346, cr_loss=0.3524, over 21058.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3675, over 4108269.60 frames. ], batch size: 53, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:21:23,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0
2024-09-18 17:22:05,974 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.238e+02 2.368e+02 2.535e+02 3.170e+02, threshold=4.737e+02, percent-clipped=0.0
2024-09-18 17:22:09,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0
2024-09-18 17:22:22,285 INFO [train.py:1198] (0/2) Epoch 44, batch 5100, loss[loss=0.238, ctc_loss=0.1568, cr_loss=0.4059, over 20691.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3683, over 4108378.92 frames. ], batch size: 71, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:22:36,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=793098.1666666666, ans=0.025
2024-09-18 17:22:38,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.95 vs. limit=15.0
2024-09-18 17:22:49,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=793098.1666666666, ans=0.125
2024-09-18 17:23:04,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=793126.5, ans=0.025
2024-09-18 17:23:04,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=793126.5, ans=0.1
2024-09-18 17:23:36,493 INFO [train.py:1198] (0/2) Epoch 44, batch 5150, loss[loss=0.2368, ctc_loss=0.1584, cr_loss=0.3922, over 20100.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1436, cr_loss=0.3694, over 4105043.67 frames. ], batch size: 80, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:23:54,416 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 17:24:33,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.21 vs. limit=10.0
2024-09-18 17:24:33,976 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.263e+02 2.379e+02 2.589e+02 3.092e+02, threshold=4.758e+02, percent-clipped=0.0
2024-09-18 17:24:38,610 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-280000.pt
2024-09-18 17:24:46,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=793324.8333333334, ans=0.1
2024-09-18 17:24:51,604 INFO [train.py:1198] (0/2) Epoch 44, batch 5200, loss[loss=0.233, ctc_loss=0.1553, cr_loss=0.3885, over 20841.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1436, cr_loss=0.3695, over 4104749.63 frames. ], batch size: 65, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:25:10,522 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=22.5
2024-09-18 17:25:46,163 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=15.0
2024-09-18 17:25:58,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=793466.5, ans=0.0
2024-09-18 17:26:06,000 INFO [train.py:1198] (0/2) Epoch 44, batch 5250, loss[loss=0.1765, ctc_loss=0.1158, cr_loss=0.3039, over 20970.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1438, cr_loss=0.3698, over 4094399.02 frames. ], batch size: 50, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:26:59,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=793579.8333333334, ans=0.0
2024-09-18 17:27:07,682 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 2.198e+02 2.344e+02 2.484e+02 5.021e+02, threshold=4.687e+02, percent-clipped=1.0
2024-09-18 17:27:24,241 INFO [train.py:1198] (0/2) Epoch 44, batch 5300, loss[loss=0.2382, ctc_loss=0.1583, cr_loss=0.3995, over 20985.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1428, cr_loss=0.3675, over 4103090.55 frames. ], batch size: 58, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:27:46,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=793664.8333333334, ans=0.125
2024-09-18 17:28:41,712 INFO [train.py:1198] (0/2) Epoch 44, batch 5350, loss[loss=0.2127, ctc_loss=0.1424, cr_loss=0.3519, over 20825.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1425, cr_loss=0.3669, over 4104237.21 frames. ], batch size: 59, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:28:59,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=793806.5, ans=0.0
2024-09-18 17:29:24,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=793863.1666666666, ans=0.1
2024-09-18 17:29:39,565 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.206e+02 2.376e+02 2.538e+02 3.029e+02, threshold=4.751e+02, percent-clipped=0.0
2024-09-18 17:29:41,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=793891.5, ans=0.125
2024-09-18 17:29:55,659 INFO [train.py:1198] (0/2) Epoch 44, batch 5400, loss[loss=0.2628, ctc_loss=0.1825, cr_loss=0.4018, over 14230.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1437, cr_loss=0.3688, over 4084529.29 frames. ], batch size: 149, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:30:39,897 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=22.5
2024-09-18 17:30:45,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=794004.8333333334, ans=0.125
2024-09-18 17:30:52,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=794004.8333333334, ans=0.125
2024-09-18 17:30:54,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=794033.1666666666, ans=0.0
2024-09-18 17:31:10,158 INFO [train.py:1198] (0/2) Epoch 44, batch 5450, loss[loss=0.2358, ctc_loss=0.1585, cr_loss=0.3864, over 20715.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1444, cr_loss=0.3702, over 4082287.06 frames. ], batch size: 68, lr: 1.86e-03, grad_scale: 16.0
2024-09-18 17:31:29,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=794089.8333333334, ans=0.125
2024-09-18 17:32:01,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=794146.5, ans=0.05
2024-09-18 17:32:10,369 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.224e+02 2.387e+02 2.567e+02 3.134e+02, threshold=4.774e+02, percent-clipped=0.0
2024-09-18 17:32:15,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794174.8333333334, ans=0.1
2024-09-18 17:32:16,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=794174.8333333334, ans=0.125
2024-09-18 17:32:25,111 INFO [train.py:1198] (0/2) Epoch 44, batch 5500, loss[loss=0.2185, ctc_loss=0.1442, cr_loss=0.3715, over 20834.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1441, cr_loss=0.3699, over 4095994.92 frames. ], batch size: 59, lr: 1.86e-03, grad_scale: 16.0
2024-09-18 17:32:25,870 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0
2024-09-18 17:32:43,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=794231.5, ans=0.0
2024-09-18 17:32:50,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794231.5, ans=0.1
2024-09-18 17:33:15,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=794288.1666666666, ans=0.125
2024-09-18 17:33:39,277 INFO [train.py:1198] (0/2) Epoch 44, batch 5550, loss[loss=0.1958, ctc_loss=0.129, cr_loss=0.3341, over 20773.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1448, cr_loss=0.371, over 4086944.56 frames. ], batch size: 56, lr: 1.86e-03, grad_scale: 16.0
2024-09-18 17:34:28,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=794429.8333333334, ans=0.125
2024-09-18 17:34:38,925 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.257e+02 2.381e+02 2.509e+02 3.359e+02, threshold=4.761e+02, percent-clipped=0.0
2024-09-18 17:34:53,908 INFO [train.py:1198] (0/2) Epoch 44, batch 5600, loss[loss=0.1709, ctc_loss=0.1089, cr_loss=0.3098, over 20979.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.144, cr_loss=0.3699, over 4092926.84 frames. ], batch size: 48, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:35:13,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=794514.8333333334, ans=0.125
2024-09-18 17:35:20,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=794514.8333333334, ans=0.125
2024-09-18 17:35:51,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=794571.5, ans=0.0
2024-09-18 17:36:09,840 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0
2024-09-18 17:36:10,626 INFO [train.py:1198] (0/2) Epoch 44, batch 5650, loss[loss=0.1738, ctc_loss=0.1117, cr_loss=0.3108, over 20976.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.143, cr_loss=0.3679, over 4097905.13 frames. ], batch size: 51, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:36:23,420 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0
2024-09-18 17:36:52,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=794684.8333333334, ans=0.0
2024-09-18 17:36:52,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.63 vs. limit=22.5
2024-09-18 17:37:13,943 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.240e+02 2.370e+02 2.562e+02 3.826e+02, threshold=4.739e+02, percent-clipped=0.0
2024-09-18 17:37:27,350 INFO [train.py:1198] (0/2) Epoch 44, batch 5700, loss[loss=0.2126, ctc_loss=0.1385, cr_loss=0.3707, over 20793.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1431, cr_loss=0.368, over 4092540.42 frames. ], batch size: 53, lr: 1.86e-03, grad_scale: 16.0
2024-09-18 17:37:33,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=794769.8333333334, ans=0.0
2024-09-18 17:38:20,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=794854.8333333334, ans=0.125
2024-09-18 17:38:25,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=22.5
2024-09-18 17:38:27,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=794883.1666666666, ans=0.125
2024-09-18 17:38:42,456 INFO [train.py:1198] (0/2) Epoch 44, batch 5750, loss[loss=0.2062, ctc_loss=0.1324, cr_loss=0.3687, over 20884.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1437, cr_loss=0.3689, over 4085332.82 frames. ], batch size: 57, lr: 1.86e-03, grad_scale: 16.0
2024-09-18 17:39:21,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=22.5
2024-09-18 17:39:43,338 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.301e+02 2.405e+02 2.544e+02 5.354e+02, threshold=4.810e+02, percent-clipped=1.0
2024-09-18 17:39:49,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=795024.8333333334, ans=0.2
2024-09-18 17:39:56,751 INFO [train.py:1198] (0/2) Epoch 44, batch 5800, loss[loss=0.2288, ctc_loss=0.151, cr_loss=0.3891, over 21054.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1433, cr_loss=0.368, over 4090000.95 frames. ], batch size: 56, lr: 1.86e-03, grad_scale: 16.0
2024-09-18 17:39:59,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=795053.1666666666, ans=0.125
2024-09-18 17:40:00,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=795053.1666666666, ans=0.125
2024-09-18 17:40:00,694 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0
2024-09-18 17:40:05,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=795053.1666666666, ans=0.125
2024-09-18 17:41:10,617 INFO [train.py:1198] (0/2) Epoch 44, batch 5850, loss[loss=0.231, ctc_loss=0.1538, cr_loss=0.3861, over 20962.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1427, cr_loss=0.3669, over 4085916.02 frames. ], batch size: 58, lr: 1.86e-03, grad_scale: 16.0
2024-09-18 17:41:40,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=795251.5, ans=0.125
2024-09-18 17:41:54,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.14 vs. limit=10.0
2024-09-18 17:42:11,175 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.307e+02 2.427e+02 2.615e+02 7.127e+02, threshold=4.855e+02, percent-clipped=1.0
2024-09-18 17:42:24,560 INFO [train.py:1198] (0/2) Epoch 44, batch 5900, loss[loss=0.2013, ctc_loss=0.1315, cr_loss=0.349, over 20845.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1429, cr_loss=0.3675, over 4079553.09 frames. ], batch size: 59, lr: 1.86e-03, grad_scale: 16.0
2024-09-18 17:42:44,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=795364.8333333334, ans=0.025
2024-09-18 17:43:06,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=795393.1666666666, ans=0.125
2024-09-18 17:43:12,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=795421.5, ans=0.125
2024-09-18 17:43:22,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=795449.8333333334, ans=0.125
2024-09-18 17:43:32,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.56 vs. limit=15.0
2024-09-18 17:43:38,670 INFO [train.py:1198] (0/2) Epoch 44, batch 5950, loss[loss=0.1748, ctc_loss=0.1114, cr_loss=0.3172, over 20956.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1432, cr_loss=0.3673, over 4078547.13 frames. ], batch size: 49, lr: 1.86e-03, grad_scale: 16.0
2024-09-18 17:43:42,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=795478.1666666666, ans=0.125
2024-09-18 17:44:06,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=795506.5, ans=0.125
2024-09-18 17:44:42,203 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.247e+02 2.351e+02 2.487e+02 3.184e+02, threshold=4.702e+02, percent-clipped=0.0
2024-09-18 17:44:55,381 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.01 vs. limit=15.0
2024-09-18 17:44:55,861 INFO [train.py:1198] (0/2) Epoch 44, batch 6000, loss[loss=0.1953, ctc_loss=0.1287, cr_loss=0.3332, over 20950.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1421, cr_loss=0.3657, over 4086560.31 frames. ], batch size: 50, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:44:55,863 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 17:45:15,457 INFO [train.py:1230] (0/2) Epoch 44, validation: loss=0.03935, ctc_loss=0.03935, cr_loss=1.572e-14, over 944034.00 frames.
2024-09-18 17:45:15,458 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-18 17:45:32,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=795648.1666666666, ans=0.125
2024-09-18 17:45:37,305 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.27 vs. limit=15.0
2024-09-18 17:45:58,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=795704.8333333334, ans=0.025
2024-09-18 17:46:01,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=795704.8333333334, ans=0.125
2024-09-18 17:46:07,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=795704.8333333334, ans=0.0
2024-09-18 17:46:29,334 INFO [train.py:1198] (0/2) Epoch 44, batch 6050, loss[loss=0.2341, ctc_loss=0.1538, cr_loss=0.4018, over 20978.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1419, cr_loss=0.366, over 4098001.46 frames. ], batch size: 67, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:46:52,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795789.8333333334, ans=0.1
2024-09-18 17:47:22,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=795846.5, ans=0.125
2024-09-18 17:47:31,558 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.049e+02 2.254e+02 2.386e+02 2.577e+02 4.407e+02, threshold=4.772e+02, percent-clipped=0.0
2024-09-18 17:47:45,145 INFO [train.py:1198] (0/2) Epoch 44, batch 6100, loss[loss=0.2321, ctc_loss=0.1604, cr_loss=0.3586, over 18462.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1419, cr_loss=0.3656, over 4101732.26 frames. ], batch size: 108, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:47:53,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=795903.1666666666, ans=0.07
2024-09-18 17:48:27,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=795959.8333333334, ans=0.125
2024-09-18 17:48:31,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=795988.1666666666, ans=15.0
2024-09-18 17:48:40,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=795988.1666666666, ans=0.0
2024-09-18 17:48:47,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=796016.5, ans=0.05
2024-09-18 17:49:00,134 INFO [train.py:1198] (0/2) Epoch 44, batch 6150, loss[loss=0.1872, ctc_loss=0.1234, cr_loss=0.3189, over 20977.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1418, cr_loss=0.3649, over 4089170.88 frames. ], batch size: 51, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:49:06,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=796044.8333333334, ans=0.04949747468305833
2024-09-18 17:49:16,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=796073.1666666666, ans=0.2
2024-09-18 17:49:37,422 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=12.0
2024-09-18 17:49:40,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0
2024-09-18 17:50:00,797 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.226e+02 2.383e+02 2.566e+02 3.697e+02, threshold=4.765e+02, percent-clipped=0.0
2024-09-18 17:50:07,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=796158.1666666666, ans=0.1
2024-09-18 17:50:09,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=796158.1666666666, ans=0.025
2024-09-18 17:50:14,547 INFO [train.py:1198] (0/2) Epoch 44, batch 6200, loss[loss=0.2256, ctc_loss=0.1493, cr_loss=0.3817, over 21011.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1417, cr_loss=0.3649, over 4076259.08 frames. ], batch size: 62, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:50:23,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=796186.5, ans=0.125
2024-09-18 17:51:25,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=796328.1666666666, ans=0.125
2024-09-18 17:51:27,098 INFO [train.py:1198] (0/2) Epoch 44, batch 6250, loss[loss=0.2326, ctc_loss=0.1527, cr_loss=0.3994, over 20661.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1429, cr_loss=0.3663, over 4046704.53 frames. ], batch size: 71, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:51:48,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=796356.5, ans=0.0
2024-09-18 17:51:49,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=796356.5, ans=0.2
2024-09-18 17:52:27,305 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.784e+02 2.280e+02 2.451e+02 2.671e+02 5.085e+02, threshold=4.902e+02, percent-clipped=1.0
2024-09-18 17:52:27,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=796441.5, ans=0.125
2024-09-18 17:52:37,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=796441.5, ans=0.125
2024-09-18 17:52:40,873 INFO [train.py:1198] (0/2) Epoch 44, batch 6300, loss[loss=0.2357, ctc_loss=0.1631, cr_loss=0.3629, over 14181.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1438, cr_loss=0.3675, over 4003988.22 frames. ], batch size: 149, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:52:52,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=796469.8333333334, ans=0.2
2024-09-18 17:52:53,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=796498.1666666666, ans=0.125
2024-09-18 17:52:59,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=796498.1666666666, ans=0.125
2024-09-18 17:53:24,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=796554.8333333334, ans=0.2
2024-09-18 17:53:34,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=796554.8333333334, ans=0.125
2024-09-18 17:53:44,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=796583.1666666666, ans=15.0
2024-09-18 17:53:51,338 INFO [train.py:1198] (0/2) Epoch 44, batch 6350, loss[loss=0.2379, ctc_loss=0.1647, cr_loss=0.366, over 14025.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1489, cr_loss=0.3721, over 3819753.19 frames. ], batch size: 149, lr: 1.86e-03, grad_scale: 32.0
2024-09-18 17:54:11,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=796639.8333333334, ans=0.0
2024-09-18 17:54:32,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=796668.1666666666, ans=0.125
2024-09-18 17:54:35,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=796696.5, ans=0.0
2024-09-18 17:54:50,304 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-44.pt
2024-09-18 17:55:40,104 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.667e+02 2.899e+02 3.128e+02 5.853e+02, threshold=5.799e+02, percent-clipped=3.0
2024-09-18 17:55:40,123 INFO [train.py:1198] (0/2) Epoch 45, batch 0, loss[loss=0.2336, ctc_loss=0.1552, cr_loss=0.3924, over 20943.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1552, cr_loss=0.3924, over 20943.00 frames. ], batch size: 60, lr: 1.84e-03, grad_scale: 32.0
2024-09-18 17:55:40,124 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 17:55:58,282 INFO [train.py:1230] (0/2) Epoch 45, validation: loss=0.03872, ctc_loss=0.03872, cr_loss=1.531e-14, over 944034.00 frames.
2024-09-18 17:55:58,283 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-18 17:56:39,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=796784.3333333334, ans=0.0 2024-09-18 17:56:43,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=796812.6666666666, ans=0.125 2024-09-18 17:56:49,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=796812.6666666666, ans=0.2 2024-09-18 17:56:52,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=796812.6666666666, ans=0.0 2024-09-18 17:56:57,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796841.0, ans=0.1 2024-09-18 17:57:00,837 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-18 17:57:13,625 INFO [train.py:1198] (0/2) Epoch 45, batch 50, loss[loss=0.237, ctc_loss=0.1581, cr_loss=0.3941, over 20707.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1435, cr_loss=0.3684, over 921569.70 frames. 
], batch size: 71, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 17:57:36,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=796897.6666666666, ans=0.5 2024-09-18 17:57:48,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=796926.0, ans=0.95 2024-09-18 17:58:06,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=796954.3333333334, ans=0.125 2024-09-18 17:58:28,894 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.219e+02 2.318e+02 2.504e+02 2.972e+02, threshold=4.635e+02, percent-clipped=0.0 2024-09-18 17:58:28,915 INFO [train.py:1198] (0/2) Epoch 45, batch 100, loss[loss=0.195, ctc_loss=0.1269, cr_loss=0.3403, over 20961.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.369, over 1621469.36 frames. ], batch size: 48, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 17:58:48,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=797039.3333333334, ans=0.2 2024-09-18 17:59:10,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=797067.6666666666, ans=0.0 2024-09-18 17:59:14,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=797096.0, ans=0.125 2024-09-18 17:59:47,778 INFO [train.py:1198] (0/2) Epoch 45, batch 150, loss[loss=0.2328, ctc_loss=0.1512, cr_loss=0.408, over 20984.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1431, cr_loss=0.3685, over 2166774.53 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:00:05,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.96 vs. 
limit=15.0 2024-09-18 18:00:40,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=797237.6666666666, ans=0.0 2024-09-18 18:00:57,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=797266.0, ans=0.035 2024-09-18 18:01:06,698 INFO [train.py:1198] (0/2) Epoch 45, batch 200, loss[loss=0.2213, ctc_loss=0.145, cr_loss=0.3817, over 21012.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.3689, over 2588277.92 frames. ], batch size: 63, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:01:08,175 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.215e+02 2.309e+02 2.528e+02 3.607e+02, threshold=4.617e+02, percent-clipped=0.0 2024-09-18 18:01:13,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=797294.3333333334, ans=0.0 2024-09-18 18:01:23,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=797322.6666666666, ans=0.1 2024-09-18 18:01:40,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=797351.0, ans=0.0 2024-09-18 18:01:53,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=797379.3333333334, ans=0.125 2024-09-18 18:02:16,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=797407.6666666666, ans=0.125 2024-09-18 18:02:22,092 INFO [train.py:1198] (0/2) Epoch 45, batch 250, loss[loss=0.2215, ctc_loss=0.148, cr_loss=0.3677, over 19435.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1438, cr_loss=0.3684, over 2905525.65 frames. 
], batch size: 90, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:02:32,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=797436.0, ans=0.0 2024-09-18 18:02:48,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=797464.3333333334, ans=0.125 2024-09-18 18:02:57,730 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=15.0 2024-09-18 18:03:02,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=22.5 2024-09-18 18:03:23,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=797549.3333333334, ans=0.07 2024-09-18 18:03:37,495 INFO [train.py:1198] (0/2) Epoch 45, batch 300, loss[loss=0.2242, ctc_loss=0.1509, cr_loss=0.3668, over 19483.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.369, over 3178644.48 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:03:39,019 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.004e+02 2.263e+02 2.431e+02 2.595e+02 4.391e+02, threshold=4.861e+02, percent-clipped=0.0 2024-09-18 18:03:50,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=797577.6666666666, ans=0.2 2024-09-18 18:03:50,435 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2024-09-18 18:04:53,374 INFO [train.py:1198] (0/2) Epoch 45, batch 350, loss[loss=0.218, ctc_loss=0.1423, cr_loss=0.3782, over 20924.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1441, cr_loss=0.37, over 3384037.45 frames. 
], batch size: 60, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:05:00,057 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=12.0 2024-09-18 18:05:01,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=797719.3333333334, ans=0.125 2024-09-18 18:05:35,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=797776.0, ans=0.0 2024-09-18 18:05:46,296 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0 2024-09-18 18:05:47,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=797804.3333333334, ans=0.125 2024-09-18 18:06:15,558 INFO [train.py:1198] (0/2) Epoch 45, batch 400, loss[loss=0.2192, ctc_loss=0.1431, cr_loss=0.3805, over 21025.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1437, cr_loss=0.3694, over 3538959.70 frames. ], batch size: 62, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 18:06:17,080 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.233e+02 2.338e+02 2.589e+02 3.115e+02, threshold=4.676e+02, percent-clipped=0.0 2024-09-18 18:07:13,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=797946.0, ans=0.025 2024-09-18 18:07:25,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=797974.3333333334, ans=0.1 2024-09-18 18:07:31,436 INFO [train.py:1198] (0/2) Epoch 45, batch 450, loss[loss=0.2221, ctc_loss=0.1471, cr_loss=0.3747, over 20666.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3681, over 3655391.76 frames. 
], batch size: 66, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 18:07:54,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=798031.0, ans=0.125 2024-09-18 18:08:46,374 INFO [train.py:1198] (0/2) Epoch 45, batch 500, loss[loss=0.262, ctc_loss=0.1803, cr_loss=0.4089, over 17957.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1441, cr_loss=0.3702, over 3745770.14 frames. ], batch size: 108, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 18:08:47,829 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.016e+02 2.224e+02 2.357e+02 2.491e+02 6.322e+02, threshold=4.714e+02, percent-clipped=1.0 2024-09-18 18:09:13,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=798172.6666666666, ans=0.125 2024-09-18 18:09:19,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=798201.0, ans=0.5 2024-09-18 18:10:02,022 INFO [train.py:1198] (0/2) Epoch 45, batch 550, loss[loss=0.2382, ctc_loss=0.1592, cr_loss=0.3949, over 20223.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3683, over 3828178.87 frames. ], batch size: 80, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:10:11,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=798286.0, ans=0.125 2024-09-18 18:10:21,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2024-09-18 18:10:22,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.08 vs. 
limit=22.5 2024-09-18 18:10:26,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=798314.3333333334, ans=0.2 2024-09-18 18:11:20,368 INFO [train.py:1198] (0/2) Epoch 45, batch 600, loss[loss=0.1628, ctc_loss=0.1059, cr_loss=0.2843, over 20949.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3681, over 3897621.94 frames. ], batch size: 49, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:11:22,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=798427.6666666666, ans=0.125 2024-09-18 18:11:23,566 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.258e+02 2.386e+02 2.547e+02 3.258e+02, threshold=4.772e+02, percent-clipped=0.0 2024-09-18 18:11:25,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=798427.6666666666, ans=0.2 2024-09-18 18:12:39,538 INFO [train.py:1198] (0/2) Epoch 45, batch 650, loss[loss=0.2018, ctc_loss=0.1332, cr_loss=0.343, over 20769.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1433, cr_loss=0.3688, over 3934949.87 frames. ], batch size: 53, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:12:56,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=798597.6666666666, ans=0.125 2024-09-18 18:13:13,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=798626.0, ans=0.0 2024-09-18 18:13:26,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. 
limit=15.0 2024-09-18 18:13:43,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=798682.6666666666, ans=0.0 2024-09-18 18:13:55,096 INFO [train.py:1198] (0/2) Epoch 45, batch 700, loss[loss=0.1966, ctc_loss=0.1262, cr_loss=0.3522, over 21011.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1429, cr_loss=0.3673, over 3968351.30 frames. ], batch size: 52, lr: 1.84e-03, grad_scale: 8.0 2024-09-18 18:13:59,551 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.285e+02 2.417e+02 2.573e+02 3.290e+02, threshold=4.834e+02, percent-clipped=0.0 2024-09-18 18:14:13,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=798739.3333333334, ans=0.125 2024-09-18 18:14:22,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=798739.3333333334, ans=0.0 2024-09-18 18:14:30,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=22.5 2024-09-18 18:14:32,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=798767.6666666666, ans=0.0 2024-09-18 18:14:40,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=798796.0, ans=0.0 2024-09-18 18:14:49,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=798796.0, ans=0.0 2024-09-18 18:15:10,468 INFO [train.py:1198] (0/2) Epoch 45, batch 750, loss[loss=0.2381, ctc_loss=0.1596, cr_loss=0.3927, over 18246.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1434, cr_loss=0.3681, over 3994234.87 frames. 
], batch size: 108, lr: 1.84e-03, grad_scale: 8.0 2024-09-18 18:15:38,128 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.84 vs. limit=15.0 2024-09-18 18:16:25,408 INFO [train.py:1198] (0/2) Epoch 45, batch 800, loss[loss=0.2199, ctc_loss=0.1462, cr_loss=0.3683, over 20976.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1431, cr_loss=0.3684, over 4018632.32 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:16:28,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=798994.3333333334, ans=0.125 2024-09-18 18:16:29,964 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.236e+02 2.375e+02 2.577e+02 6.506e+02, threshold=4.749e+02, percent-clipped=1.0 2024-09-18 18:17:01,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=799051.0, ans=0.125 2024-09-18 18:17:11,354 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2024-09-18 18:17:47,227 INFO [train.py:1198] (0/2) Epoch 45, batch 850, loss[loss=0.1762, ctc_loss=0.1139, cr_loss=0.3115, over 20973.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3672, over 4029458.76 frames. 
], batch size: 52, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:17:47,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=799136.0, ans=0.2 2024-09-18 18:18:08,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=799164.3333333334, ans=0.125 2024-09-18 18:18:16,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=799192.6666666666, ans=0.125 2024-09-18 18:18:25,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=799192.6666666666, ans=0.125 2024-09-18 18:18:42,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=799221.0, ans=0.125 2024-09-18 18:18:52,982 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.24 vs. limit=15.0 2024-09-18 18:19:00,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=799249.3333333334, ans=0.125 2024-09-18 18:19:02,991 INFO [train.py:1198] (0/2) Epoch 45, batch 900, loss[loss=0.2256, ctc_loss=0.1498, cr_loss=0.3792, over 20212.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1424, cr_loss=0.3665, over 4051535.26 frames. 
], batch size: 74, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:19:07,605 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.849e+02 2.270e+02 2.356e+02 2.546e+02 3.184e+02, threshold=4.712e+02, percent-clipped=0.0 2024-09-18 18:20:14,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=799391.0, ans=0.125 2024-09-18 18:20:15,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=15.0 2024-09-18 18:20:18,849 INFO [train.py:1198] (0/2) Epoch 45, batch 950, loss[loss=0.2126, ctc_loss=0.1393, cr_loss=0.3663, over 20967.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1427, cr_loss=0.3664, over 4049599.94 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:20:44,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=799447.6666666666, ans=0.1 2024-09-18 18:20:50,117 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.30 vs. 
limit=15.0 2024-09-18 18:21:03,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=799504.3333333334, ans=0.0 2024-09-18 18:21:10,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=799504.3333333334, ans=0.0 2024-09-18 18:21:13,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=799504.3333333334, ans=0.025 2024-09-18 18:21:13,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=799504.3333333334, ans=0.125 2024-09-18 18:21:33,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=799561.0, ans=0.125 2024-09-18 18:21:34,836 INFO [train.py:1198] (0/2) Epoch 45, batch 1000, loss[loss=0.1941, ctc_loss=0.1267, cr_loss=0.3372, over 20985.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1435, cr_loss=0.3678, over 4055000.60 frames. ], batch size: 50, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:21:39,515 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.263e+02 2.432e+02 2.613e+02 3.713e+02, threshold=4.863e+02, percent-clipped=0.0 2024-09-18 18:22:25,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=799646.0, ans=0.1 2024-09-18 18:22:34,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=799646.0, ans=0.125 2024-09-18 18:22:54,102 INFO [train.py:1198] (0/2) Epoch 45, batch 1050, loss[loss=0.173, ctc_loss=0.1104, cr_loss=0.3128, over 20017.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.3689, over 4063153.44 frames. 
], batch size: 44, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:22:59,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=799702.6666666666, ans=0.0 2024-09-18 18:23:04,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=799702.6666666666, ans=0.125 2024-09-18 18:24:12,444 INFO [train.py:1198] (0/2) Epoch 45, batch 1100, loss[loss=0.1933, ctc_loss=0.1253, cr_loss=0.3402, over 19957.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.143, cr_loss=0.3673, over 4068289.13 frames. ], batch size: 44, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:24:17,056 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.246e+02 2.365e+02 2.530e+02 3.099e+02, threshold=4.731e+02, percent-clipped=0.0 2024-09-18 18:24:47,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=799901.0, ans=0.0 2024-09-18 18:24:59,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=799929.3333333334, ans=0.2 2024-09-18 18:24:59,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=799929.3333333334, ans=0.125 2024-09-18 18:25:05,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=799929.3333333334, ans=0.125 2024-09-18 18:25:10,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=799929.3333333334, ans=0.0 2024-09-18 18:25:28,321 INFO [train.py:1198] (0/2) Epoch 45, batch 1150, loss[loss=0.2537, ctc_loss=0.167, cr_loss=0.4336, over 20688.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1428, cr_loss=0.367, over 4075330.37 frames. 
], batch size: 68, lr: 1.84e-03, grad_scale: 16.0 2024-09-18 18:25:39,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=799986.0, ans=0.0 2024-09-18 18:25:43,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=800014.3333333334, ans=0.0 2024-09-18 18:25:45,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=800014.3333333334, ans=0.125 2024-09-18 18:26:21,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=800071.0, ans=0.0 2024-09-18 18:26:27,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=800099.3333333334, ans=10.0 2024-09-18 18:26:35,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800099.3333333334, ans=0.1 2024-09-18 18:26:44,307 INFO [train.py:1198] (0/2) Epoch 45, batch 1200, loss[loss=0.2016, ctc_loss=0.1288, cr_loss=0.3639, over 20969.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1424, cr_loss=0.3663, over 4077585.44 frames. ], batch size: 49, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 18:26:49,016 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.224e+02 2.342e+02 2.520e+02 5.310e+02, threshold=4.684e+02, percent-clipped=1.0 2024-09-18 18:26:52,741 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. 
limit=15.0 2024-09-18 18:26:55,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=800127.6666666666, ans=0.035 2024-09-18 18:26:59,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=800156.0, ans=0.0 2024-09-18 18:27:30,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800212.6666666666, ans=0.1 2024-09-18 18:27:39,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=800212.6666666666, ans=0.2 2024-09-18 18:27:40,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=800212.6666666666, ans=0.1 2024-09-18 18:27:42,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=800212.6666666666, ans=0.025 2024-09-18 18:27:51,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=800241.0, ans=0.0 2024-09-18 18:28:00,332 INFO [train.py:1198] (0/2) Epoch 45, batch 1250, loss[loss=0.1917, ctc_loss=0.1268, cr_loss=0.3244, over 20981.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1415, cr_loss=0.365, over 4091052.84 frames. ], batch size: 49, lr: 1.84e-03, grad_scale: 32.0 2024-09-18 18:28:05,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=800269.3333333334, ans=0.125 2024-09-18 18:29:20,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=800411.0, ans=0.125 2024-09-18 18:29:21,883 INFO [train.py:1198] (0/2) Epoch 45, batch 1300, loss[loss=0.2492, ctc_loss=0.1655, cr_loss=0.418, over 20735.00 frames. 
], tot_loss[loss=0.2142, ctc_loss=0.1412, cr_loss=0.365, over 4096221.21 frames. ], batch size: 71, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:29:26,371 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.230e+02 2.357e+02 2.489e+02 3.462e+02, threshold=4.713e+02, percent-clipped=0.0 2024-09-18 18:29:42,153 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=15.0 2024-09-18 18:29:43,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=800439.3333333334, ans=0.0 2024-09-18 18:30:03,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=800467.6666666666, ans=0.125 2024-09-18 18:30:37,603 INFO [train.py:1198] (0/2) Epoch 45, batch 1350, loss[loss=0.2286, ctc_loss=0.1499, cr_loss=0.3937, over 21086.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1418, cr_loss=0.3665, over 4095743.97 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 18:31:29,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=800637.6666666666, ans=0.125 2024-09-18 18:31:41,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=800666.0, ans=0.2 2024-09-18 18:31:49,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800666.0, ans=0.1 2024-09-18 18:31:52,173 INFO [train.py:1198] (0/2) Epoch 45, batch 1400, loss[loss=0.2466, ctc_loss=0.1634, cr_loss=0.4162, over 20607.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1428, cr_loss=0.3674, over 4093749.21 frames. 
], batch size: 71, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 18:31:58,162 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.315e+02 2.423e+02 2.618e+02 8.259e+02, threshold=4.847e+02, percent-clipped=1.0 2024-09-18 18:32:01,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=800694.3333333334, ans=0.0 2024-09-18 18:33:07,191 INFO [train.py:1198] (0/2) Epoch 45, batch 1450, loss[loss=0.2746, ctc_loss=0.1904, cr_loss=0.4207, over 14189.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1439, cr_loss=0.3691, over 4083958.92 frames. ], batch size: 149, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 18:33:10,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=800836.0, ans=0.0 2024-09-18 18:33:55,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=800921.0, ans=0.0 2024-09-18 18:34:11,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=800949.3333333334, ans=0.05 2024-09-18 18:34:26,119 INFO [train.py:1198] (0/2) Epoch 45, batch 1500, loss[loss=0.2285, ctc_loss=0.1491, cr_loss=0.3972, over 21056.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1425, cr_loss=0.367, over 4101867.94 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 18:34:31,902 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.306e+02 2.399e+02 2.560e+02 3.657e+02, threshold=4.798e+02, percent-clipped=0.0 2024-09-18 18:34:53,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=12.0 2024-09-18 18:34:54,834 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. 
limit=6.0
2024-09-18 18:35:43,568 INFO [train.py:1198] (0/2) Epoch 45, batch 1550, loss[loss=0.2375, ctc_loss=0.156, cr_loss=0.4077, over 20392.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1435, cr_loss=0.3694, over 4098124.74 frames. ], batch size: 74, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 18:35:59,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=801147.6666666666, ans=0.125
2024-09-18 18:36:12,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=801176.0, ans=0.2
2024-09-18 18:36:18,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801176.0, ans=0.1
2024-09-18 18:36:50,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801232.6666666666, ans=0.1
2024-09-18 18:36:58,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=801261.0, ans=0.125
2024-09-18 18:36:59,562 INFO [train.py:1198] (0/2) Epoch 45, batch 1600, loss[loss=0.181, ctc_loss=0.1153, cr_loss=0.3282, over 20938.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3691, over 4101144.65 frames. ], batch size: 48, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 18:37:05,653 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.255e+02 2.375e+02 2.515e+02 3.208e+02, threshold=4.750e+02, percent-clipped=0.0
2024-09-18 18:37:18,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=801289.3333333334, ans=0.125
2024-09-18 18:37:45,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=801346.0, ans=0.2
2024-09-18 18:37:48,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=801346.0, ans=0.125
2024-09-18 18:37:55,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=801346.0, ans=0.125
2024-09-18 18:38:15,174 INFO [train.py:1198] (0/2) Epoch 45, batch 1650, loss[loss=0.2384, ctc_loss=0.1592, cr_loss=0.396, over 20660.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3682, over 4094674.88 frames. ], batch size: 68, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 18:38:15,990 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=22.5
2024-09-18 18:38:26,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=801402.6666666666, ans=0.025
2024-09-18 18:38:33,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=801431.0, ans=0.04949747468305833
2024-09-18 18:38:45,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=801459.3333333334, ans=0.0
2024-09-18 18:39:30,701 INFO [train.py:1198] (0/2) Epoch 45, batch 1700, loss[loss=0.2521, ctc_loss=0.1711, cr_loss=0.4053, over 19418.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3691, over 4092977.11 frames. ], batch size: 90, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 18:39:36,041 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0
2024-09-18 18:39:36,820 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.946e+02 2.253e+02 2.361e+02 2.504e+02 5.463e+02, threshold=4.722e+02, percent-clipped=1.0
2024-09-18 18:39:40,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801544.3333333334, ans=0.1
2024-09-18 18:40:17,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=801629.3333333334, ans=10.0
2024-09-18 18:40:22,632 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 18:40:52,309 INFO [train.py:1198] (0/2) Epoch 45, batch 1750, loss[loss=0.2072, ctc_loss=0.135, cr_loss=0.3608, over 20764.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1427, cr_loss=0.3685, over 4107093.78 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 18:41:03,658 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0
2024-09-18 18:41:16,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=801714.3333333334, ans=0.125
2024-09-18 18:41:26,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801742.6666666666, ans=0.1
2024-09-18 18:41:59,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=801799.3333333334, ans=0.0
2024-09-18 18:42:07,892 INFO [train.py:1198] (0/2) Epoch 45, batch 1800, loss[loss=0.2223, ctc_loss=0.1497, cr_loss=0.3629, over 19443.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1418, cr_loss=0.3667, over 4108790.74 frames. ], batch size: 90, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 18:42:14,107 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.228e+02 2.327e+02 2.526e+02 4.210e+02, threshold=4.654e+02, percent-clipped=0.0
2024-09-18 18:42:20,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=801827.6666666666, ans=0.125
2024-09-18 18:42:26,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=801856.0, ans=0.025
2024-09-18 18:42:36,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0
2024-09-18 18:42:51,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=801884.3333333334, ans=0.125
2024-09-18 18:43:24,137 INFO [train.py:1198] (0/2) Epoch 45, batch 1850, loss[loss=0.2075, ctc_loss=0.1377, cr_loss=0.3486, over 20921.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.3679, over 4103955.91 frames. ], batch size: 54, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 18:43:44,560 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.66 vs. limit=15.0
2024-09-18 18:43:54,065 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.63 vs. limit=15.0
2024-09-18 18:43:57,264 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.09 vs. limit=15.0
2024-09-18 18:43:58,163 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.59 vs. limit=10.0
2024-09-18 18:44:39,734 INFO [train.py:1198] (0/2) Epoch 45, batch 1900, loss[loss=0.2606, ctc_loss=0.1749, cr_loss=0.4287, over 20825.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.369, over 4088202.31 frames. ], batch size: 65, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 18:44:43,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=802111.0, ans=0.2
2024-09-18 18:44:45,627 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.299e+02 2.434e+02 2.578e+02 3.526e+02, threshold=4.868e+02, percent-clipped=0.0
2024-09-18 18:45:05,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=802139.3333333334, ans=22.5
2024-09-18 18:45:34,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=802196.0, ans=0.125
2024-09-18 18:45:42,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=802224.3333333334, ans=0.125
2024-09-18 18:45:57,309 INFO [train.py:1198] (0/2) Epoch 45, batch 1950, loss[loss=0.2123, ctc_loss=0.1422, cr_loss=0.3505, over 20939.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3685, over 4095629.76 frames. ], batch size: 64, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 18:46:19,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=802281.0, ans=0.125
2024-09-18 18:46:24,930 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0
2024-09-18 18:47:06,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=802366.0, ans=0.125
2024-09-18 18:47:15,488 INFO [train.py:1198] (0/2) Epoch 45, batch 2000, loss[loss=0.2744, ctc_loss=0.1866, cr_loss=0.4389, over 18424.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1422, cr_loss=0.3671, over 4106471.08 frames. ], batch size: 108, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 18:47:21,505 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.275e+02 2.408e+02 2.614e+02 4.325e+02, threshold=4.817e+02, percent-clipped=0.0
2024-09-18 18:47:23,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=802394.3333333334, ans=0.0
2024-09-18 18:48:30,858 INFO [train.py:1198] (0/2) Epoch 45, batch 2050, loss[loss=0.2356, ctc_loss=0.1553, cr_loss=0.4015, over 20635.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3671, over 4080246.15 frames. ], batch size: 66, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 18:48:55,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=802564.3333333334, ans=0.2
2024-09-18 18:49:10,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=802592.6666666666, ans=0.2
2024-09-18 18:49:10,516 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=15.0
2024-09-18 18:49:22,446 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.21 vs. limit=22.5
2024-09-18 18:49:23,727 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=22.5
2024-09-18 18:49:45,847 INFO [train.py:1198] (0/2) Epoch 45, batch 2100, loss[loss=0.223, ctc_loss=0.1478, cr_loss=0.3758, over 21043.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.142, cr_loss=0.3663, over 4086009.42 frames. ], batch size: 62, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 18:49:53,326 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.287e+02 2.405e+02 2.555e+02 6.408e+02, threshold=4.809e+02, percent-clipped=1.0
2024-09-18 18:49:58,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=802677.6666666666, ans=0.5
2024-09-18 18:50:22,368 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2024-09-18 18:50:24,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=802734.3333333334, ans=0.1
2024-09-18 18:51:01,154 INFO [train.py:1198] (0/2) Epoch 45, batch 2150, loss[loss=0.2148, ctc_loss=0.1394, cr_loss=0.3769, over 20997.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1422, cr_loss=0.3664, over 4091383.24 frames. ], batch size: 58, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 18:51:03,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=802819.3333333334, ans=0.125
2024-09-18 18:51:09,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=802819.3333333334, ans=0.2
2024-09-18 18:51:24,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=802847.6666666666, ans=0.125
2024-09-18 18:51:25,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=802847.6666666666, ans=0.0
2024-09-18 18:51:48,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=802876.0, ans=0.025
2024-09-18 18:52:13,792 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 18:52:21,943 INFO [train.py:1198] (0/2) Epoch 45, batch 2200, loss[loss=0.2064, ctc_loss=0.1376, cr_loss=0.3443, over 21019.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3684, over 4083319.34 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 18:52:29,526 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.281e+02 2.422e+02 2.611e+02 3.786e+02, threshold=4.845e+02, percent-clipped=0.0
2024-09-18 18:52:35,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=802989.3333333334, ans=0.0
2024-09-18 18:52:57,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803017.6666666666, ans=0.1
2024-09-18 18:53:00,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=803017.6666666666, ans=0.125
2024-09-18 18:53:00,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=803017.6666666666, ans=0.125
2024-09-18 18:53:27,909 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.56 vs. limit=22.5
2024-09-18 18:53:37,444 INFO [train.py:1198] (0/2) Epoch 45, batch 2250, loss[loss=0.2297, ctc_loss=0.1551, cr_loss=0.3729, over 19368.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1436, cr_loss=0.3682, over 4087120.74 frames. ], batch size: 90, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 18:54:07,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=803159.3333333334, ans=0.2
2024-09-18 18:54:25,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=803187.6666666666, ans=10.0
2024-09-18 18:54:34,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=803187.6666666666, ans=0.0
2024-09-18 18:54:39,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=803216.0, ans=0.0
2024-09-18 18:54:44,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0
2024-09-18 18:54:52,611 INFO [train.py:1198] (0/2) Epoch 45, batch 2300, loss[loss=0.2249, ctc_loss=0.1482, cr_loss=0.3833, over 21073.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1438, cr_loss=0.3687, over 4087529.98 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 18:54:56,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=803244.3333333334, ans=0.2
2024-09-18 18:55:00,160 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.237e+02 2.377e+02 2.585e+02 3.314e+02, threshold=4.753e+02, percent-clipped=0.0
2024-09-18 18:55:06,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803272.6666666666, ans=0.1
2024-09-18 18:55:13,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.81 vs. limit=22.5
2024-09-18 18:55:54,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=803357.6666666666, ans=0.0
2024-09-18 18:56:08,029 INFO [train.py:1198] (0/2) Epoch 45, batch 2350, loss[loss=0.1869, ctc_loss=0.1231, cr_loss=0.319, over 19472.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1438, cr_loss=0.3693, over 4067029.98 frames. ], batch size: 90, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 18:56:13,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=22.5
2024-09-18 18:56:38,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=803414.3333333334, ans=0.0
2024-09-18 18:56:49,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=803442.6666666666, ans=0.125
2024-09-18 18:57:01,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0
2024-09-18 18:57:16,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=803499.3333333334, ans=0.125
2024-09-18 18:57:29,452 INFO [train.py:1198] (0/2) Epoch 45, batch 2400, loss[loss=0.2192, ctc_loss=0.1452, cr_loss=0.3698, over 21064.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1435, cr_loss=0.3691, over 4085273.81 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 18:57:37,133 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.266e+02 2.397e+02 2.555e+02 3.642e+02, threshold=4.794e+02, percent-clipped=0.0
2024-09-18 18:57:44,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=803556.0, ans=0.1
2024-09-18 18:57:44,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=803556.0, ans=0.1
2024-09-18 18:57:52,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=803556.0, ans=0.2
2024-09-18 18:58:14,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=803612.6666666666, ans=0.2
2024-09-18 18:58:28,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=803641.0, ans=0.2
2024-09-18 18:58:44,657 INFO [train.py:1198] (0/2) Epoch 45, batch 2450, loss[loss=0.2253, ctc_loss=0.1486, cr_loss=0.3837, over 20836.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1437, cr_loss=0.3695, over 4081310.73 frames. ], batch size: 65, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 18:59:29,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0
2024-09-18 18:59:59,803 INFO [train.py:1198] (0/2) Epoch 45, batch 2500, loss[loss=0.2097, ctc_loss=0.1385, cr_loss=0.3561, over 20886.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1437, cr_loss=0.3695, over 4079725.89 frames. ], batch size: 54, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 19:00:07,077 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.203e+02 2.346e+02 2.471e+02 5.155e+02, threshold=4.693e+02, percent-clipped=1.0
2024-09-18 19:00:08,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=803811.0, ans=0.025
2024-09-18 19:00:34,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=803867.6666666666, ans=0.0
2024-09-18 19:01:09,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=803924.3333333334, ans=0.0
2024-09-18 19:01:14,966 INFO [train.py:1198] (0/2) Epoch 45, batch 2550, loss[loss=0.2308, ctc_loss=0.1489, cr_loss=0.4091, over 20824.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1436, cr_loss=0.3694, over 4077985.19 frames. ], batch size: 65, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 19:01:31,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803981.0, ans=0.1
2024-09-18 19:01:48,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=804009.3333333334, ans=0.125
2024-09-18 19:01:48,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=804009.3333333334, ans=0.0
2024-09-18 19:01:50,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=804009.3333333334, ans=0.025
2024-09-18 19:02:33,136 INFO [train.py:1198] (0/2) Epoch 45, batch 2600, loss[loss=0.18, ctc_loss=0.1164, cr_loss=0.3176, over 20951.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1441, cr_loss=0.3699, over 4079643.51 frames. ], batch size: 50, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 19:02:39,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=804094.3333333334, ans=0.125
2024-09-18 19:02:40,633 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.253e+02 2.400e+02 2.597e+02 3.513e+02, threshold=4.801e+02, percent-clipped=0.0
2024-09-18 19:02:42,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=804094.3333333334, ans=0.025
2024-09-18 19:02:46,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804122.6666666666, ans=0.1
2024-09-18 19:03:22,180 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0
2024-09-18 19:03:30,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=804179.3333333334, ans=0.125
2024-09-18 19:03:33,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=12.0
2024-09-18 19:03:36,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=804207.6666666666, ans=10.0
2024-09-18 19:03:47,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=804207.6666666666, ans=0.125
2024-09-18 19:03:51,346 INFO [train.py:1198] (0/2) Epoch 45, batch 2650, loss[loss=0.2436, ctc_loss=0.1618, cr_loss=0.4088, over 20672.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1437, cr_loss=0.3692, over 4070932.57 frames. ], batch size: 66, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 19:04:12,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=804264.3333333334, ans=0.125
2024-09-18 19:04:18,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=804264.3333333334, ans=0.2
2024-09-18 19:04:20,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0
2024-09-18 19:04:29,110 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 19:04:54,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=804349.3333333334, ans=0.07
2024-09-18 19:05:00,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=804349.3333333334, ans=0.2
2024-09-18 19:05:06,013 INFO [train.py:1198] (0/2) Epoch 45, batch 2700, loss[loss=0.2166, ctc_loss=0.1454, cr_loss=0.3562, over 20694.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1436, cr_loss=0.3693, over 4074281.64 frames. ], batch size: 71, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 19:05:13,432 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.277e+02 2.422e+02 2.633e+02 3.547e+02, threshold=4.844e+02, percent-clipped=0.0
2024-09-18 19:05:19,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=804406.0, ans=0.125
2024-09-18 19:05:57,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=804462.6666666666, ans=0.1
2024-09-18 19:06:08,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0
2024-09-18 19:06:20,846 INFO [train.py:1198] (0/2) Epoch 45, batch 2750, loss[loss=0.2285, ctc_loss=0.1497, cr_loss=0.3937, over 19977.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1444, cr_loss=0.371, over 4073021.74 frames. ], batch size: 80, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 19:07:19,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804632.6666666666, ans=0.1
2024-09-18 19:07:22,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=804632.6666666666, ans=0.125
2024-09-18 19:07:23,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=804632.6666666666, ans=0.1
2024-09-18 19:07:26,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=804632.6666666666, ans=0.125
2024-09-18 19:07:35,572 INFO [train.py:1198] (0/2) Epoch 45, batch 2800, loss[loss=0.1806, ctc_loss=0.1165, cr_loss=0.3201, over 20978.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1447, cr_loss=0.3712, over 4075352.78 frames. ], batch size: 48, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 19:07:37,164 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-284000.pt
2024-09-18 19:07:47,068 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.260e+02 2.383e+02 2.615e+02 3.904e+02, threshold=4.766e+02, percent-clipped=0.0
2024-09-18 19:08:09,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=804717.6666666666, ans=0.1
2024-09-18 19:08:24,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=804746.0, ans=0.125
2024-09-18 19:08:39,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804746.0, ans=0.1
2024-09-18 19:08:58,649 INFO [train.py:1198] (0/2) Epoch 45, batch 2850, loss[loss=0.1896, ctc_loss=0.1242, cr_loss=0.3272, over 20940.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1439, cr_loss=0.3698, over 4086479.59 frames. ], batch size: 50, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 19:09:09,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=804802.6666666666, ans=0.125
2024-09-18 19:09:33,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=804859.3333333334, ans=0.2
2024-09-18 19:09:48,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=804887.6666666666, ans=0.0
2024-09-18 19:10:12,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=804944.3333333334, ans=0.125
2024-09-18 19:10:14,070 INFO [train.py:1198] (0/2) Epoch 45, batch 2900, loss[loss=0.2205, ctc_loss=0.1448, cr_loss=0.3784, over 21055.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3687, over 4101406.30 frames. ], batch size: 62, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 19:10:15,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=804944.3333333334, ans=0.07
2024-09-18 19:10:21,356 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.238e+02 2.395e+02 2.569e+02 3.756e+02, threshold=4.791e+02, percent-clipped=0.0
2024-09-18 19:11:05,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=805029.3333333334, ans=0.1
2024-09-18 19:11:29,747 INFO [train.py:1198] (0/2) Epoch 45, batch 2950, loss[loss=0.2189, ctc_loss=0.1439, cr_loss=0.3752, over 20820.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1427, cr_loss=0.3682, over 4102418.66 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 32.0
2024-09-18 19:11:49,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=805114.3333333334, ans=0.125
2024-09-18 19:12:37,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=805199.3333333334, ans=0.0
2024-09-18 19:12:44,801 INFO [train.py:1198] (0/2) Epoch 45, batch 3000, loss[loss=0.2377, ctc_loss=0.1583, cr_loss=0.397, over 20870.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3688, over 4095814.93 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 19:12:44,802 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 19:13:03,151 INFO [train.py:1230] (0/2) Epoch 45, validation: loss=0.03897, ctc_loss=0.03897, cr_loss=1.53e-14, over 944034.00 frames.
2024-09-18 19:13:03,152 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-18 19:13:15,001 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.226e+02 2.374e+02 2.629e+02 4.930e+02, threshold=4.749e+02, percent-clipped=1.0
2024-09-18 19:13:46,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=805284.3333333334, ans=0.2
2024-09-18 19:14:23,702 INFO [train.py:1198] (0/2) Epoch 45, batch 3050, loss[loss=0.233, ctc_loss=0.1543, cr_loss=0.3936, over 19383.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1431, cr_loss=0.3689, over 4102708.09 frames. ], batch size: 90, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 19:14:25,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805369.3333333334, ans=0.1
2024-09-18 19:14:33,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=805369.3333333334, ans=0.1
2024-09-18 19:15:11,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=22.5
2024-09-18 19:15:22,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=805482.6666666666, ans=0.125
2024-09-18 19:15:38,620 INFO [train.py:1198] (0/2) Epoch 45, batch 3100, loss[loss=0.1888, ctc_loss=0.1252, cr_loss=0.3184, over 21088.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.3682, over 4109093.17 frames. ], batch size: 53, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 19:15:47,837 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.199e+02 2.341e+02 2.490e+02 3.106e+02, threshold=4.683e+02, percent-clipped=0.0
2024-09-18 19:16:50,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=805624.3333333334, ans=0.0
2024-09-18 19:16:54,659 INFO [train.py:1198] (0/2) Epoch 45, batch 3150, loss[loss=0.2334, ctc_loss=0.1565, cr_loss=0.3848, over 20648.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3683, over 4111174.74 frames. ], batch size: 68, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 19:17:20,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=805681.0, ans=0.2
2024-09-18 19:17:52,990 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0
2024-09-18 19:18:10,031 INFO [train.py:1198] (0/2) Epoch 45, batch 3200, loss[loss=0.2213, ctc_loss=0.1473, cr_loss=0.3703, over 20836.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1432, cr_loss=0.3688, over 4102933.73 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 19:18:20,423 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.070e+02 2.271e+02 2.400e+02 2.558e+02 6.870e+02, threshold=4.801e+02, percent-clipped=1.0
2024-09-18 19:18:41,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=805851.0, ans=0.125
2024-09-18 19:19:10,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=805907.6666666666, ans=0.125
2024-09-18 19:19:27,470 INFO [train.py:1198] (0/2) Epoch 45, batch 3250, loss[loss=0.2276, ctc_loss=0.1518, cr_loss=0.3788, over 21011.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1431, cr_loss=0.3696, over 4105452.16 frames. ], batch size: 63, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 19:19:30,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=805936.0, ans=0.0
2024-09-18 19:19:34,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.82 vs. limit=10.0
2024-09-18 19:20:08,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=805992.6666666666, ans=0.015
2024-09-18 19:20:40,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=806049.3333333334, ans=0.125
2024-09-18 19:20:45,758 INFO [train.py:1198] (0/2) Epoch 45, batch 3300, loss[loss=0.2263, ctc_loss=0.1512, cr_loss=0.3754, over 20331.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.3679, over 4103801.64 frames. ], batch size: 74, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 19:20:56,354 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.201e+02 2.322e+02 2.511e+02 2.958e+02, threshold=4.644e+02, percent-clipped=0.0
2024-09-18 19:21:04,656 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0
2024-09-18 19:21:05,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=806106.0, ans=0.2
2024-09-18 19:21:11,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=806106.0, ans=0.0
2024-09-18 19:21:14,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=806134.3333333334, ans=0.125
2024-09-18 19:21:28,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=806134.3333333334, ans=0.1
2024-09-18 19:21:28,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=806134.3333333334, ans=0.0
2024-09-18 19:21:56,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=806191.0, ans=0.2
2024-09-18 19:22:00,990 INFO [train.py:1198] (0/2) Epoch 45, batch 3350, loss[loss=0.2493, ctc_loss=0.1674, cr_loss=0.4095, over 20953.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.3679, over 4107984.39 frames. ], batch size: 64, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 19:22:01,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=806219.3333333334, ans=0.125
2024-09-18 19:22:24,128 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=22.5
2024-09-18 19:22:31,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=806276.0, ans=0.0
2024-09-18 19:22:51,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=806304.3333333334, ans=0.125
2024-09-18 19:22:55,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=806304.3333333334, ans=0.1
2024-09-18 19:23:13,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=806332.6666666666, ans=0.07
2024-09-18 19:23:16,649 INFO [train.py:1198] (0/2) Epoch 45, batch 3400, loss[loss=0.2066, ctc_loss=0.1354, cr_loss=0.356, over 20979.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1432, cr_loss=0.3694, over 4101086.50 frames. ], batch size: 58, lr: 1.83e-03, grad_scale: 16.0
2024-09-18 19:23:27,441 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.263e+02 2.399e+02 2.569e+02 3.908e+02, threshold=4.799e+02, percent-clipped=0.0
2024-09-18 19:24:32,380 INFO [train.py:1198] (0/2) Epoch 45, batch 3450, loss[loss=0.2357, ctc_loss=0.156, cr_loss=0.3988, over 21041.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3692, over 4087959.09
], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:24:42,069 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.81 vs. limit=15.0 2024-09-18 19:25:11,030 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.87 vs. limit=15.0 2024-09-18 19:25:34,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=806587.6666666666, ans=0.125 2024-09-18 19:25:46,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=806616.0, ans=0.125 2024-09-18 19:25:53,813 INFO [train.py:1198] (0/2) Epoch 45, batch 3500, loss[loss=0.228, ctc_loss=0.1514, cr_loss=0.3829, over 20859.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1433, cr_loss=0.3693, over 4082598.54 frames. ], batch size: 57, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:25:55,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=806644.3333333334, ans=0.05 2024-09-18 19:26:04,327 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.917e+02 2.268e+02 2.378e+02 2.564e+02 4.975e+02, threshold=4.757e+02, percent-clipped=1.0 2024-09-18 19:27:09,327 INFO [train.py:1198] (0/2) Epoch 45, batch 3550, loss[loss=0.1881, ctc_loss=0.1221, cr_loss=0.33, over 20870.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.3687, over 4086797.09 frames. 
], batch size: 54, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:27:11,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=806786.0, ans=0.125 2024-09-18 19:27:25,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=806814.3333333334, ans=22.5 2024-09-18 19:27:26,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=806814.3333333334, ans=0.025 2024-09-18 19:28:05,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-09-18 19:28:14,480 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:28:24,596 INFO [train.py:1198] (0/2) Epoch 45, batch 3600, loss[loss=0.2246, ctc_loss=0.1459, cr_loss=0.3936, over 21026.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.3693, over 4085391.51 frames. 
], batch size: 61, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:28:35,278 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.822e+02 2.229e+02 2.367e+02 2.528e+02 3.256e+02, threshold=4.733e+02, percent-clipped=0.0 2024-09-18 19:29:02,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=806984.3333333334, ans=0.2 2024-09-18 19:29:10,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=807012.6666666666, ans=0.025 2024-09-18 19:29:32,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=807041.0, ans=0.125 2024-09-18 19:29:39,739 INFO [train.py:1198] (0/2) Epoch 45, batch 3650, loss[loss=0.2531, ctc_loss=0.1673, cr_loss=0.4289, over 19390.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.143, cr_loss=0.3697, over 4086696.07 frames. ], batch size: 90, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:30:26,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=807154.3333333334, ans=0.125 2024-09-18 19:30:31,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=807154.3333333334, ans=0.125 2024-09-18 19:30:59,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0 2024-09-18 19:31:00,787 INFO [train.py:1198] (0/2) Epoch 45, batch 3700, loss[loss=0.2274, ctc_loss=0.1505, cr_loss=0.3846, over 20949.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.3689, over 4098992.91 frames. 
], batch size: 67, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:31:01,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=807211.0, ans=0.0 2024-09-18 19:31:04,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0 2024-09-18 19:31:13,129 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.278e+02 2.418e+02 2.561e+02 3.251e+02, threshold=4.835e+02, percent-clipped=0.0 2024-09-18 19:31:32,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=807267.6666666666, ans=0.0 2024-09-18 19:31:46,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=807296.0, ans=0.125 2024-09-18 19:32:16,265 INFO [train.py:1198] (0/2) Epoch 45, batch 3750, loss[loss=0.2314, ctc_loss=0.153, cr_loss=0.392, over 21019.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.3692, over 4090481.18 frames. ], batch size: 61, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:32:20,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2024-09-18 19:32:21,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807352.6666666666, ans=0.1 2024-09-18 19:32:30,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.30 vs. 
limit=15.0 2024-09-18 19:32:33,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=807381.0, ans=0.125 2024-09-18 19:32:48,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=807409.3333333334, ans=0.0 2024-09-18 19:33:04,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=807437.6666666666, ans=0.0 2024-09-18 19:33:10,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=807437.6666666666, ans=0.125 2024-09-18 19:33:31,270 INFO [train.py:1198] (0/2) Epoch 45, batch 3800, loss[loss=0.2121, ctc_loss=0.1406, cr_loss=0.3577, over 20883.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.3687, over 4096195.88 frames. ], batch size: 54, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:33:43,700 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.188e+02 2.339e+02 2.511e+02 3.025e+02, threshold=4.679e+02, percent-clipped=0.0 2024-09-18 19:33:55,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=15.0 2024-09-18 19:34:18,361 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0 2024-09-18 19:34:47,564 INFO [train.py:1198] (0/2) Epoch 45, batch 3850, loss[loss=0.2523, ctc_loss=0.1687, cr_loss=0.4177, over 18326.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1433, cr_loss=0.3697, over 4092274.30 frames. 
], batch size: 108, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:34:48,062 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:35:06,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.43 vs. limit=15.0 2024-09-18 19:35:54,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=807749.3333333334, ans=0.125 2024-09-18 19:36:05,650 INFO [train.py:1198] (0/2) Epoch 45, batch 3900, loss[loss=0.2472, ctc_loss=0.1656, cr_loss=0.4079, over 18440.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1448, cr_loss=0.372, over 4093609.61 frames. ], batch size: 108, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:36:17,893 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.294e+02 2.399e+02 2.557e+02 3.460e+02, threshold=4.798e+02, percent-clipped=0.0 2024-09-18 19:36:21,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=807806.0, ans=0.125 2024-09-18 19:37:05,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=807862.6666666666, ans=0.125 2024-09-18 19:37:10,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=807891.0, ans=0.1 2024-09-18 19:37:24,233 INFO [train.py:1198] (0/2) Epoch 45, batch 3950, loss[loss=0.1875, ctc_loss=0.1234, cr_loss=0.3207, over 20955.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1447, cr_loss=0.3711, over 4097003.76 frames. 
], batch size: 48, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:37:35,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=807919.3333333334, ans=0.2 2024-09-18 19:37:40,071 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:38:08,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=808004.3333333334, ans=0.0 2024-09-18 19:38:11,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=808004.3333333334, ans=0.2 2024-09-18 19:38:40,561 INFO [train.py:1198] (0/2) Epoch 45, batch 4000, loss[loss=0.2346, ctc_loss=0.1574, cr_loss=0.3857, over 21020.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1439, cr_loss=0.37, over 4110001.13 frames. ], batch size: 63, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:38:52,765 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.284e+02 2.406e+02 2.540e+02 3.120e+02, threshold=4.813e+02, percent-clipped=0.0 2024-09-18 19:38:53,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=808061.0, ans=0.125 2024-09-18 19:38:56,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=808089.3333333334, ans=0.025 2024-09-18 19:39:06,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=808089.3333333334, ans=0.1 2024-09-18 19:39:35,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=808146.0, ans=0.0 2024-09-18 19:39:53,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=808174.3333333334, ans=0.125 
2024-09-18 19:39:56,068 INFO [train.py:1198] (0/2) Epoch 45, batch 4050, loss[loss=0.2285, ctc_loss=0.1551, cr_loss=0.3671, over 21008.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.3687, over 4113771.84 frames. ], batch size: 61, lr: 1.83e-03, grad_scale: 32.0 2024-09-18 19:40:25,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=808259.3333333334, ans=0.04949747468305833 2024-09-18 19:40:50,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=808287.6666666666, ans=0.0 2024-09-18 19:41:05,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=808316.0, ans=0.125 2024-09-18 19:41:11,469 INFO [train.py:1198] (0/2) Epoch 45, batch 4100, loss[loss=0.2312, ctc_loss=0.1527, cr_loss=0.3923, over 21074.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1426, cr_loss=0.3683, over 4114164.10 frames. ], batch size: 59, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:41:24,933 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.233e+02 2.357e+02 2.491e+02 5.288e+02, threshold=4.714e+02, percent-clipped=1.0 2024-09-18 19:42:32,243 INFO [train.py:1198] (0/2) Epoch 45, batch 4150, loss[loss=0.2, ctc_loss=0.1316, cr_loss=0.3417, over 20979.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1427, cr_loss=0.3687, over 4110741.09 frames. ], batch size: 52, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:42:38,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=808486.0, ans=0.95 2024-09-18 19:42:39,124 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.52 vs. 
limit=15.0 2024-09-18 19:43:01,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.63 vs. limit=10.0 2024-09-18 19:43:20,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808571.0, ans=0.1 2024-09-18 19:43:28,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=808571.0, ans=0.125 2024-09-18 19:43:33,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=808599.3333333334, ans=0.0 2024-09-18 19:43:47,736 INFO [train.py:1198] (0/2) Epoch 45, batch 4200, loss[loss=0.2196, ctc_loss=0.1413, cr_loss=0.3916, over 21050.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3691, over 4109736.57 frames. ], batch size: 56, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:43:54,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=808627.6666666666, ans=0.1 2024-09-18 19:43:57,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=808627.6666666666, ans=0.125 2024-09-18 19:44:01,257 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.247e+02 2.404e+02 2.517e+02 7.160e+02, threshold=4.809e+02, percent-clipped=1.0 2024-09-18 19:44:37,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.51 vs. limit=22.5 2024-09-18 19:44:53,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=808741.0, ans=0.0 2024-09-18 19:45:03,400 INFO [train.py:1198] (0/2) Epoch 45, batch 4250, loss[loss=0.2042, ctc_loss=0.1366, cr_loss=0.3378, over 20973.00 frames. 
], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3689, over 4097291.55 frames. ], batch size: 58, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:45:21,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=808797.6666666666, ans=0.0 2024-09-18 19:45:23,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=808797.6666666666, ans=0.125 2024-09-18 19:45:29,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2024-09-18 19:45:46,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2024-09-18 19:45:47,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=808854.3333333334, ans=0.125 2024-09-18 19:46:02,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=808882.6666666666, ans=0.2 2024-09-18 19:46:18,555 INFO [train.py:1198] (0/2) Epoch 45, batch 4300, loss[loss=0.1997, ctc_loss=0.1293, cr_loss=0.3521, over 20957.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1432, cr_loss=0.3695, over 4092436.47 frames. 
], batch size: 49, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:46:30,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=808911.0, ans=0.125 2024-09-18 19:46:32,141 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.027e+02 2.198e+02 2.379e+02 2.525e+02 3.771e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-18 19:46:47,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=808967.6666666666, ans=0.025 2024-09-18 19:47:06,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808996.0, ans=0.1 2024-09-18 19:47:11,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=808996.0, ans=0.0 2024-09-18 19:47:14,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=808996.0, ans=0.1 2024-09-18 19:47:17,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2024-09-18 19:47:21,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=809024.3333333334, ans=0.0 2024-09-18 19:47:36,191 INFO [train.py:1198] (0/2) Epoch 45, batch 4350, loss[loss=0.2088, ctc_loss=0.1388, cr_loss=0.3503, over 20977.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1438, cr_loss=0.3702, over 4088049.81 frames. 
], batch size: 67, lr: 1.83e-03, grad_scale: 16.0 2024-09-18 19:47:45,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=809052.6666666666, ans=0.0 2024-09-18 19:48:09,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=809109.3333333334, ans=0.025 2024-09-18 19:48:41,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.58 vs. limit=10.0 2024-09-18 19:48:47,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=809166.0, ans=0.2 2024-09-18 19:48:54,677 INFO [train.py:1198] (0/2) Epoch 45, batch 4400, loss[loss=0.2538, ctc_loss=0.1758, cr_loss=0.39, over 18368.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.3688, over 4095514.83 frames. ], batch size: 108, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 19:49:02,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=809194.3333333334, ans=0.0 2024-09-18 19:49:08,521 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.285e+02 2.400e+02 2.563e+02 6.544e+02, threshold=4.799e+02, percent-clipped=1.0 2024-09-18 19:49:31,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=809251.0, ans=0.125 2024-09-18 19:49:42,261 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-09-18 19:50:10,282 INFO [train.py:1198] (0/2) Epoch 45, batch 4450, loss[loss=0.2317, ctc_loss=0.1533, cr_loss=0.3919, over 20653.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1424, cr_loss=0.3684, over 4107155.15 frames. 
], batch size: 68, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 19:50:51,813 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.10 vs. limit=10.0 2024-09-18 19:51:02,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=809421.0, ans=0.0 2024-09-18 19:51:02,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=809421.0, ans=0.1 2024-09-18 19:51:10,516 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.04 vs. limit=12.0 2024-09-18 19:51:26,095 INFO [train.py:1198] (0/2) Epoch 45, batch 4500, loss[loss=0.2352, ctc_loss=0.1576, cr_loss=0.388, over 20673.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1423, cr_loss=0.3676, over 4095448.38 frames. ], batch size: 66, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 19:51:32,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=809477.6666666666, ans=0.125 2024-09-18 19:51:39,820 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.214e+02 2.367e+02 2.533e+02 3.449e+02, threshold=4.733e+02, percent-clipped=0.0 2024-09-18 19:51:49,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=809506.0, ans=0.125 2024-09-18 19:52:42,003 INFO [train.py:1198] (0/2) Epoch 45, batch 4550, loss[loss=0.2008, ctc_loss=0.1333, cr_loss=0.3375, over 20963.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1417, cr_loss=0.3666, over 4100866.76 frames. 
], batch size: 51, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 19:53:19,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=809676.0, ans=0.125 2024-09-18 19:53:20,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=809676.0, ans=0.015 2024-09-18 19:53:23,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=809676.0, ans=0.125 2024-09-18 19:53:34,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=809704.3333333334, ans=0.125 2024-09-18 19:53:42,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=809704.3333333334, ans=0.04949747468305833 2024-09-18 19:54:02,374 INFO [train.py:1198] (0/2) Epoch 45, batch 4600, loss[loss=0.2116, ctc_loss=0.1407, cr_loss=0.3541, over 20847.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.3683, over 4092769.14 frames. ], batch size: 59, lr: 1.82e-03, grad_scale: 16.0 2024-09-18 19:54:12,739 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-18 19:54:17,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.254e+02 2.428e+02 2.643e+02 4.087e+02, threshold=4.856e+02, percent-clipped=0.0 2024-09-18 19:55:17,725 INFO [train.py:1198] (0/2) Epoch 45, batch 4650, loss[loss=0.2062, ctc_loss=0.1348, cr_loss=0.3569, over 20802.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.368, over 4096516.05 frames. 
], batch size: 53, lr: 1.82e-03, grad_scale: 16.0 2024-09-18 19:55:38,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=809931.0, ans=0.125 2024-09-18 19:56:15,428 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2024-09-18 19:56:32,824 INFO [train.py:1198] (0/2) Epoch 45, batch 4700, loss[loss=0.2286, ctc_loss=0.1516, cr_loss=0.3851, over 20380.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1423, cr_loss=0.3669, over 4101122.17 frames. ], batch size: 74, lr: 1.82e-03, grad_scale: 16.0 2024-09-18 19:56:36,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=810044.3333333334, ans=0.025 2024-09-18 19:56:47,761 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.020e+02 2.266e+02 2.415e+02 2.594e+02 3.766e+02, threshold=4.830e+02, percent-clipped=0.0 2024-09-18 19:56:55,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=810072.6666666666, ans=0.125 2024-09-18 19:57:09,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=810101.0, ans=0.125 2024-09-18 19:57:26,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0 2024-09-18 19:57:35,596 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.89 vs. limit=15.0 2024-09-18 19:57:48,035 INFO [train.py:1198] (0/2) Epoch 45, batch 4750, loss[loss=0.1623, ctc_loss=0.1032, cr_loss=0.2955, over 20932.00 frames. 
], tot_loss[loss=0.2163, ctc_loss=0.1427, cr_loss=0.3682, over 4099820.12 frames. ], batch size: 49, lr: 1.82e-03, grad_scale: 16.0 2024-09-18 19:58:06,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=810214.3333333334, ans=0.125 2024-09-18 19:58:15,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=810214.3333333334, ans=0.125 2024-09-18 19:58:16,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=810242.6666666666, ans=0.0 2024-09-18 19:59:06,126 INFO [train.py:1198] (0/2) Epoch 45, batch 4800, loss[loss=0.1876, ctc_loss=0.1246, cr_loss=0.3153, over 20942.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1419, cr_loss=0.3664, over 4095679.72 frames. ], batch size: 50, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 19:59:07,025 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=22.5 2024-09-18 19:59:24,218 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.260e+02 2.378e+02 2.546e+02 4.963e+02, threshold=4.756e+02, percent-clipped=1.0 2024-09-18 19:59:29,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=810356.0, ans=0.0 2024-09-18 19:59:50,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=810384.3333333334, ans=0.2 2024-09-18 20:00:11,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=810441.0, ans=0.0 2024-09-18 20:00:11,909 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.79 vs. 
limit=15.0 2024-09-18 20:00:16,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=810441.0, ans=0.125 2024-09-18 20:00:24,605 INFO [train.py:1198] (0/2) Epoch 45, batch 4850, loss[loss=0.189, ctc_loss=0.1208, cr_loss=0.3411, over 20966.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1416, cr_loss=0.366, over 4096954.20 frames. ], batch size: 51, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:00:41,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=810497.6666666666, ans=0.125 2024-09-18 20:00:52,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=810497.6666666666, ans=0.125 2024-09-18 20:01:30,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=810582.6666666666, ans=0.125 2024-09-18 20:01:39,103 INFO [train.py:1198] (0/2) Epoch 45, batch 4900, loss[loss=0.2087, ctc_loss=0.1357, cr_loss=0.3648, over 20873.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1421, cr_loss=0.3674, over 4105847.67 frames. ], batch size: 54, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:01:53,641 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.062e+02 2.234e+02 2.384e+02 2.527e+02 3.984e+02, threshold=4.768e+02, percent-clipped=0.0 2024-09-18 20:02:21,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=810667.6666666666, ans=0.125 2024-09-18 20:02:25,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=810696.0, ans=0.0 2024-09-18 20:02:53,891 INFO [train.py:1198] (0/2) Epoch 45, batch 4950, loss[loss=0.2264, ctc_loss=0.1495, cr_loss=0.3849, over 21080.00 frames. 
], tot_loss[loss=0.2147, ctc_loss=0.1414, cr_loss=0.3663, over 4107885.58 frames. ], batch size: 59, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:03:04,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=810752.6666666666, ans=0.025 2024-09-18 20:03:17,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=810781.0, ans=0.125 2024-09-18 20:03:41,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=810837.6666666666, ans=0.125 2024-09-18 20:03:53,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=810866.0, ans=0.125 2024-09-18 20:03:53,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=810866.0, ans=0.125 2024-09-18 20:04:07,717 INFO [train.py:1198] (0/2) Epoch 45, batch 5000, loss[loss=0.2407, ctc_loss=0.1575, cr_loss=0.416, over 20866.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1421, cr_loss=0.3673, over 4107322.16 frames. ], batch size: 65, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:04:22,515 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.919e+02 2.233e+02 2.353e+02 2.590e+02 4.290e+02, threshold=4.707e+02, percent-clipped=0.0 2024-09-18 20:04:30,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=810922.6666666666, ans=0.1 2024-09-18 20:04:48,497 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=12.0 2024-09-18 20:04:57,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.39 vs. 
limit=12.0 2024-09-18 20:05:01,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=810979.3333333334, ans=0.05 2024-09-18 20:05:04,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=810979.3333333334, ans=0.0 2024-09-18 20:05:22,206 INFO [train.py:1198] (0/2) Epoch 45, batch 5050, loss[loss=0.2309, ctc_loss=0.1547, cr_loss=0.3808, over 20960.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3678, over 4108240.67 frames. ], batch size: 58, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:05:28,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=811036.0, ans=0.07 2024-09-18 20:05:53,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=811092.6666666666, ans=0.125 2024-09-18 20:06:08,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=811121.0, ans=0.125 2024-09-18 20:06:13,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=15.0 2024-09-18 20:06:14,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=811121.0, ans=0.5 2024-09-18 20:06:35,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=811177.6666666666, ans=0.2 2024-09-18 20:06:37,106 INFO [train.py:1198] (0/2) Epoch 45, batch 5100, loss[loss=0.2354, ctc_loss=0.1566, cr_loss=0.3942, over 21024.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1425, cr_loss=0.3674, over 4106403.81 frames. 
], batch size: 62, lr: 1.82e-03, grad_scale: 16.0 2024-09-18 20:06:50,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=811206.0, ans=0.5 2024-09-18 20:06:53,217 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.268e+02 2.408e+02 2.549e+02 3.406e+02, threshold=4.815e+02, percent-clipped=0.0 2024-09-18 20:06:53,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=811206.0, ans=0.125 2024-09-18 20:06:59,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=811206.0, ans=0.025 2024-09-18 20:07:12,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=811234.3333333334, ans=0.1 2024-09-18 20:07:13,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=811234.3333333334, ans=0.0 2024-09-18 20:07:32,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=811262.6666666666, ans=0.125 2024-09-18 20:07:46,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=811291.0, ans=0.125 2024-09-18 20:07:53,669 INFO [train.py:1198] (0/2) Epoch 45, batch 5150, loss[loss=0.2212, ctc_loss=0.1462, cr_loss=0.3751, over 20971.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3684, over 4103217.94 frames. 
], batch size: 58, lr: 1.82e-03, grad_scale: 16.0 2024-09-18 20:08:30,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=811376.0, ans=10.0 2024-09-18 20:09:09,985 INFO [train.py:1198] (0/2) Epoch 45, batch 5200, loss[loss=0.1991, ctc_loss=0.1297, cr_loss=0.3466, over 20956.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3684, over 4104783.32 frames. ], batch size: 50, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:09:23,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=811489.3333333334, ans=0.025 2024-09-18 20:09:26,183 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.256e+02 2.414e+02 2.581e+02 3.470e+02, threshold=4.828e+02, percent-clipped=0.0 2024-09-18 20:10:24,220 INFO [train.py:1198] (0/2) Epoch 45, batch 5250, loss[loss=0.2101, ctc_loss=0.1358, cr_loss=0.3717, over 20829.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3677, over 4093383.60 frames. ], batch size: 59, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:11:05,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=811659.3333333334, ans=0.1 2024-09-18 20:11:16,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=811687.6666666666, ans=0.2 2024-09-18 20:11:39,064 INFO [train.py:1198] (0/2) Epoch 45, batch 5300, loss[loss=0.2347, ctc_loss=0.156, cr_loss=0.3936, over 20866.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1419, cr_loss=0.3666, over 4105584.13 frames. 
], batch size: 57, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:11:40,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=811744.3333333334, ans=0.125 2024-09-18 20:11:51,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=811744.3333333334, ans=0.2 2024-09-18 20:11:55,662 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.221e+02 2.355e+02 2.478e+02 3.018e+02, threshold=4.710e+02, percent-clipped=0.0 2024-09-18 20:12:36,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=811829.3333333334, ans=0.125 2024-09-18 20:12:37,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=811857.6666666666, ans=0.2 2024-09-18 20:12:42,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=22.5 2024-09-18 20:12:53,891 INFO [train.py:1198] (0/2) Epoch 45, batch 5350, loss[loss=0.1902, ctc_loss=0.1239, cr_loss=0.3315, over 19966.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1418, cr_loss=0.366, over 4096916.88 frames. ], batch size: 44, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:12:54,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=811886.0, ans=0.125 2024-09-18 20:13:48,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.42 vs. 
limit=5.0 2024-09-18 20:13:54,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=811999.3333333334, ans=0.0 2024-09-18 20:14:08,683 INFO [train.py:1198] (0/2) Epoch 45, batch 5400, loss[loss=0.2256, ctc_loss=0.1505, cr_loss=0.3759, over 21017.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3674, over 4101260.42 frames. ], batch size: 61, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:14:19,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=812027.6666666666, ans=0.125 2024-09-18 20:14:24,858 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.291e+02 2.455e+02 2.611e+02 3.469e+02, threshold=4.910e+02, percent-clipped=0.0 2024-09-18 20:14:44,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=812084.3333333334, ans=0.125 2024-09-18 20:14:49,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=812084.3333333334, ans=0.0 2024-09-18 20:14:56,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=812112.6666666666, ans=0.1 2024-09-18 20:14:58,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=812112.6666666666, ans=0.0 2024-09-18 20:15:22,568 INFO [train.py:1198] (0/2) Epoch 45, batch 5450, loss[loss=0.2094, ctc_loss=0.137, cr_loss=0.3621, over 20884.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3674, over 4092971.44 frames. 
], batch size: 54, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:15:37,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=812197.6666666666, ans=0.0 2024-09-18 20:15:58,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=812226.0, ans=0.125 2024-09-18 20:16:05,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=812254.3333333334, ans=0.0 2024-09-18 20:16:20,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=812282.6666666666, ans=0.125 2024-09-18 20:16:24,959 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0 2024-09-18 20:16:36,183 INFO [train.py:1198] (0/2) Epoch 45, batch 5500, loss[loss=0.2076, ctc_loss=0.1353, cr_loss=0.3614, over 20983.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1426, cr_loss=0.367, over 4097214.25 frames. ], batch size: 48, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:16:52,071 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.240e+02 2.378e+02 2.585e+02 3.655e+02, threshold=4.755e+02, percent-clipped=0.0 2024-09-18 20:17:38,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=812424.3333333334, ans=0.025 2024-09-18 20:17:40,337 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0 2024-09-18 20:17:54,291 INFO [train.py:1198] (0/2) Epoch 45, batch 5550, loss[loss=0.2221, ctc_loss=0.1475, cr_loss=0.373, over 20849.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1428, cr_loss=0.3676, over 4091958.31 frames. 
], batch size: 65, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:17:57,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=812452.6666666666, ans=0.0 2024-09-18 20:18:00,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=812452.6666666666, ans=0.2 2024-09-18 20:18:12,450 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:18:14,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.18 vs. limit=15.0 2024-09-18 20:18:53,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=812566.0, ans=0.125 2024-09-18 20:19:01,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=812566.0, ans=0.125 2024-09-18 20:19:01,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=812566.0, ans=0.0 2024-09-18 20:19:03,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=812566.0, ans=0.2 2024-09-18 20:19:05,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=812566.0, ans=0.0 2024-09-18 20:19:08,123 INFO [train.py:1198] (0/2) Epoch 45, batch 5600, loss[loss=0.2135, ctc_loss=0.1429, cr_loss=0.3533, over 20782.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1434, cr_loss=0.3683, over 4081725.43 frames. 
], batch size: 56, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:19:24,960 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.261e+02 2.398e+02 2.555e+02 3.909e+02, threshold=4.796e+02, percent-clipped=0.0 2024-09-18 20:20:17,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=812707.6666666666, ans=0.125 2024-09-18 20:20:20,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=812736.0, ans=0.025 2024-09-18 20:20:21,892 INFO [train.py:1198] (0/2) Epoch 45, batch 5650, loss[loss=0.2104, ctc_loss=0.1377, cr_loss=0.3637, over 20910.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.143, cr_loss=0.3679, over 4083174.35 frames. ], batch size: 57, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:20:23,964 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.24 vs. limit=15.0 2024-09-18 20:20:36,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=812764.3333333334, ans=0.0 2024-09-18 20:21:02,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=812792.6666666666, ans=0.0 2024-09-18 20:21:21,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=812849.3333333334, ans=0.0 2024-09-18 20:21:24,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.63 vs. limit=15.0 2024-09-18 20:21:36,223 INFO [train.py:1198] (0/2) Epoch 45, batch 5700, loss[loss=0.2079, ctc_loss=0.1371, cr_loss=0.3537, over 21078.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1418, cr_loss=0.3656, over 4094899.28 frames. 
], batch size: 59, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:21:38,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=812877.6666666666, ans=0.2 2024-09-18 20:21:52,486 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.280e+02 2.407e+02 2.556e+02 3.733e+02, threshold=4.814e+02, percent-clipped=0.0 2024-09-18 20:22:25,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=812962.6666666666, ans=0.025 2024-09-18 20:22:33,133 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:22:48,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=812991.0, ans=0.1 2024-09-18 20:22:50,897 INFO [train.py:1198] (0/2) Epoch 45, batch 5750, loss[loss=0.2022, ctc_loss=0.1311, cr_loss=0.3551, over 21060.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1419, cr_loss=0.366, over 4100409.43 frames. ], batch size: 53, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:22:55,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=813019.3333333334, ans=0.125 2024-09-18 20:23:02,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=813019.3333333334, ans=0.125 2024-09-18 20:23:31,758 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=15.55 vs. limit=15.0 2024-09-18 20:23:36,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.03 vs. 
limit=15.0 2024-09-18 20:24:01,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=813132.6666666666, ans=0.0 2024-09-18 20:24:01,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=813132.6666666666, ans=0.2 2024-09-18 20:24:05,229 INFO [train.py:1198] (0/2) Epoch 45, batch 5800, loss[loss=0.239, ctc_loss=0.1581, cr_loss=0.4044, over 21005.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1418, cr_loss=0.3659, over 4091644.02 frames. ], batch size: 61, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:24:21,681 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.265e+02 2.380e+02 2.521e+02 3.588e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-18 20:24:30,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=813189.3333333334, ans=0.125 2024-09-18 20:25:01,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=813246.0, ans=0.0 2024-09-18 20:25:19,089 INFO [train.py:1198] (0/2) Epoch 45, batch 5850, loss[loss=0.2396, ctc_loss=0.1623, cr_loss=0.3863, over 20308.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1417, cr_loss=0.3654, over 4094765.79 frames. ], batch size: 74, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:25:20,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=813302.6666666666, ans=0.125 2024-09-18 20:25:46,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=813331.0, ans=0.125 2024-09-18 20:25:48,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.27 vs. 
limit=15.0 2024-09-18 20:25:53,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=15.0 2024-09-18 20:26:04,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=813359.3333333334, ans=0.125 2024-09-18 20:26:28,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0 2024-09-18 20:26:37,318 INFO [train.py:1198] (0/2) Epoch 45, batch 5900, loss[loss=0.2413, ctc_loss=0.164, cr_loss=0.3862, over 13893.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1416, cr_loss=0.3656, over 4090752.90 frames. ], batch size: 149, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:26:48,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=813444.3333333334, ans=0.125 2024-09-18 20:26:53,914 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.229e+02 2.390e+02 2.560e+02 6.063e+02, threshold=4.781e+02, percent-clipped=1.0 2024-09-18 20:26:58,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=813472.6666666666, ans=0.0 2024-09-18 20:27:44,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=813557.6666666666, ans=0.025 2024-09-18 20:27:52,136 INFO [train.py:1198] (0/2) Epoch 45, batch 5950, loss[loss=0.1876, ctc_loss=0.1239, cr_loss=0.3184, over 20964.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1417, cr_loss=0.3662, over 4099055.16 frames. ], batch size: 51, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:28:03,334 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.33 vs. 
limit=22.5 2024-09-18 20:28:05,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=813614.3333333334, ans=0.0 2024-09-18 20:28:07,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=813614.3333333334, ans=0.0 2024-09-18 20:28:14,745 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:28:14,756 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:28:17,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=813614.3333333334, ans=0.0 2024-09-18 20:28:57,981 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:29:05,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813727.6666666666, ans=0.1 2024-09-18 20:29:06,639 INFO [train.py:1198] (0/2) Epoch 45, batch 6000, loss[loss=0.1758, ctc_loss=0.1116, cr_loss=0.3212, over 21080.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1412, cr_loss=0.3649, over 4085746.31 frames. ], batch size: 53, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:29:06,640 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 20:29:23,922 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2460, 2.2912, 3.1197, 2.2740], device='cuda:0') 2024-09-18 20:29:26,288 INFO [train.py:1230] (0/2) Epoch 45, validation: loss=0.03935, ctc_loss=0.03935, cr_loss=1.582e-14, over 944034.00 frames. 
2024-09-18 20:29:26,288 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-18 20:29:30,204 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0 2024-09-18 20:29:42,660 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 2.282e+02 2.411e+02 2.557e+02 5.115e+02, threshold=4.821e+02, percent-clipped=1.0 2024-09-18 20:30:08,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=813784.3333333334, ans=0.0 2024-09-18 20:30:20,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813812.6666666666, ans=0.1 2024-09-18 20:30:28,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2024-09-18 20:30:41,115 INFO [train.py:1198] (0/2) Epoch 45, batch 6050, loss[loss=0.2239, ctc_loss=0.1471, cr_loss=0.3836, over 20698.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1418, cr_loss=0.3659, over 4085593.73 frames. ], batch size: 68, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:31:03,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=813897.6666666666, ans=0.125 2024-09-18 20:31:15,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=813926.0, ans=0.025 2024-09-18 20:31:19,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=813926.0, ans=0.05 2024-09-18 20:31:54,548 INFO [train.py:1198] (0/2) Epoch 45, batch 6100, loss[loss=0.2257, ctc_loss=0.1506, cr_loss=0.3758, over 21023.00 frames. 
], tot_loss[loss=0.2162, ctc_loss=0.1427, cr_loss=0.3675, over 4080525.90 frames. ], batch size: 61, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:32:00,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=814011.0, ans=0.0 2024-09-18 20:32:02,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=814011.0, ans=0.125 2024-09-18 20:32:09,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=814039.3333333334, ans=0.2 2024-09-18 20:32:10,832 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.295e+02 2.406e+02 2.594e+02 3.498e+02, threshold=4.813e+02, percent-clipped=0.0 2024-09-18 20:32:19,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=814039.3333333334, ans=0.0 2024-09-18 20:33:08,157 INFO [train.py:1198] (0/2) Epoch 45, batch 6150, loss[loss=0.1908, ctc_loss=0.1264, cr_loss=0.3222, over 20956.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1432, cr_loss=0.3688, over 4086796.99 frames. ], batch size: 52, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:33:51,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=814209.3333333334, ans=0.125 2024-09-18 20:34:10,839 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:34:23,562 INFO [train.py:1198] (0/2) Epoch 45, batch 6200, loss[loss=0.2313, ctc_loss=0.1538, cr_loss=0.3874, over 21064.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1441, cr_loss=0.3701, over 4056871.28 frames. 
], batch size: 56, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:34:40,093 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.066e+02 2.309e+02 2.445e+02 2.676e+02 3.869e+02, threshold=4.890e+02, percent-clipped=0.0 2024-09-18 20:34:51,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=814351.0, ans=0.2 2024-09-18 20:34:52,674 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=22.5 2024-09-18 20:34:58,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=814351.0, ans=0.125 2024-09-18 20:35:35,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.59 vs. limit=10.0 2024-09-18 20:35:38,117 INFO [train.py:1198] (0/2) Epoch 45, batch 6250, loss[loss=0.185, ctc_loss=0.1204, cr_loss=0.3234, over 20951.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1429, cr_loss=0.3675, over 4042970.71 frames. 
], batch size: 48, lr: 1.82e-03, grad_scale: 32.0 2024-09-18 20:35:42,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=814436.0, ans=0.025 2024-09-18 20:35:51,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=814464.3333333334, ans=0.0 2024-09-18 20:36:05,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814464.3333333334, ans=0.1 2024-09-18 20:36:17,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=814492.6666666666, ans=0.125 2024-09-18 20:36:21,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=814521.0, ans=0.025 2024-09-18 20:36:34,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=814521.0, ans=0.05 2024-09-18 20:36:39,268 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=22.5 2024-09-18 20:36:41,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=814549.3333333334, ans=0.125 2024-09-18 20:36:51,897 INFO [train.py:1198] (0/2) Epoch 45, batch 6300, loss[loss=0.2802, ctc_loss=0.1968, cr_loss=0.4169, over 14048.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1447, cr_loss=0.3701, over 4031697.57 frames. 
], batch size: 149, lr: 1.82e-03, grad_scale: 32.0
2024-09-18 20:36:52,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814577.6666666666, ans=0.1
2024-09-18 20:37:04,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814577.6666666666, ans=0.1
2024-09-18 20:37:07,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=814606.0, ans=0.035
2024-09-18 20:37:08,493 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 2.338e+02 2.474e+02 2.662e+02 4.819e+02, threshold=4.947e+02, percent-clipped=0.0
2024-09-18 20:37:39,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=814662.6666666666, ans=0.125
2024-09-18 20:37:45,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=814662.6666666666, ans=0.2
2024-09-18 20:37:48,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=814662.6666666666, ans=0.125
2024-09-18 20:38:07,082 INFO [train.py:1198] (0/2) Epoch 45, batch 6350, loss[loss=0.2561, ctc_loss=0.1779, cr_loss=0.3909, over 14204.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1469, cr_loss=0.3719, over 3948528.39 frames. ], batch size: 149, lr: 1.82e-03, grad_scale: 32.0
2024-09-18 20:38:07,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=814719.3333333334, ans=0.0
2024-09-18 20:38:07,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=814719.3333333334, ans=0.125
2024-09-18 20:38:21,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=814747.6666666666, ans=0.125
2024-09-18 20:38:21,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=814747.6666666666, ans=0.025
2024-09-18 20:38:41,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=814776.0, ans=0.0
2024-09-18 20:38:48,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=814804.3333333334, ans=10.0
2024-09-18 20:38:51,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=814804.3333333334, ans=0.125
2024-09-18 20:39:05,044 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-45.pt
2024-09-18 20:39:50,615 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 20:39:50,899 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0
2024-09-18 20:39:51,726 INFO [train.py:1198] (0/2) Epoch 46, batch 0, loss[loss=0.2586, ctc_loss=0.1698, cr_loss=0.4439, over 20837.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1698, cr_loss=0.4439, over 20837.00 frames. ], batch size: 65, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:39:51,728 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 20:40:09,784 INFO [train.py:1230] (0/2) Epoch 46, validation: loss=0.03873, ctc_loss=0.03873, cr_loss=1.576e-14, over 944034.00 frames.
2024-09-18 20:40:09,785 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-18 20:40:30,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=814863.8333333334, ans=10.0
2024-09-18 20:40:44,899 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.444e+02 2.730e+02 2.944e+02 3.654e+02, threshold=5.460e+02, percent-clipped=0.0
2024-09-18 20:40:48,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814892.1666666666, ans=0.1
2024-09-18 20:40:57,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=814892.1666666666, ans=0.125
2024-09-18 20:41:30,204 INFO [train.py:1198] (0/2) Epoch 46, batch 50, loss[loss=0.2197, ctc_loss=0.1445, cr_loss=0.3759, over 20888.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3694, over 924277.91 frames. ], batch size: 57, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:42:38,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=815090.5, ans=0.2
2024-09-18 20:42:45,888 INFO [train.py:1198] (0/2) Epoch 46, batch 100, loss[loss=0.2102, ctc_loss=0.1383, cr_loss=0.3594, over 20771.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1414, cr_loss=0.3669, over 1629874.93 frames. ], batch size: 53, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:43:15,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.35 vs. limit=15.0
2024-09-18 20:43:16,027 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.193e+02 2.400e+02 2.544e+02 3.395e+02, threshold=4.800e+02, percent-clipped=0.0
2024-09-18 20:43:21,285 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=12.0
2024-09-18 20:43:27,260 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0
2024-09-18 20:43:42,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=815203.8333333334, ans=0.09899494936611666
2024-09-18 20:44:01,122 INFO [train.py:1198] (0/2) Epoch 46, batch 150, loss[loss=0.2323, ctc_loss=0.1546, cr_loss=0.3886, over 20888.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3676, over 2178834.37 frames. ], batch size: 57, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:44:06,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=12.0
2024-09-18 20:44:19,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=815288.8333333334, ans=0.0
2024-09-18 20:44:34,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=815317.1666666666, ans=0.125
2024-09-18 20:44:40,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=815317.1666666666, ans=0.035
2024-09-18 20:44:48,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=815345.5, ans=0.0
2024-09-18 20:44:51,593 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 20:45:16,408 INFO [train.py:1198] (0/2) Epoch 46, batch 200, loss[loss=0.2037, ctc_loss=0.1307, cr_loss=0.3647, over 20957.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1411, cr_loss=0.3671, over 2608056.48 frames. ], batch size: 55, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:45:31,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=815430.5, ans=0.0
2024-09-18 20:45:46,301 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.198e+02 2.358e+02 2.505e+02 4.198e+02, threshold=4.716e+02, percent-clipped=0.0
2024-09-18 20:45:58,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=815458.8333333334, ans=0.125
2024-09-18 20:46:22,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0
2024-09-18 20:46:26,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=815515.5, ans=0.025
2024-09-18 20:46:33,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=815515.5, ans=0.125
2024-09-18 20:46:36,429 INFO [train.py:1198] (0/2) Epoch 46, batch 250, loss[loss=0.2395, ctc_loss=0.1582, cr_loss=0.4067, over 19486.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1412, cr_loss=0.3671, over 2944849.20 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:46:59,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=815572.1666666666, ans=0.125
2024-09-18 20:46:59,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.99 vs. limit=15.0
2024-09-18 20:47:18,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=815600.5, ans=0.0
2024-09-18 20:47:31,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=815628.8333333334, ans=0.125
2024-09-18 20:47:42,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=815657.1666666666, ans=0.0
2024-09-18 20:47:50,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=815685.5, ans=0.025
2024-09-18 20:47:51,346 INFO [train.py:1198] (0/2) Epoch 46, batch 300, loss[loss=0.2261, ctc_loss=0.1522, cr_loss=0.3694, over 20821.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.142, cr_loss=0.3678, over 3193488.99 frames. ], batch size: 65, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:48:14,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=815713.8333333334, ans=0.0
2024-09-18 20:48:21,446 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.253e+02 2.352e+02 2.520e+02 3.176e+02, threshold=4.704e+02, percent-clipped=0.0
2024-09-18 20:48:29,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=815742.1666666666, ans=0.125
2024-09-18 20:48:56,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=815798.8333333334, ans=0.95
2024-09-18 20:49:05,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815827.1666666666, ans=0.1
2024-09-18 20:49:06,779 INFO [train.py:1198] (0/2) Epoch 46, batch 350, loss[loss=0.2158, ctc_loss=0.1433, cr_loss=0.3627, over 20985.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1421, cr_loss=0.3679, over 3401742.92 frames. ], batch size: 55, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:49:11,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=22.5
2024-09-18 20:49:16,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=815827.1666666666, ans=0.2
2024-09-18 20:49:43,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=815883.8333333334, ans=0.125
2024-09-18 20:49:53,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=815912.1666666666, ans=0.0
2024-09-18 20:49:53,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815912.1666666666, ans=0.1
2024-09-18 20:50:08,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=815940.5, ans=0.0
2024-09-18 20:50:11,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=815940.5, ans=0.125
2024-09-18 20:50:16,738 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0
2024-09-18 20:50:22,118 INFO [train.py:1198] (0/2) Epoch 46, batch 400, loss[loss=0.2235, ctc_loss=0.1474, cr_loss=0.3801, over 21085.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3676, over 3547793.38 frames. ], batch size: 53, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:50:31,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=815968.8333333334, ans=0.125
2024-09-18 20:50:37,395 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-288000.pt
2024-09-18 20:50:43,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815997.1666666666, ans=0.1
2024-09-18 20:50:53,613 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.215e+02 2.369e+02 2.519e+02 3.261e+02, threshold=4.738e+02, percent-clipped=0.0
2024-09-18 20:50:58,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=816025.5, ans=0.125
2024-09-18 20:51:04,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=816025.5, ans=0.2
2024-09-18 20:51:20,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=816053.8333333334, ans=0.0
2024-09-18 20:51:39,043 INFO [train.py:1198] (0/2) Epoch 46, batch 450, loss[loss=0.2211, ctc_loss=0.1482, cr_loss=0.3642, over 21073.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1411, cr_loss=0.3656, over 3664861.66 frames. ], batch size: 56, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:52:46,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=816223.8333333334, ans=0.125
2024-09-18 20:52:59,645 INFO [train.py:1198] (0/2) Epoch 46, batch 500, loss[loss=0.2139, ctc_loss=0.138, cr_loss=0.3792, over 21064.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1414, cr_loss=0.3658, over 3768022.73 frames. ], batch size: 53, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:53:29,692 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.919e+02 2.255e+02 2.347e+02 2.535e+02 3.411e+02, threshold=4.694e+02, percent-clipped=0.0
2024-09-18 20:54:10,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=816365.5, ans=0.025
2024-09-18 20:54:14,664 INFO [train.py:1198] (0/2) Epoch 46, batch 550, loss[loss=0.2088, ctc_loss=0.1358, cr_loss=0.365, over 20933.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1419, cr_loss=0.3667, over 3830554.78 frames. ], batch size: 60, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:54:19,777 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 20:54:37,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=816422.1666666666, ans=0.5
2024-09-18 20:54:43,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816450.5, ans=0.1
2024-09-18 20:54:51,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816450.5, ans=0.1
2024-09-18 20:54:58,867 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0
2024-09-18 20:55:01,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=816478.8333333334, ans=0.0
2024-09-18 20:55:04,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=816478.8333333334, ans=0.2
2024-09-18 20:55:08,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=816478.8333333334, ans=0.2
2024-09-18 20:55:25,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.17 vs. limit=10.0
2024-09-18 20:55:29,697 INFO [train.py:1198] (0/2) Epoch 46, batch 600, loss[loss=0.2136, ctc_loss=0.1394, cr_loss=0.3707, over 21017.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1423, cr_loss=0.3671, over 3884116.94 frames. ], batch size: 61, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:55:33,131 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 20:55:34,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=816535.5, ans=0.125
2024-09-18 20:55:55,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=816563.8333333334, ans=0.125
2024-09-18 20:55:59,258 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.252e+02 2.397e+02 2.572e+02 4.425e+02, threshold=4.794e+02, percent-clipped=0.0
2024-09-18 20:56:22,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=816620.5, ans=0.5
2024-09-18 20:56:25,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=816620.5, ans=0.025
2024-09-18 20:56:27,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=816620.5, ans=0.0
2024-09-18 20:56:34,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=816648.8333333334, ans=0.125
2024-09-18 20:56:44,596 INFO [train.py:1198] (0/2) Epoch 46, batch 650, loss[loss=0.2511, ctc_loss=0.1665, cr_loss=0.4227, over 20968.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.3679, over 3932814.70 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 32.0
2024-09-18 20:56:44,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=816677.1666666666, ans=0.0
2024-09-18 20:56:52,870 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=12.0
2024-09-18 20:56:54,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=816677.1666666666, ans=0.125
2024-09-18 20:56:57,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=816677.1666666666, ans=0.025
2024-09-18 20:57:00,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=816705.5, ans=0.0
2024-09-18 20:58:05,800 INFO [train.py:1198] (0/2) Epoch 46, batch 700, loss[loss=0.2505, ctc_loss=0.167, cr_loss=0.4176, over 20662.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3687, over 3962538.03 frames. ], batch size: 68, lr: 1.80e-03, grad_scale: 64.0
2024-09-18 20:58:25,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=816847.1666666666, ans=0.125
2024-09-18 20:58:33,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=816847.1666666666, ans=0.125
2024-09-18 20:58:35,522 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.241e+02 2.365e+02 2.538e+02 3.589e+02, threshold=4.730e+02, percent-clipped=0.0
2024-09-18 20:58:38,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=816875.5, ans=0.125
2024-09-18 20:58:43,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=816875.5, ans=0.125
2024-09-18 20:58:49,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=816903.8333333334, ans=0.0
2024-09-18 20:59:04,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=816932.1666666666, ans=0.125
2024-09-18 20:59:20,789 INFO [train.py:1198] (0/2) Epoch 46, batch 750, loss[loss=0.2184, ctc_loss=0.1415, cr_loss=0.3847, over 20789.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1435, cr_loss=0.3692, over 3984093.53 frames. ], batch size: 53, lr: 1.80e-03, grad_scale: 64.0
2024-09-18 20:59:21,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=22.5
2024-09-18 21:00:25,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=817073.8333333334, ans=0.1
2024-09-18 21:00:35,782 INFO [train.py:1198] (0/2) Epoch 46, batch 800, loss[loss=0.2422, ctc_loss=0.1636, cr_loss=0.3934, over 20827.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3687, over 4004257.79 frames. ], batch size: 59, lr: 1.80e-03, grad_scale: 64.0
2024-09-18 21:01:05,507 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.311e+02 2.454e+02 2.599e+02 3.298e+02, threshold=4.908e+02, percent-clipped=0.0
2024-09-18 21:01:19,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=817187.1666666666, ans=0.125
2024-09-18 21:01:50,357 INFO [train.py:1198] (0/2) Epoch 46, batch 850, loss[loss=0.2663, ctc_loss=0.1867, cr_loss=0.3978, over 13958.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3689, over 4029028.16 frames. ], batch size: 150, lr: 1.80e-03, grad_scale: 64.0
2024-09-18 21:02:06,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=817272.1666666666, ans=0.125
2024-09-18 21:02:14,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=817272.1666666666, ans=0.0
2024-09-18 21:02:15,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=817272.1666666666, ans=0.125
2024-09-18 21:03:08,561 INFO [train.py:1198] (0/2) Epoch 46, batch 900, loss[loss=0.2314, ctc_loss=0.1534, cr_loss=0.3901, over 20955.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3689, over 4056295.72 frames. ], batch size: 58, lr: 1.80e-03, grad_scale: 64.0
2024-09-18 21:03:16,915 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=22.5
2024-09-18 21:03:41,444 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.076e+02 2.294e+02 2.436e+02 2.536e+02 3.738e+02, threshold=4.872e+02, percent-clipped=0.0
2024-09-18 21:03:53,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=817442.1666666666, ans=0.0
2024-09-18 21:04:07,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=817470.5, ans=0.1
2024-09-18 21:04:26,345 INFO [train.py:1198] (0/2) Epoch 46, batch 950, loss[loss=0.1832, ctc_loss=0.1184, cr_loss=0.3243, over 21048.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3682, over 4058548.09 frames. ], batch size: 53, lr: 1.80e-03, grad_scale: 64.0
2024-09-18 21:04:26,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=817527.1666666666, ans=0.0
2024-09-18 21:04:34,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=817527.1666666666, ans=0.0
2024-09-18 21:04:34,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=817527.1666666666, ans=0.2
2024-09-18 21:04:37,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=817527.1666666666, ans=0.1
2024-09-18 21:04:46,880 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0
2024-09-18 21:05:41,715 INFO [train.py:1198] (0/2) Epoch 46, batch 1000, loss[loss=0.2174, ctc_loss=0.1462, cr_loss=0.356, over 20748.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1424, cr_loss=0.3663, over 4054061.29 frames. ], batch size: 71, lr: 1.80e-03, grad_scale: 64.0
2024-09-18 21:06:06,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=817697.1666666666, ans=0.0
2024-09-18 21:06:11,914 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.059e+02 2.291e+02 2.408e+02 2.588e+02 4.290e+02, threshold=4.816e+02, percent-clipped=0.0
2024-09-18 21:06:16,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=817725.5, ans=0.0
2024-09-18 21:06:24,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=817725.5, ans=0.04949747468305833
2024-09-18 21:06:56,996 INFO [train.py:1198] (0/2) Epoch 46, batch 1050, loss[loss=0.2273, ctc_loss=0.152, cr_loss=0.3764, over 20965.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1427, cr_loss=0.3674, over 4062379.89 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 64.0
2024-09-18 21:07:03,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=817810.5, ans=0.1
2024-09-18 21:07:06,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=817810.5, ans=0.0
2024-09-18 21:07:33,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=817867.1666666666, ans=0.05
2024-09-18 21:08:01,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=817923.8333333334, ans=0.125
2024-09-18 21:08:11,871 INFO [train.py:1198] (0/2) Epoch 46, batch 1100, loss[loss=0.1718, ctc_loss=0.1119, cr_loss=0.2996, over 20976.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3689, over 4075483.65 frames. ], batch size: 48, lr: 1.80e-03, grad_scale: 64.0
2024-09-18 21:08:44,931 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.310e+02 2.456e+02 2.641e+02 3.685e+02, threshold=4.912e+02, percent-clipped=0.0
2024-09-18 21:09:00,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=818008.8333333334, ans=0.125
2024-09-18 21:09:13,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=818037.1666666666, ans=0.0
2024-09-18 21:09:27,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=818065.5, ans=0.0
2024-09-18 21:09:31,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=818093.8333333334, ans=0.025
2024-09-18 21:09:33,028 INFO [train.py:1198] (0/2) Epoch 46, batch 1150, loss[loss=0.253, ctc_loss=0.1711, cr_loss=0.4094, over 20107.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1443, cr_loss=0.3705, over 4065112.79 frames. ], batch size: 80, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 21:09:42,820 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0
2024-09-18 21:09:56,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=818122.1666666666, ans=0.0
2024-09-18 21:10:02,663 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=22.5
2024-09-18 21:10:17,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=818178.8333333334, ans=0.125
2024-09-18 21:10:29,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=818178.8333333334, ans=0.125
2024-09-18 21:10:31,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=818178.8333333334, ans=0.0
2024-09-18 21:10:48,793 INFO [train.py:1198] (0/2) Epoch 46, batch 1200, loss[loss=0.203, ctc_loss=0.1335, cr_loss=0.3475, over 21071.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1437, cr_loss=0.3694, over 4073681.17 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 21:10:59,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=818235.5, ans=0.0
2024-09-18 21:11:16,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=818263.8333333334, ans=0.1
2024-09-18 21:11:17,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=818292.1666666666, ans=0.025
2024-09-18 21:11:20,677 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.264e+02 2.384e+02 2.544e+02 3.200e+02, threshold=4.767e+02, percent-clipped=0.0
2024-09-18 21:11:43,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=818320.5, ans=0.0
2024-09-18 21:11:45,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=818320.5, ans=0.04949747468305833
2024-09-18 21:11:47,401 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=15.0
2024-09-18 21:12:04,764 INFO [train.py:1198] (0/2) Epoch 46, batch 1250, loss[loss=0.2035, ctc_loss=0.1323, cr_loss=0.3559, over 20841.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1439, cr_loss=0.3708, over 4088349.51 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 21:13:19,706 INFO [train.py:1198] (0/2) Epoch 46, batch 1300, loss[loss=0.1748, ctc_loss=0.1131, cr_loss=0.3087, over 20949.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1422, cr_loss=0.3675, over 4095337.92 frames. ], batch size: 51, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 21:13:35,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=818547.1666666666, ans=0.125
2024-09-18 21:13:46,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=818547.1666666666, ans=0.0
2024-09-18 21:13:51,737 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.225e+02 2.348e+02 2.571e+02 4.045e+02, threshold=4.697e+02, percent-clipped=0.0
2024-09-18 21:13:58,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=818575.5, ans=0.0
2024-09-18 21:13:59,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=818575.5, ans=0.125
2024-09-18 21:14:25,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=818632.1666666666, ans=0.125
2024-09-18 21:14:35,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=818632.1666666666, ans=0.0
2024-09-18 21:14:37,972 INFO [train.py:1198] (0/2) Epoch 46, batch 1350, loss[loss=0.1976, ctc_loss=0.1269, cr_loss=0.3535, over 20890.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3674, over 4098066.40 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 21:15:07,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=818688.8333333334, ans=0.1
2024-09-18 21:15:07,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=22.5
2024-09-18 21:15:11,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=818717.1666666666, ans=0.2
2024-09-18 21:15:50,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2024-09-18 21:15:56,772 INFO [train.py:1198] (0/2) Epoch 46, batch 1400, loss[loss=0.2148, ctc_loss=0.1426, cr_loss=0.3607, over 20881.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1417, cr_loss=0.3665, over 4105823.25 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 16.0
2024-09-18 21:16:12,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=818830.5, ans=0.1
2024-09-18 21:16:12,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=818830.5, ans=0.2
2024-09-18 21:16:30,304 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.016e+02 2.209e+02 2.340e+02 2.531e+02 4.612e+02, threshold=4.680e+02, percent-clipped=0.0
2024-09-18 21:16:56,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=818915.5, ans=0.125
2024-09-18 21:17:09,070 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0
2024-09-18 21:17:13,039 INFO [train.py:1198] (0/2) Epoch 46, batch 1450, loss[loss=0.195, ctc_loss=0.1272, cr_loss=0.3391, over 21058.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1412, cr_loss=0.3657, over 4106408.48 frames. ], batch size: 62, lr: 1.79e-03, grad_scale: 16.0
2024-09-18 21:17:25,907 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=22.5
2024-09-18 21:17:57,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=819028.8333333334, ans=0.125
2024-09-18 21:18:06,715 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0
2024-09-18 21:18:12,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=819057.1666666666, ans=0.0
2024-09-18 21:18:16,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=819057.1666666666, ans=0.2
2024-09-18 21:18:28,584 INFO [train.py:1198] (0/2) Epoch 46, batch 1500, loss[loss=0.2424, ctc_loss=0.1632, cr_loss=0.3962, over 20962.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1421, cr_loss=0.3672, over 4111588.53 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 16.0
2024-09-18 21:18:54,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=819113.8333333334, ans=0.0
2024-09-18 21:19:01,506 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.253e+02 2.389e+02 2.516e+02 6.989e+02, threshold=4.778e+02, percent-clipped=1.0
2024-09-18 21:19:04,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=819142.1666666666, ans=0.2
2024-09-18 21:19:11,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=819142.1666666666, ans=0.025
2024-09-18 21:19:24,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=819170.5, ans=0.125
2024-09-18 21:19:42,735 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 21:19:43,779 INFO [train.py:1198] (0/2) Epoch 46, batch 1550, loss[loss=0.196, ctc_loss=0.1286, cr_loss=0.3369, over 19920.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1425, cr_loss=0.3675, over 4107064.26 frames. ], batch size: 44, lr: 1.79e-03, grad_scale: 16.0
2024-09-18 21:19:48,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819227.1666666666, ans=0.1
2024-09-18 21:19:55,265 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.50 vs. limit=22.5
2024-09-18 21:20:05,681 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.49 vs. limit=15.0
2024-09-18 21:20:22,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=15.0
2024-09-18 21:20:30,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0
2024-09-18 21:20:50,202 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=22.5
2024-09-18 21:21:04,466 INFO [train.py:1198] (0/2) Epoch 46, batch 1600, loss[loss=0.1929, ctc_loss=0.1235, cr_loss=0.3467, over 20963.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1431, cr_loss=0.3699, over 4109413.05 frames.
], batch size: 48, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:21:24,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=819397.1666666666, ans=0.0 2024-09-18 21:21:30,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=819397.1666666666, ans=0.1 2024-09-18 21:21:31,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=819397.1666666666, ans=0.0 2024-09-18 21:21:37,830 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.926e+02 2.223e+02 2.381e+02 2.505e+02 3.293e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-18 21:21:57,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=819453.8333333334, ans=0.125 2024-09-18 21:22:16,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0 2024-09-18 21:22:20,561 INFO [train.py:1198] (0/2) Epoch 46, batch 1650, loss[loss=0.2318, ctc_loss=0.1536, cr_loss=0.3909, over 20844.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1429, cr_loss=0.3699, over 4120777.16 frames. 
], batch size: 65, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:22:28,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=819510.5, ans=0.0 2024-09-18 21:22:55,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819567.1666666666, ans=0.1 2024-09-18 21:23:09,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=819595.5, ans=0.5 2024-09-18 21:23:12,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=819595.5, ans=0.0 2024-09-18 21:23:12,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=819595.5, ans=0.0 2024-09-18 21:23:22,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=819623.8333333334, ans=0.125 2024-09-18 21:23:35,985 INFO [train.py:1198] (0/2) Epoch 46, batch 1700, loss[loss=0.2328, ctc_loss=0.1591, cr_loss=0.3684, over 19495.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1434, cr_loss=0.3702, over 4105744.71 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:24:02,457 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.06 vs. 
limit=22.5 2024-09-18 21:24:10,517 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.242e+02 2.353e+02 2.530e+02 3.299e+02, threshold=4.706e+02, percent-clipped=0.0 2024-09-18 21:24:42,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=819765.5, ans=0.125 2024-09-18 21:24:51,127 INFO [train.py:1198] (0/2) Epoch 46, batch 1750, loss[loss=0.1793, ctc_loss=0.1173, cr_loss=0.3098, over 20941.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1423, cr_loss=0.3688, over 4117023.82 frames. ], batch size: 50, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:25:02,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=819793.8333333334, ans=0.2 2024-09-18 21:25:14,601 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.85 vs. limit=10.0 2024-09-18 21:25:17,706 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=12.0 2024-09-18 21:26:09,375 INFO [train.py:1198] (0/2) Epoch 46, batch 1800, loss[loss=0.2282, ctc_loss=0.15, cr_loss=0.3912, over 20004.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1425, cr_loss=0.3687, over 4111174.25 frames. ], batch size: 80, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:26:47,378 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.211e+02 2.352e+02 2.538e+02 3.477e+02, threshold=4.704e+02, percent-clipped=0.0 2024-09-18 21:26:49,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=819992.1666666666, ans=0.2 2024-09-18 21:27:28,096 INFO [train.py:1198] (0/2) Epoch 46, batch 1850, loss[loss=0.2101, ctc_loss=0.1366, cr_loss=0.3676, over 20986.00 frames. 
], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.3689, over 4103621.02 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:28:02,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=820133.8333333334, ans=0.125 2024-09-18 21:28:13,678 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0 2024-09-18 21:28:34,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=820190.5, ans=0.125 2024-09-18 21:28:42,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=820218.8333333334, ans=0.0 2024-09-18 21:28:43,473 INFO [train.py:1198] (0/2) Epoch 46, batch 1900, loss[loss=0.1868, ctc_loss=0.1213, cr_loss=0.3279, over 20987.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3664, over 4110789.76 frames. 
], batch size: 51, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:28:48,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=820218.8333333334, ans=0.1 2024-09-18 21:29:05,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=820247.1666666666, ans=0.1 2024-09-18 21:29:18,309 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.004e+02 2.229e+02 2.377e+02 2.550e+02 3.271e+02, threshold=4.755e+02, percent-clipped=0.0 2024-09-18 21:29:21,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=820275.5, ans=0.0 2024-09-18 21:29:38,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820303.8333333334, ans=0.1 2024-09-18 21:29:57,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=820360.5, ans=0.0 2024-09-18 21:29:58,784 INFO [train.py:1198] (0/2) Epoch 46, batch 1950, loss[loss=0.2284, ctc_loss=0.1518, cr_loss=0.3832, over 18213.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1416, cr_loss=0.3669, over 4107644.05 frames. 
], batch size: 108, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:30:06,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=820360.5, ans=0.5 2024-09-18 21:30:47,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=820445.5, ans=0.0 2024-09-18 21:30:50,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=820445.5, ans=0.0 2024-09-18 21:30:57,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=820473.8333333334, ans=0.0 2024-09-18 21:31:14,213 INFO [train.py:1198] (0/2) Epoch 46, batch 2000, loss[loss=0.2129, ctc_loss=0.1439, cr_loss=0.3449, over 20938.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1421, cr_loss=0.3678, over 4110095.83 frames. ], batch size: 60, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:31:17,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=820502.1666666666, ans=0.0 2024-09-18 21:31:49,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=820558.8333333334, ans=0.0 2024-09-18 21:31:51,823 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.259e+02 2.419e+02 2.580e+02 7.892e+02, threshold=4.838e+02, percent-clipped=2.0 2024-09-18 21:31:58,469 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:31:58,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820558.8333333334, ans=0.1 2024-09-18 21:32:22,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=820615.5, ans=0.125 2024-09-18 21:32:35,637 INFO 
[train.py:1198] (0/2) Epoch 46, batch 2050, loss[loss=0.2645, ctc_loss=0.1821, cr_loss=0.4118, over 13859.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.3683, over 4097391.59 frames. ], batch size: 150, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:32:37,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=820643.8333333334, ans=0.0 2024-09-18 21:32:51,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=820672.1666666666, ans=0.125 2024-09-18 21:33:05,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820700.5, ans=0.1 2024-09-18 21:33:51,673 INFO [train.py:1198] (0/2) Epoch 46, batch 2100, loss[loss=0.2129, ctc_loss=0.1415, cr_loss=0.3572, over 20965.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1427, cr_loss=0.3684, over 4084059.24 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:34:26,603 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.021e+02 2.267e+02 2.383e+02 2.565e+02 5.256e+02, threshold=4.767e+02, percent-clipped=1.0 2024-09-18 21:34:38,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=820870.5, ans=0.125 2024-09-18 21:34:45,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0 2024-09-18 21:35:03,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=820898.8333333334, ans=0.025 2024-09-18 21:35:07,454 INFO [train.py:1198] (0/2) Epoch 46, batch 2150, loss[loss=0.2297, ctc_loss=0.1513, cr_loss=0.392, over 20951.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3672, over 4095766.02 frames. 
], batch size: 60, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:35:27,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=820955.5, ans=0.125 2024-09-18 21:35:32,161 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.58 vs. limit=15.0 2024-09-18 21:35:33,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=820955.5, ans=0.125 2024-09-18 21:35:35,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=820983.8333333334, ans=0.125 2024-09-18 21:35:36,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820983.8333333334, ans=0.1 2024-09-18 21:35:50,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=821012.1666666666, ans=0.125 2024-09-18 21:35:51,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=821012.1666666666, ans=0.125 2024-09-18 21:36:22,176 INFO [train.py:1198] (0/2) Epoch 46, batch 2200, loss[loss=0.2281, ctc_loss=0.1515, cr_loss=0.383, over 21031.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1413, cr_loss=0.3657, over 4103650.46 frames. 
], batch size: 63, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:36:28,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821068.8333333334, ans=0.1 2024-09-18 21:36:54,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=821125.5, ans=0.2 2024-09-18 21:36:57,419 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.236e+02 2.409e+02 2.555e+02 3.681e+02, threshold=4.817e+02, percent-clipped=0.0 2024-09-18 21:37:18,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=821153.8333333334, ans=0.125 2024-09-18 21:37:22,468 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2024-09-18 21:37:38,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=821182.1666666666, ans=0.015 2024-09-18 21:37:41,379 INFO [train.py:1198] (0/2) Epoch 46, batch 2250, loss[loss=0.2403, ctc_loss=0.1595, cr_loss=0.4042, over 20638.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.3668, over 4110774.32 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:38:03,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=821238.8333333334, ans=0.125 2024-09-18 21:38:23,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=821267.1666666666, ans=0.0 2024-09-18 21:38:59,037 INFO [train.py:1198] (0/2) Epoch 46, batch 2300, loss[loss=0.2214, ctc_loss=0.1468, cr_loss=0.3734, over 21009.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1418, cr_loss=0.3666, over 4115935.29 frames. 
], batch size: 61, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:39:13,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821380.5, ans=0.1 2024-09-18 21:39:33,519 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.231e+02 2.396e+02 2.524e+02 3.614e+02, threshold=4.793e+02, percent-clipped=0.0 2024-09-18 21:39:36,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=821408.8333333334, ans=0.0 2024-09-18 21:39:49,364 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=12.0 2024-09-18 21:39:49,482 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=22.5 2024-09-18 21:40:14,314 INFO [train.py:1198] (0/2) Epoch 46, batch 2350, loss[loss=0.176, ctc_loss=0.1161, cr_loss=0.2995, over 20988.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1417, cr_loss=0.3662, over 4107721.86 frames. ], batch size: 48, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:40:41,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=821522.1666666666, ans=0.125 2024-09-18 21:40:52,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-18 21:41:13,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=821607.1666666666, ans=0.0 2024-09-18 21:41:25,041 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.58 vs. 
limit=22.5 2024-09-18 21:41:30,311 INFO [train.py:1198] (0/2) Epoch 46, batch 2400, loss[loss=0.225, ctc_loss=0.1463, cr_loss=0.3934, over 21057.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1413, cr_loss=0.3655, over 4110975.14 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:42:04,803 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.233e+02 2.365e+02 2.515e+02 3.615e+02, threshold=4.730e+02, percent-clipped=0.0 2024-09-18 21:42:15,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=821720.5, ans=0.05 2024-09-18 21:42:45,497 INFO [train.py:1198] (0/2) Epoch 46, batch 2450, loss[loss=0.2228, ctc_loss=0.1455, cr_loss=0.3863, over 20936.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1418, cr_loss=0.3669, over 4119206.56 frames. ], batch size: 60, lr: 1.79e-03, grad_scale: 32.0 2024-09-18 21:42:54,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=821777.1666666666, ans=0.2 2024-09-18 21:43:02,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=821805.5, ans=0.0 2024-09-18 21:43:08,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=821805.5, ans=10.0 2024-09-18 21:43:18,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=821833.8333333334, ans=0.2 2024-09-18 21:43:31,230 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-18 21:44:06,031 INFO [train.py:1198] (0/2) Epoch 46, batch 2500, loss[loss=0.2217, ctc_loss=0.1458, cr_loss=0.3792, over 21010.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.142, cr_loss=0.3668, over 4118386.34 frames. 
], batch size: 64, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:44:16,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821918.8333333334, ans=0.1 2024-09-18 21:44:19,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=821947.1666666666, ans=0.125 2024-09-18 21:44:36,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=821975.5, ans=0.025 2024-09-18 21:44:42,286 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.281e+02 2.378e+02 2.521e+02 3.111e+02, threshold=4.757e+02, percent-clipped=0.0 2024-09-18 21:45:00,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822003.8333333334, ans=0.1 2024-09-18 21:45:11,659 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=22.5 2024-09-18 21:45:21,234 INFO [train.py:1198] (0/2) Epoch 46, batch 2550, loss[loss=0.2033, ctc_loss=0.1319, cr_loss=0.3573, over 21070.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1416, cr_loss=0.3667, over 4121770.04 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:45:25,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.75 vs. 
limit=5.0 2024-09-18 21:45:33,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=822060.5, ans=0.125 2024-09-18 21:45:51,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=822117.1666666666, ans=0.0 2024-09-18 21:46:07,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=822145.5, ans=0.07 2024-09-18 21:46:10,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.03 vs. limit=22.5 2024-09-18 21:46:35,787 INFO [train.py:1198] (0/2) Epoch 46, batch 2600, loss[loss=0.2349, ctc_loss=0.1573, cr_loss=0.3883, over 20704.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3675, over 4118321.73 frames. ], batch size: 71, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:46:49,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=822230.5, ans=0.125 2024-09-18 21:46:51,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=822230.5, ans=0.125 2024-09-18 21:47:12,272 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.268e+02 2.382e+02 2.565e+02 3.672e+02, threshold=4.764e+02, percent-clipped=0.0 2024-09-18 21:47:17,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=822258.8333333334, ans=0.0 2024-09-18 21:47:41,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=822315.5, ans=0.125 2024-09-18 21:47:51,368 INFO [train.py:1198] (0/2) Epoch 46, batch 2650, loss[loss=0.2223, ctc_loss=0.1449, cr_loss=0.3873, over 21007.00 frames. 
], tot_loss[loss=0.2163, ctc_loss=0.1426, cr_loss=0.3682, over 4101429.25 frames. ], batch size: 63, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:47:51,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=822343.8333333334, ans=0.025 2024-09-18 21:48:02,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=822343.8333333334, ans=0.125 2024-09-18 21:48:10,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=822372.1666666666, ans=0.0 2024-09-18 21:48:15,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=822372.1666666666, ans=0.125 2024-09-18 21:48:33,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=822400.5, ans=0.07 2024-09-18 21:48:40,430 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2024-09-18 21:49:09,692 INFO [train.py:1198] (0/2) Epoch 46, batch 2700, loss[loss=0.2264, ctc_loss=0.1523, cr_loss=0.3707, over 19689.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3687, over 4106362.09 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:49:48,747 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.250e+02 2.391e+02 2.532e+02 3.748e+02, threshold=4.783e+02, percent-clipped=0.0 2024-09-18 21:50:01,646 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=22.5 2024-09-18 21:50:03,192 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.87 vs. 
limit=15.0 2024-09-18 21:50:28,731 INFO [train.py:1198] (0/2) Epoch 46, batch 2750, loss[loss=0.2214, ctc_loss=0.1497, cr_loss=0.3586, over 20876.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3673, over 4115794.01 frames. ], batch size: 65, lr: 1.79e-03, grad_scale: 16.0 2024-09-18 21:50:33,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=822627.1666666666, ans=0.125 2024-09-18 21:50:59,222 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:51:02,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=822683.8333333334, ans=0.0 2024-09-18 21:51:11,935 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=15.0 2024-09-18 21:51:44,399 INFO [train.py:1198] (0/2) Epoch 46, batch 2800, loss[loss=0.1836, ctc_loss=0.1193, cr_loss=0.3215, over 20955.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1437, cr_loss=0.3687, over 4093571.69 frames. 
], batch size: 51, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 21:52:20,963 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.317e+02 2.449e+02 2.604e+02 5.763e+02, threshold=4.898e+02, percent-clipped=1.0
2024-09-18 21:52:35,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822853.8333333334, ans=0.1
2024-09-18 21:52:51,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822882.1666666666, ans=0.1
2024-09-18 21:52:56,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=822882.1666666666, ans=0.0
2024-09-18 21:53:00,358 INFO [train.py:1198] (0/2) Epoch 46, batch 2850, loss[loss=0.2345, ctc_loss=0.1587, cr_loss=0.3791, over 17940.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1428, cr_loss=0.3676, over 4108854.36 frames. ], batch size: 108, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 21:53:03,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822910.5, ans=0.1
2024-09-18 21:53:09,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0
2024-09-18 21:53:21,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=822938.8333333334, ans=0.035
2024-09-18 21:53:44,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0
2024-09-18 21:54:15,768 INFO [train.py:1198] (0/2) Epoch 46, batch 2900, loss[loss=0.2312, ctc_loss=0.1524, cr_loss=0.3939, over 19582.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3671, over 4109180.43 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 21:54:20,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=823052.1666666666, ans=0.125
2024-09-18 21:54:20,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=823052.1666666666, ans=0.125
2024-09-18 21:54:44,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=823080.5, ans=0.2
2024-09-18 21:54:51,158 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=22.5
2024-09-18 21:54:55,044 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.266e+02 2.392e+02 2.544e+02 4.341e+02, threshold=4.785e+02, percent-clipped=0.0
2024-09-18 21:55:15,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=823137.1666666666, ans=0.0
2024-09-18 21:55:37,308 INFO [train.py:1198] (0/2) Epoch 46, batch 2950, loss[loss=0.1904, ctc_loss=0.1214, cr_loss=0.3454, over 20991.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1413, cr_loss=0.3653, over 4113054.75 frames. ], batch size: 51, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 21:56:34,215 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=12.0
2024-09-18 21:56:52,756 INFO [train.py:1198] (0/2) Epoch 46, batch 3000, loss[loss=0.2442, ctc_loss=0.1642, cr_loss=0.3996, over 20986.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1417, cr_loss=0.3652, over 4103425.64 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 21:56:52,757 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 21:57:02,610 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0013, 3.9583, 2.8946, 3.2971], device='cuda:0')
2024-09-18 21:57:11,205 INFO [train.py:1230] (0/2) Epoch 46, validation: loss=0.03911, ctc_loss=0.03911, cr_loss=1.552e-14, over 944034.00 frames.
2024-09-18 21:57:11,205 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-18 21:57:20,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=823335.5, ans=0.025
2024-09-18 21:57:47,588 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.264e+02 2.377e+02 2.507e+02 9.092e+02, threshold=4.753e+02, percent-clipped=1.0
2024-09-18 21:58:26,766 INFO [train.py:1198] (0/2) Epoch 46, batch 3050, loss[loss=0.2502, ctc_loss=0.1663, cr_loss=0.4194, over 20670.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1412, cr_loss=0.3652, over 4119458.62 frames. ], batch size: 66, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 21:58:26,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=823477.1666666666, ans=0.035
2024-09-18 21:59:18,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=823562.1666666666, ans=0.125
2024-09-18 21:59:21,942 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.98 vs. limit=15.0
2024-09-18 21:59:24,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=823562.1666666666, ans=0.125
2024-09-18 21:59:39,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=823590.5, ans=0.07
2024-09-18 21:59:42,522 INFO [train.py:1198] (0/2) Epoch 46, batch 3100, loss[loss=0.2229, ctc_loss=0.1498, cr_loss=0.3653, over 20833.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1408, cr_loss=0.3651, over 4123381.56 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 21:59:59,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=823647.1666666666, ans=0.125
2024-09-18 22:00:13,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=823647.1666666666, ans=0.125
2024-09-18 22:00:21,626 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.274e+02 2.370e+02 2.562e+02 3.431e+02, threshold=4.740e+02, percent-clipped=0.0
2024-09-18 22:00:32,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=823703.8333333334, ans=0.0
2024-09-18 22:00:58,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=823732.1666666666, ans=0.125
2024-09-18 22:01:04,356 INFO [train.py:1198] (0/2) Epoch 46, batch 3150, loss[loss=0.2262, ctc_loss=0.1502, cr_loss=0.3798, over 19977.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.141, cr_loss=0.3652, over 4116915.64 frames. ], batch size: 80, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:01:11,259 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0
2024-09-18 22:01:26,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=823788.8333333334, ans=0.07
2024-09-18 22:01:31,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=823788.8333333334, ans=0.125
2024-09-18 22:01:55,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=823845.5, ans=0.025
2024-09-18 22:02:12,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=823873.8333333334, ans=0.125
2024-09-18 22:02:20,001 INFO [train.py:1198] (0/2) Epoch 46, batch 3200, loss[loss=0.2297, ctc_loss=0.1516, cr_loss=0.3905, over 20923.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1412, cr_loss=0.3653, over 4105617.26 frames. ], batch size: 60, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:02:57,879 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.295e+02 2.427e+02 2.609e+02 3.768e+02, threshold=4.855e+02, percent-clipped=0.0
2024-09-18 22:03:03,133 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=22.5
2024-09-18 22:03:23,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824015.5, ans=0.1
2024-09-18 22:03:23,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=824015.5, ans=0.0
2024-09-18 22:03:35,300 INFO [train.py:1198] (0/2) Epoch 46, batch 3250, loss[loss=0.2428, ctc_loss=0.1604, cr_loss=0.4121, over 19969.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1421, cr_loss=0.3668, over 4096544.51 frames. ], batch size: 80, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:04:01,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=824072.1666666666, ans=0.125
2024-09-18 22:04:08,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=15.0
2024-09-18 22:04:11,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=824100.5, ans=0.0
2024-09-18 22:04:24,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=824128.8333333334, ans=0.1
2024-09-18 22:04:26,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=824128.8333333334, ans=0.125
2024-09-18 22:04:40,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=824157.1666666666, ans=0.125
2024-09-18 22:04:51,045 INFO [train.py:1198] (0/2) Epoch 46, batch 3300, loss[loss=0.1894, ctc_loss=0.1217, cr_loss=0.3389, over 20953.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1418, cr_loss=0.3661, over 4098789.03 frames. ], batch size: 49, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:05:15,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0
2024-09-18 22:05:28,697 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.245e+02 2.375e+02 2.573e+02 3.336e+02, threshold=4.750e+02, percent-clipped=0.0
2024-09-18 22:05:38,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=824270.5, ans=0.1
2024-09-18 22:05:43,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=824270.5, ans=0.0
2024-09-18 22:05:45,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=824270.5, ans=0.0
2024-09-18 22:05:48,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=824270.5, ans=0.125
2024-09-18 22:06:08,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=824327.1666666666, ans=0.1
2024-09-18 22:06:08,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=824327.1666666666, ans=0.125
2024-09-18 22:06:09,678 INFO [train.py:1198] (0/2) Epoch 46, batch 3350, loss[loss=0.2106, ctc_loss=0.1366, cr_loss=0.3701, over 21041.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1418, cr_loss=0.3662, over 4096277.01 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:06:13,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=824327.1666666666, ans=0.0
2024-09-18 22:06:16,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=824327.1666666666, ans=0.125
2024-09-18 22:06:45,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0
2024-09-18 22:07:05,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=824412.1666666666, ans=0.5
2024-09-18 22:07:08,925 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.16 vs. limit=15.0
2024-09-18 22:07:11,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=824440.5, ans=0.0
2024-09-18 22:07:18,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=824440.5, ans=0.125
2024-09-18 22:07:27,775 INFO [train.py:1198] (0/2) Epoch 46, batch 3400, loss[loss=0.2065, ctc_loss=0.1354, cr_loss=0.3553, over 20786.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1415, cr_loss=0.3665, over 4101014.86 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:07:48,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824497.1666666666, ans=0.1
2024-09-18 22:07:57,583 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.62 vs. limit=15.0
2024-09-18 22:08:05,996 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.287e+02 2.442e+02 2.582e+02 5.299e+02, threshold=4.883e+02, percent-clipped=1.0
2024-09-18 22:08:43,611 INFO [train.py:1198] (0/2) Epoch 46, batch 3450, loss[loss=0.2093, ctc_loss=0.1374, cr_loss=0.3597, over 19504.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1428, cr_loss=0.3687, over 4075062.50 frames. ], batch size: 43, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:08:56,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.91 vs. limit=15.0
2024-09-18 22:09:09,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=824638.8333333334, ans=0.125
2024-09-18 22:09:28,051 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0
2024-09-18 22:09:31,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.73 vs. limit=15.0
2024-09-18 22:09:58,565 INFO [train.py:1198] (0/2) Epoch 46, batch 3500, loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3667, over 20971.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3679, over 4057329.33 frames. ], batch size: 67, lr: 1.79e-03, grad_scale: 16.0
2024-09-18 22:10:12,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=824780.5, ans=0.0
2024-09-18 22:10:37,677 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.261e+02 2.364e+02 2.578e+02 3.983e+02, threshold=4.728e+02, percent-clipped=0.0
2024-09-18 22:10:50,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=824837.1666666666, ans=10.0
2024-09-18 22:10:56,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=824837.1666666666, ans=0.07
2024-09-18 22:11:03,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. limit=10.0
2024-09-18 22:11:16,403 INFO [train.py:1198] (0/2) Epoch 46, batch 3550, loss[loss=0.2606, ctc_loss=0.1839, cr_loss=0.3835, over 14232.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3686, over 4065454.38 frames. ], batch size: 149, lr: 1.79e-03, grad_scale: 16.0
2024-09-18 22:11:24,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=824893.8333333334, ans=0.2
2024-09-18 22:12:00,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=824978.8333333334, ans=0.125
2024-09-18 22:12:28,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=825007.1666666666, ans=0.125
2024-09-18 22:12:29,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=825007.1666666666, ans=0.125
2024-09-18 22:12:30,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=825007.1666666666, ans=0.09899494936611666
2024-09-18 22:12:34,638 INFO [train.py:1198] (0/2) Epoch 46, batch 3600, loss[loss=0.2092, ctc_loss=0.1385, cr_loss=0.3539, over 20862.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.368, over 4085514.19 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:12:41,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=825035.5, ans=0.2
2024-09-18 22:13:13,996 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 2.231e+02 2.343e+02 2.514e+02 3.903e+02, threshold=4.687e+02, percent-clipped=0.0
2024-09-18 22:13:19,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=825120.5, ans=0.1
2024-09-18 22:13:19,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=825120.5, ans=0.125
2024-09-18 22:13:50,480 INFO [train.py:1198] (0/2) Epoch 46, batch 3650, loss[loss=0.2377, ctc_loss=0.1571, cr_loss=0.4032, over 20366.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1431, cr_loss=0.3687, over 4073871.73 frames. ], batch size: 74, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:14:16,630 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=12.0
2024-09-18 22:14:26,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=825233.8333333334, ans=0.125
2024-09-18 22:14:34,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=825262.1666666666, ans=0.0
2024-09-18 22:14:46,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=825262.1666666666, ans=0.125
2024-09-18 22:14:49,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=825290.5, ans=0.0
2024-09-18 22:15:06,140 INFO [train.py:1198] (0/2) Epoch 46, batch 3700, loss[loss=0.217, ctc_loss=0.1428, cr_loss=0.3712, over 20742.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3686, over 4078270.85 frames. ], batch size: 71, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:15:09,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=825318.8333333334, ans=0.1
2024-09-18 22:15:45,162 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.041e+02 2.311e+02 2.412e+02 2.580e+02 4.323e+02, threshold=4.824e+02, percent-clipped=0.0
2024-09-18 22:16:02,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=825403.8333333334, ans=0.0
2024-09-18 22:16:06,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=825432.1666666666, ans=0.125
2024-09-18 22:16:21,044 INFO [train.py:1198] (0/2) Epoch 46, batch 3750, loss[loss=0.2508, ctc_loss=0.1662, cr_loss=0.4231, over 20776.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.368, over 4088349.08 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:16:39,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=825488.8333333334, ans=0.125
2024-09-18 22:16:51,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=825517.1666666666, ans=0.125
2024-09-18 22:16:59,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0
2024-09-18 22:17:24,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=825573.8333333334, ans=0.125
2024-09-18 22:17:27,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=825573.8333333334, ans=0.2
2024-09-18 22:17:39,556 INFO [train.py:1198] (0/2) Epoch 46, batch 3800, loss[loss=0.1743, ctc_loss=0.1115, cr_loss=0.3139, over 19912.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3674, over 4097708.38 frames. ], batch size: 44, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:17:51,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=825602.1666666666, ans=10.0
2024-09-18 22:17:56,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=825630.5, ans=0.04949747468305833
2024-09-18 22:18:21,482 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.247e+02 2.356e+02 2.504e+02 3.127e+02, threshold=4.713e+02, percent-clipped=0.0
2024-09-18 22:18:28,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=825687.1666666666, ans=0.125
2024-09-18 22:18:45,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=825715.5, ans=0.125
2024-09-18 22:18:52,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=825715.5, ans=0.0
2024-09-18 22:18:57,846 INFO [train.py:1198] (0/2) Epoch 46, batch 3850, loss[loss=0.2236, ctc_loss=0.1453, cr_loss=0.3915, over 20880.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.142, cr_loss=0.3681, over 4097103.27 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:19:07,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=825743.8333333334, ans=0.125
2024-09-18 22:19:50,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=825828.8333333334, ans=0.2
2024-09-18 22:20:13,266 INFO [train.py:1198] (0/2) Epoch 46, batch 3900, loss[loss=0.2288, ctc_loss=0.1501, cr_loss=0.3936, over 20708.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1425, cr_loss=0.3688, over 4104865.43 frames. ], batch size: 71, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:20:17,245 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.26 vs. limit=15.0
2024-09-18 22:20:45,787 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0
2024-09-18 22:20:52,645 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.227e+02 2.394e+02 2.545e+02 3.122e+02, threshold=4.789e+02, percent-clipped=0.0
2024-09-18 22:21:03,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=825970.5, ans=0.125
2024-09-18 22:21:29,283 INFO [train.py:1198] (0/2) Epoch 46, batch 3950, loss[loss=0.2005, ctc_loss=0.1308, cr_loss=0.3487, over 20986.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1424, cr_loss=0.3686, over 4104647.23 frames. ], batch size: 50, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:21:43,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=826055.5, ans=0.125
2024-09-18 22:21:46,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=826055.5, ans=0.025
2024-09-18 22:22:07,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=826083.8333333334, ans=0.1
2024-09-18 22:22:08,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=826083.8333333334, ans=0.0
2024-09-18 22:22:08,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=826083.8333333334, ans=0.0
2024-09-18 22:22:20,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=826112.1666666666, ans=0.125
2024-09-18 22:22:37,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=826140.5, ans=0.025
2024-09-18 22:22:44,497 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.52 vs. limit=15.0
2024-09-18 22:22:44,853 INFO [train.py:1198] (0/2) Epoch 46, batch 4000, loss[loss=0.2219, ctc_loss=0.1498, cr_loss=0.3603, over 20951.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.143, cr_loss=0.37, over 4104260.47 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:23:27,128 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.305e+02 2.402e+02 2.540e+02 6.910e+02, threshold=4.804e+02, percent-clipped=1.0
2024-09-18 22:24:05,876 INFO [train.py:1198] (0/2) Epoch 46, batch 4050, loss[loss=0.1881, ctc_loss=0.1234, cr_loss=0.3235, over 20968.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.143, cr_loss=0.37, over 4106869.11 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:24:12,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.96 vs. limit=10.0
2024-09-18 22:25:14,028 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=22.5
2024-09-18 22:25:20,884 INFO [train.py:1198] (0/2) Epoch 46, batch 4100, loss[loss=0.2186, ctc_loss=0.144, cr_loss=0.3733, over 20784.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1431, cr_loss=0.3696, over 4095263.99 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:26:00,240 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.893e+02 2.271e+02 2.414e+02 2.615e+02 4.860e+02, threshold=4.827e+02, percent-clipped=1.0
2024-09-18 22:26:11,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=826537.1666666666, ans=0.125
2024-09-18 22:26:35,858 INFO [train.py:1198] (0/2) Epoch 46, batch 4150, loss[loss=0.2316, ctc_loss=0.1529, cr_loss=0.3936, over 19497.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3691, over 4073709.95 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:26:42,582 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0
2024-09-18 22:26:51,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=826622.1666666666, ans=0.125
2024-09-18 22:27:22,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=826678.8333333334, ans=0.2
2024-09-18 22:27:52,127 INFO [train.py:1198] (0/2) Epoch 46, batch 4200, loss[loss=0.1986, ctc_loss=0.1303, cr_loss=0.3415, over 20829.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3665, over 4094080.90 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:28:13,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=826763.8333333334, ans=0.0
2024-09-18 22:28:21,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0
2024-09-18 22:28:34,625 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.304e+02 2.426e+02 2.594e+02 3.750e+02, threshold=4.852e+02, percent-clipped=0.0
2024-09-18 22:29:08,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0
2024-09-18 22:29:11,260 INFO [train.py:1198] (0/2) Epoch 46, batch 4250, loss[loss=0.221, ctc_loss=0.1472, cr_loss=0.3693, over 21033.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1404, cr_loss=0.3649, over 4091528.75 frames. ], batch size: 62, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:29:34,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=826905.5, ans=0.125
2024-09-18 22:29:36,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0
2024-09-18 22:29:41,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=826905.5, ans=0.0
2024-09-18 22:30:11,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=826962.1666666666, ans=0.025
2024-09-18 22:30:26,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=826990.5, ans=0.125
2024-09-18 22:30:29,570 INFO [train.py:1198] (0/2) Epoch 46, batch 4300, loss[loss=0.2163, ctc_loss=0.1435, cr_loss=0.3638, over 21055.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1406, cr_loss=0.3648, over 4078728.96 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:30:29,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=827018.8333333334, ans=0.125
2024-09-18 22:31:08,794 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.293e+02 2.404e+02 2.618e+02 4.801e+02, threshold=4.807e+02, percent-clipped=0.0
2024-09-18 22:31:21,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=827103.8333333334, ans=0.125
2024-09-18 22:31:29,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=827132.1666666666, ans=0.125
2024-09-18 22:31:38,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=827132.1666666666, ans=0.125
2024-09-18 22:31:45,349 INFO [train.py:1198] (0/2) Epoch 46, batch 4350, loss[loss=0.2421, ctc_loss=0.1619, cr_loss=0.4012, over 20843.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1415, cr_loss=0.366, over 4070072.59 frames. ], batch size: 65, lr: 1.79e-03, grad_scale: 32.0
2024-09-18 22:32:20,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=827217.1666666666, ans=0.2
2024-09-18 22:32:24,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=827217.1666666666, ans=0.125
2024-09-18 22:32:35,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=827245.5, ans=0.0
2024-09-18 22:32:39,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=827245.5, ans=0.2
2024-09-18 22:33:00,653 INFO [train.py:1198] (0/2) Epoch 46, batch 4400, loss[loss=0.2384, ctc_loss=0.1572, cr_loss=0.4061, over 20871.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.3676, over 4060793.00 frames. ], batch size: 54, lr: 1.78e-03, grad_scale: 32.0
2024-09-18 22:33:13,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0
2024-09-18 22:33:16,173 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-292000.pt
2024-09-18 22:33:35,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=827358.8333333334, ans=0.125
2024-09-18 22:33:38,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=827358.8333333334, ans=0.125
2024-09-18 22:33:40,934 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.279e+02 2.378e+02 2.503e+02 3.348e+02, threshold=4.756e+02, percent-clipped=0.0
2024-09-18 22:33:42,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=827358.8333333334, ans=0.0
2024-09-18 22:34:20,252 INFO [train.py:1198] (0/2) Epoch 46, batch 4450, loss[loss=0.1942, ctc_loss=0.1255, cr_loss=0.3435, over 20944.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1427, cr_loss=0.3684, over 4065469.27 frames. ], batch size: 50, lr: 1.78e-03, grad_scale: 32.0
2024-09-18 22:34:26,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=827443.8333333334, ans=0.04949747468305833
2024-09-18 22:34:28,472 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0
2024-09-18 22:34:43,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=827472.1666666666, ans=0.1
2024-09-18 22:34:56,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=827500.5, ans=0.125
2024-09-18 22:35:05,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=827500.5, ans=0.125
2024-09-18 22:35:10,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=827528.8333333334, ans=0.0
2024-09-18 22:35:38,954 INFO [train.py:1198] (0/2) Epoch 46, batch 4500, loss[loss=0.1966, ctc_loss=0.1273, cr_loss=0.3466, over 20953.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3679, over 4079936.71 frames. ], batch size: 51, lr: 1.78e-03, grad_scale: 16.0
2024-09-18 22:35:42,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827585.5, ans=0.1
2024-09-18 22:36:09,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=827642.1666666666, ans=0.2
2024-09-18 22:36:19,270 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.229e+02 2.370e+02 2.521e+02 3.571e+02, threshold=4.739e+02, percent-clipped=0.0
2024-09-18 22:36:25,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=827670.5, ans=0.0
2024-09-18 22:36:36,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=827670.5, ans=0.125
2024-09-18 22:36:39,293 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 22:36:54,137 INFO [train.py:1198] (0/2) Epoch 46, batch 4550, loss[loss=0.2357, ctc_loss=0.1583, cr_loss=0.3872, over 20151.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1419, cr_loss=0.3663, over 4074775.41 frames. ], batch size: 80, lr: 1.78e-03, grad_scale: 16.0
2024-09-18 22:37:08,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0
2024-09-18 22:37:14,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=827755.5, ans=0.125
2024-09-18 22:37:44,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=827812.1666666666, ans=0.125
2024-09-18 22:37:47,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=827812.1666666666, ans=0.0
2024-09-18 22:38:03,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827840.5, ans=0.1
2024-09-18 22:38:08,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=827868.8333333334, ans=0.125
2024-09-18 22:38:09,640 INFO [train.py:1198] (0/2) Epoch 46, batch 4600, loss[loss=0.2349, ctc_loss=0.1554, cr_loss=0.3978, over 21028.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1432, cr_loss=0.3681, over 4069833.18 frames. ], batch size: 63, lr: 1.78e-03, grad_scale: 16.0
2024-09-18 22:38:13,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=827868.8333333334, ans=0.2
2024-09-18 22:38:41,348 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=15.0
2024-09-18 22:38:45,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=827925.5, ans=0.025
2024-09-18 22:38:50,898 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.260e+02 2.443e+02 2.616e+02 3.291e+02, threshold=4.886e+02, percent-clipped=0.0
2024-09-18 22:38:58,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=827953.8333333334, ans=0.0
2024-09-18 22:39:25,268 INFO [train.py:1198] (0/2) Epoch 46, batch 4650, loss[loss=0.2367, ctc_loss=0.1571, cr_loss=0.3979, over 20948.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1434, cr_loss=0.369, over 4082277.28 frames. ], batch size: 64, lr: 1.78e-03, grad_scale: 16.0
2024-09-18 22:39:40,470 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 22:40:00,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=828067.1666666666, ans=0.1
2024-09-18 22:40:41,088 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.16 vs. limit=10.0
2024-09-18 22:40:45,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=828152.1666666666, ans=0.1
2024-09-18 22:40:46,636 INFO [train.py:1198] (0/2) Epoch 46, batch 4700, loss[loss=0.2044, ctc_loss=0.1346, cr_loss=0.3488, over 21067.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1432, cr_loss=0.3678, over 4064384.21 frames.
], batch size: 56, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:40:46,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=828152.1666666666, ans=0.1 2024-09-18 22:41:27,328 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.277e+02 2.392e+02 2.504e+02 5.616e+02, threshold=4.785e+02, percent-clipped=1.0 2024-09-18 22:41:43,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.86 vs. limit=15.0 2024-09-18 22:41:59,956 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:42:01,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=828293.8333333334, ans=0.0 2024-09-18 22:42:02,560 INFO [train.py:1198] (0/2) Epoch 46, batch 4750, loss[loss=0.2144, ctc_loss=0.1399, cr_loss=0.3724, over 20988.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.143, cr_loss=0.3679, over 4073293.21 frames. 
], batch size: 49, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:42:09,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=828293.8333333334, ans=0.0 2024-09-18 22:42:19,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=828322.1666666666, ans=0.125 2024-09-18 22:42:42,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=828350.5, ans=0.125 2024-09-18 22:42:56,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=828378.8333333334, ans=0.0 2024-09-18 22:43:01,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=828407.1666666666, ans=0.07 2024-09-18 22:43:04,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=828407.1666666666, ans=0.125 2024-09-18 22:43:13,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=828407.1666666666, ans=0.0 2024-09-18 22:43:18,007 INFO [train.py:1198] (0/2) Epoch 46, batch 4800, loss[loss=0.2151, ctc_loss=0.1406, cr_loss=0.3723, over 21012.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1436, cr_loss=0.3686, over 4075635.39 frames. ], batch size: 61, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:43:29,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.09 vs. 
limit=15.0 2024-09-18 22:43:59,249 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.285e+02 2.408e+02 2.560e+02 3.296e+02, threshold=4.817e+02, percent-clipped=0.0 2024-09-18 22:43:59,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=828492.1666666666, ans=0.125 2024-09-18 22:44:04,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=828520.5, ans=10.0 2024-09-18 22:44:06,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=828520.5, ans=0.2 2024-09-18 22:44:09,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0 2024-09-18 22:44:22,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=828548.8333333334, ans=0.125 2024-09-18 22:44:34,235 INFO [train.py:1198] (0/2) Epoch 46, batch 4850, loss[loss=0.2089, ctc_loss=0.137, cr_loss=0.3599, over 20791.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1436, cr_loss=0.369, over 4069892.10 frames. ], batch size: 53, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:44:48,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=828605.5, ans=0.125 2024-09-18 22:44:52,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=828605.5, ans=0.0 2024-09-18 22:45:53,740 INFO [train.py:1198] (0/2) Epoch 46, batch 4900, loss[loss=0.2014, ctc_loss=0.1297, cr_loss=0.3583, over 20993.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.368, over 4080960.82 frames. 
], batch size: 52, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:46:04,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=828718.8333333334, ans=0.0 2024-09-18 22:46:22,321 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.83 vs. limit=15.0 2024-09-18 22:46:29,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=828775.5, ans=0.025 2024-09-18 22:46:34,773 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.208e+02 2.346e+02 2.474e+02 3.972e+02, threshold=4.692e+02, percent-clipped=0.0 2024-09-18 22:46:35,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=828775.5, ans=0.07 2024-09-18 22:47:12,525 INFO [train.py:1198] (0/2) Epoch 46, batch 4950, loss[loss=0.1842, ctc_loss=0.1184, cr_loss=0.3292, over 20977.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1423, cr_loss=0.3671, over 4078961.81 frames. 
], batch size: 49, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:47:21,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=828860.5, ans=0.125 2024-09-18 22:47:24,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=828860.5, ans=0.95 2024-09-18 22:47:48,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=828917.1666666666, ans=0.0 2024-09-18 22:48:06,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=828945.5, ans=0.0 2024-09-18 22:48:11,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=828973.8333333334, ans=0.0 2024-09-18 22:48:18,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=828973.8333333334, ans=0.125 2024-09-18 22:48:27,350 INFO [train.py:1198] (0/2) Epoch 46, batch 5000, loss[loss=0.2125, ctc_loss=0.1388, cr_loss=0.3683, over 21071.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1423, cr_loss=0.3675, over 4085255.31 frames. 
], batch size: 56, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:48:57,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=829058.8333333334, ans=0.07 2024-09-18 22:49:07,668 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.269e+02 2.385e+02 2.525e+02 7.112e+02, threshold=4.771e+02, percent-clipped=1.0 2024-09-18 22:49:22,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=829087.1666666666, ans=0.2 2024-09-18 22:49:25,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=829115.5, ans=0.0 2024-09-18 22:49:41,294 INFO [train.py:1198] (0/2) Epoch 46, batch 5050, loss[loss=0.2049, ctc_loss=0.1339, cr_loss=0.355, over 20868.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1416, cr_loss=0.3662, over 4079551.32 frames. ], batch size: 54, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:49:41,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=829143.8333333334, ans=10.0 2024-09-18 22:49:56,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=829172.1666666666, ans=0.1 2024-09-18 22:50:04,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.94 vs. 
limit=15.0 2024-09-18 22:50:13,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=829200.5, ans=15.0 2024-09-18 22:50:18,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=829200.5, ans=0.1 2024-09-18 22:50:31,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=829228.8333333334, ans=0.125 2024-09-18 22:50:37,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=829228.8333333334, ans=0.125 2024-09-18 22:50:55,083 INFO [train.py:1198] (0/2) Epoch 46, batch 5100, loss[loss=0.1937, ctc_loss=0.1258, cr_loss=0.3398, over 20970.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1413, cr_loss=0.3652, over 4082875.06 frames. ], batch size: 52, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:51:22,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=829313.8333333334, ans=0.125 2024-09-18 22:51:35,594 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.263e+02 2.420e+02 2.545e+02 3.069e+02, threshold=4.840e+02, percent-clipped=0.0 2024-09-18 22:52:10,000 INFO [train.py:1198] (0/2) Epoch 46, batch 5150, loss[loss=0.2294, ctc_loss=0.1507, cr_loss=0.3933, over 20964.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1421, cr_loss=0.3669, over 4089409.76 frames. 
], batch size: 55, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:52:13,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=829427.1666666666, ans=0.2 2024-09-18 22:52:44,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=829483.8333333334, ans=0.125 2024-09-18 22:53:25,163 INFO [train.py:1198] (0/2) Epoch 46, batch 5200, loss[loss=0.2142, ctc_loss=0.1398, cr_loss=0.3723, over 20976.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3683, over 4077581.74 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:53:45,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.17 vs. limit=10.0 2024-09-18 22:53:54,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.95 vs. limit=22.5 2024-09-18 22:54:05,799 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.230e+02 2.346e+02 2.494e+02 3.637e+02, threshold=4.691e+02, percent-clipped=0.0 2024-09-18 22:54:12,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=829653.8333333334, ans=0.125 2024-09-18 22:54:25,790 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=22.5 2024-09-18 22:54:36,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=829682.1666666666, ans=0.0 2024-09-18 22:54:40,167 INFO [train.py:1198] (0/2) Epoch 46, batch 5250, loss[loss=0.2473, ctc_loss=0.1636, cr_loss=0.4186, over 20670.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1422, cr_loss=0.3668, over 4079377.11 frames. 
], batch size: 71, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:54:41,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=829710.5, ans=0.125 2024-09-18 22:54:58,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=829738.8333333334, ans=0.1 2024-09-18 22:55:25,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=829795.5, ans=0.09899494936611666 2024-09-18 22:55:39,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=829795.5, ans=0.125 2024-09-18 22:55:45,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=829823.8333333334, ans=0.125 2024-09-18 22:55:56,811 INFO [train.py:1198] (0/2) Epoch 46, batch 5300, loss[loss=0.2125, ctc_loss=0.1371, cr_loss=0.377, over 20995.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1434, cr_loss=0.3681, over 4051900.13 frames. ], batch size: 61, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 22:56:16,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=829880.5, ans=0.0 2024-09-18 22:56:41,474 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.277e+02 2.415e+02 2.568e+02 4.148e+02, threshold=4.830e+02, percent-clipped=0.0 2024-09-18 22:56:44,125 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.45 vs. limit=22.5 2024-09-18 22:56:59,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.59 vs. 
limit=15.0 2024-09-18 22:57:13,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=829993.8333333334, ans=0.2 2024-09-18 22:57:14,036 INFO [train.py:1198] (0/2) Epoch 46, batch 5350, loss[loss=0.2194, ctc_loss=0.1438, cr_loss=0.378, over 20867.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1433, cr_loss=0.3686, over 4067040.20 frames. ], batch size: 54, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:57:14,669 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=12.0 2024-09-18 22:57:17,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=829993.8333333334, ans=0.125 2024-09-18 22:57:27,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=830022.1666666666, ans=0.05 2024-09-18 22:57:30,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=830022.1666666666, ans=0.0 2024-09-18 22:57:36,789 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2024-09-18 22:57:58,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.92 vs. limit=15.0 2024-09-18 22:58:10,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=830078.8333333334, ans=0.125 2024-09-18 22:58:27,968 INFO [train.py:1198] (0/2) Epoch 46, batch 5400, loss[loss=0.1832, ctc_loss=0.1183, cr_loss=0.3244, over 19988.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1432, cr_loss=0.368, over 4082510.80 frames. 
], batch size: 44, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:58:43,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0 2024-09-18 22:59:09,881 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.265e+02 2.377e+02 2.529e+02 3.871e+02, threshold=4.754e+02, percent-clipped=0.0 2024-09-18 22:59:20,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=830220.5, ans=0.0 2024-09-18 22:59:40,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=830248.8333333334, ans=10.0 2024-09-18 22:59:42,814 INFO [train.py:1198] (0/2) Epoch 46, batch 5450, loss[loss=0.2095, ctc_loss=0.139, cr_loss=0.3523, over 20865.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1437, cr_loss=0.3692, over 4088060.14 frames. ], batch size: 65, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 22:59:50,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=830277.1666666666, ans=0.2 2024-09-18 23:00:30,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=830362.1666666666, ans=0.0 2024-09-18 23:00:53,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=22.5 2024-09-18 23:00:56,894 INFO [train.py:1198] (0/2) Epoch 46, batch 5500, loss[loss=0.2304, ctc_loss=0.1547, cr_loss=0.3787, over 20677.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1439, cr_loss=0.3695, over 4071188.31 frames. 
], batch size: 71, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:01:03,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=830418.8333333334, ans=0.0 2024-09-18 23:01:31,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830475.5, ans=0.1 2024-09-18 23:01:31,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=830475.5, ans=0.125 2024-09-18 23:01:38,695 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.271e+02 2.381e+02 2.540e+02 3.416e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-18 23:01:50,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830503.8333333334, ans=0.1 2024-09-18 23:02:11,206 INFO [train.py:1198] (0/2) Epoch 46, batch 5550, loss[loss=0.2176, ctc_loss=0.1462, cr_loss=0.357, over 20966.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1436, cr_loss=0.3689, over 4072811.82 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:02:43,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=830617.1666666666, ans=0.125 2024-09-18 23:02:53,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.05 vs. 
limit=12.0 2024-09-18 23:03:19,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=830673.8333333334, ans=0.07 2024-09-18 23:03:21,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=830673.8333333334, ans=0.125 2024-09-18 23:03:25,180 INFO [train.py:1198] (0/2) Epoch 46, batch 5600, loss[loss=0.1775, ctc_loss=0.1154, cr_loss=0.3104, over 20926.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1434, cr_loss=0.3687, over 4070406.02 frames. ], batch size: 49, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:03:27,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830702.1666666666, ans=0.1 2024-09-18 23:03:39,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.16 vs. limit=10.0 2024-09-18 23:04:06,707 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.276e+02 2.421e+02 2.574e+02 4.473e+02, threshold=4.842e+02, percent-clipped=0.0 2024-09-18 23:04:41,662 INFO [train.py:1198] (0/2) Epoch 46, batch 5650, loss[loss=0.2215, ctc_loss=0.1488, cr_loss=0.3637, over 20694.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1431, cr_loss=0.3684, over 4080575.25 frames. 
], batch size: 68, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:04:45,178 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:04:52,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=830843.8333333334, ans=0.125 2024-09-18 23:05:20,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=830900.5, ans=0.025 2024-09-18 23:05:34,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=830928.8333333334, ans=0.125 2024-09-18 23:05:40,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=830928.8333333334, ans=0.0 2024-09-18 23:05:41,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.38 vs. limit=15.0 2024-09-18 23:05:46,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830957.1666666666, ans=0.1 2024-09-18 23:05:58,328 INFO [train.py:1198] (0/2) Epoch 46, batch 5700, loss[loss=0.1917, ctc_loss=0.1248, cr_loss=0.3341, over 21061.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1414, cr_loss=0.3655, over 4099053.54 frames. 
], batch size: 53, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:06:38,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=831042.1666666666, ans=0.5 2024-09-18 23:06:39,567 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.946e+02 2.244e+02 2.366e+02 2.522e+02 4.217e+02, threshold=4.732e+02, percent-clipped=0.0 2024-09-18 23:06:39,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=831042.1666666666, ans=0.0 2024-09-18 23:06:51,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=831070.5, ans=0.125 2024-09-18 23:07:08,384 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:07:09,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=831098.8333333334, ans=0.125 2024-09-18 23:07:12,365 INFO [train.py:1198] (0/2) Epoch 46, batch 5750, loss[loss=0.22, ctc_loss=0.1448, cr_loss=0.3761, over 21014.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1414, cr_loss=0.3656, over 4107110.44 frames. 
], batch size: 67, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:07:23,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=831127.1666666666, ans=0.125 2024-09-18 23:07:27,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=831155.5, ans=0.125 2024-09-18 23:08:04,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=831212.1666666666, ans=0.0 2024-09-18 23:08:13,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=831240.5, ans=0.125 2024-09-18 23:08:16,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=831240.5, ans=0.0 2024-09-18 23:08:25,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.39 vs. limit=22.5 2024-09-18 23:08:26,776 INFO [train.py:1198] (0/2) Epoch 46, batch 5800, loss[loss=0.2111, ctc_loss=0.1383, cr_loss=0.364, over 20847.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1418, cr_loss=0.3661, over 4098293.49 frames. 
], batch size: 59, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:08:34,634 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:08:49,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=831297.1666666666, ans=0.0 2024-09-18 23:08:51,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=831297.1666666666, ans=0.07 2024-09-18 23:08:54,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=831297.1666666666, ans=0.125 2024-09-18 23:09:01,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=831325.5, ans=0.125 2024-09-18 23:09:08,688 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.225e+02 2.338e+02 2.574e+02 5.613e+02, threshold=4.675e+02, percent-clipped=1.0 2024-09-18 23:09:18,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=831353.8333333334, ans=0.2 2024-09-18 23:09:24,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=831353.8333333334, ans=0.0 2024-09-18 23:09:41,278 INFO [train.py:1198] (0/2) Epoch 46, batch 5850, loss[loss=0.2518, ctc_loss=0.1649, cr_loss=0.4346, over 20645.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.142, cr_loss=0.3667, over 4100622.36 frames. 
], batch size: 66, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:09:41,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=831410.5, ans=0.125 2024-09-18 23:09:50,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=831410.5, ans=0.0 2024-09-18 23:10:32,623 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0 2024-09-18 23:10:55,553 INFO [train.py:1198] (0/2) Epoch 46, batch 5900, loss[loss=0.2005, ctc_loss=0.1325, cr_loss=0.3403, over 20865.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.142, cr_loss=0.3669, over 4108889.52 frames. ], batch size: 57, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:11:04,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=831552.1666666666, ans=0.0 2024-09-18 23:11:13,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=831580.5, ans=0.1 2024-09-18 23:11:20,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=831580.5, ans=0.125 2024-09-18 23:11:37,200 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.284e+02 2.401e+02 2.557e+02 3.628e+02, threshold=4.803e+02, percent-clipped=0.0 2024-09-18 23:12:01,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=831665.5, ans=0.1 2024-09-18 23:12:08,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=831693.8333333334, ans=0.2 2024-09-18 23:12:09,960 INFO [train.py:1198] (0/2) Epoch 46, batch 5950, loss[loss=0.2172, ctc_loss=0.1399, cr_loss=0.3865, over 20895.00 frames. 
], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3677, over 4110086.11 frames. ], batch size: 54, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:12:22,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=831693.8333333334, ans=0.125 2024-09-18 23:12:32,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=831722.1666666666, ans=0.125 2024-09-18 23:12:38,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=831750.5, ans=0.0 2024-09-18 23:12:38,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=12.0 2024-09-18 23:12:59,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=831778.8333333334, ans=0.2 2024-09-18 23:13:26,523 INFO [train.py:1198] (0/2) Epoch 46, batch 6000, loss[loss=0.2236, ctc_loss=0.1497, cr_loss=0.3699, over 20213.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1416, cr_loss=0.3672, over 4116751.72 frames. ], batch size: 74, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:13:26,524 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 23:13:33,697 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.6804, 2.2457, 2.2138, 2.5121, 2.2654, 2.3850, 1.8404, 1.8054], device='cuda:0') 2024-09-18 23:13:45,365 INFO [train.py:1230] (0/2) Epoch 46, validation: loss=0.039, ctc_loss=0.039, cr_loss=1.565e-14, over 944034.00 frames. 
2024-09-18 23:13:45,365 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-18 23:13:47,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=831835.5, ans=0.025 2024-09-18 23:13:50,454 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=22.5 2024-09-18 23:14:03,842 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0 2024-09-18 23:14:15,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=831892.1666666666, ans=0.1 2024-09-18 23:14:24,568 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.64 vs. limit=10.0 2024-09-18 23:14:26,839 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.890e+02 2.266e+02 2.435e+02 2.603e+02 3.470e+02, threshold=4.869e+02, percent-clipped=0.0 2024-09-18 23:14:27,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=831892.1666666666, ans=0.125 2024-09-18 23:14:55,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=831948.8333333334, ans=0.2 2024-09-18 23:14:59,696 INFO [train.py:1198] (0/2) Epoch 46, batch 6050, loss[loss=0.1865, ctc_loss=0.1234, cr_loss=0.3152, over 20964.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.3682, over 4112935.66 frames. 
], batch size: 51, lr: 1.78e-03, grad_scale: 32.0 2024-09-18 23:15:03,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=831977.1666666666, ans=0.1 2024-09-18 23:15:09,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=831977.1666666666, ans=0.2 2024-09-18 23:16:14,964 INFO [train.py:1198] (0/2) Epoch 46, batch 6100, loss[loss=0.2397, ctc_loss=0.1593, cr_loss=0.4018, over 19533.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3676, over 4098746.20 frames. ], batch size: 90, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:16:22,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=832118.8333333334, ans=0.0 2024-09-18 23:16:39,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=832147.1666666666, ans=0.125 2024-09-18 23:16:51,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=832175.5, ans=0.125 2024-09-18 23:16:58,073 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.260e+02 2.376e+02 2.543e+02 5.489e+02, threshold=4.751e+02, percent-clipped=2.0 2024-09-18 23:17:27,033 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:17:29,418 INFO [train.py:1198] (0/2) Epoch 46, batch 6150, loss[loss=0.2441, ctc_loss=0.1655, cr_loss=0.3927, over 18535.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.3688, over 4082306.15 frames. 
], batch size: 108, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:17:40,244 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:17:42,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=12.0 2024-09-18 23:17:55,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=12.0 2024-09-18 23:18:05,219 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:18:13,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=832345.5, ans=0.125 2024-09-18 23:18:18,527 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:18:35,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=832373.8333333334, ans=0.025 2024-09-18 23:18:42,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=832402.1666666666, ans=0.025 2024-09-18 23:18:43,714 INFO [train.py:1198] (0/2) Epoch 46, batch 6200, loss[loss=0.2273, ctc_loss=0.1496, cr_loss=0.3888, over 21033.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3673, over 4075743.34 frames. ], batch size: 62, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:18:52,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=832402.1666666666, ans=0.1 2024-09-18 23:19:00,011 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.05 vs. 
limit=10.0 2024-09-18 23:19:15,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=832458.8333333334, ans=0.125 2024-09-18 23:19:26,669 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.069e+02 2.290e+02 2.419e+02 2.581e+02 5.817e+02, threshold=4.837e+02, percent-clipped=1.0 2024-09-18 23:19:57,865 INFO [train.py:1198] (0/2) Epoch 46, batch 6250, loss[loss=0.1806, ctc_loss=0.114, cr_loss=0.3332, over 20299.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1422, cr_loss=0.3675, over 4066430.80 frames. ], batch size: 45, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:20:10,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=832543.8333333334, ans=0.0 2024-09-18 23:20:16,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=832572.1666666666, ans=0.0 2024-09-18 23:21:12,925 INFO [train.py:1198] (0/2) Epoch 46, batch 6300, loss[loss=0.263, ctc_loss=0.1844, cr_loss=0.393, over 13816.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1436, cr_loss=0.3684, over 4009035.32 frames. ], batch size: 149, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:21:44,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=832742.1666666666, ans=0.0 2024-09-18 23:21:44,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=832742.1666666666, ans=0.125 2024-09-18 23:21:55,151 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.136e+02 2.359e+02 2.520e+02 2.688e+02 4.055e+02, threshold=5.041e+02, percent-clipped=0.0 2024-09-18 23:22:24,614 INFO [train.py:1198] (0/2) Epoch 46, batch 6350, loss[loss=0.2325, ctc_loss=0.1585, cr_loss=0.3696, over 14172.00 frames. 
], tot_loss[loss=0.2222, ctc_loss=0.1479, cr_loss=0.3717, over 3851653.52 frames. ], batch size: 149, lr: 1.78e-03, grad_scale: 16.0 2024-09-18 23:22:36,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=832827.1666666666, ans=0.125 2024-09-18 23:23:16,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=832912.1666666666, ans=0.025 2024-09-18 23:23:20,911 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0 2024-09-18 23:23:23,048 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-46.pt 2024-09-18 23:24:13,796 INFO [train.py:1198] (0/2) Epoch 47, batch 0, loss[loss=0.1952, ctc_loss=0.1277, cr_loss=0.3375, over 20970.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1277, cr_loss=0.3375, over 20970.00 frames. ], batch size: 48, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:24:13,798 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 23:24:34,757 INFO [train.py:1230] (0/2) Epoch 47, validation: loss=0.03863, ctc_loss=0.03863, cr_loss=1.546e-14, over 944034.00 frames. 2024-09-18 23:24:34,758 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-18 23:24:48,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=832971.6666666666, ans=0.125 2024-09-18 23:24:54,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=832971.6666666666, ans=0.5 2024-09-18 23:25:20,818 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. 
limit=15.0 2024-09-18 23:25:26,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=833028.3333333334, ans=0.125 2024-09-18 23:25:32,212 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.461e+02 2.757e+02 3.058e+02 4.288e+02, threshold=5.514e+02, percent-clipped=0.0 2024-09-18 23:25:37,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.91 vs. limit=10.0 2024-09-18 23:25:39,099 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-09-18 23:25:50,246 INFO [train.py:1198] (0/2) Epoch 47, batch 50, loss[loss=0.2174, ctc_loss=0.1444, cr_loss=0.3647, over 20796.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1425, cr_loss=0.3665, over 925678.33 frames. ], batch size: 56, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:26:02,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=833085.0, ans=0.125 2024-09-18 23:26:16,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=833113.3333333334, ans=0.125 2024-09-18 23:26:39,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=833170.0, ans=0.0 2024-09-18 23:26:39,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=833170.0, ans=0.125 2024-09-18 23:26:40,743 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:27:05,912 INFO [train.py:1198] (0/2) Epoch 47, batch 100, loss[loss=0.256, ctc_loss=0.1746, cr_loss=0.4067, over 17894.00 frames. 
], tot_loss[loss=0.2193, ctc_loss=0.145, cr_loss=0.3713, over 1609816.58 frames. ], batch size: 108, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:27:08,459 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=15.0 2024-09-18 23:27:46,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=833283.3333333334, ans=0.0 2024-09-18 23:28:04,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=833311.6666666666, ans=0.025 2024-09-18 23:28:05,968 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.287e+02 2.413e+02 2.555e+02 3.150e+02, threshold=4.826e+02, percent-clipped=0.0 2024-09-18 23:28:12,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=833340.0, ans=0.5 2024-09-18 23:28:19,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=833340.0, ans=0.125 2024-09-18 23:28:21,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=833340.0, ans=0.015 2024-09-18 23:28:23,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=833368.3333333334, ans=0.125 2024-09-18 23:28:24,207 INFO [train.py:1198] (0/2) Epoch 47, batch 150, loss[loss=0.2299, ctc_loss=0.1524, cr_loss=0.3875, over 20954.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1431, cr_loss=0.3678, over 2155618.45 frames. 
], batch size: 58, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:28:35,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=833368.3333333334, ans=0.2 2024-09-18 23:28:56,807 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=22.5 2024-09-18 23:29:39,814 INFO [train.py:1198] (0/2) Epoch 47, batch 200, loss[loss=0.2124, ctc_loss=0.1401, cr_loss=0.3612, over 20987.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1413, cr_loss=0.3644, over 2587752.55 frames. ], batch size: 61, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:30:40,724 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 2.264e+02 2.382e+02 2.575e+02 3.254e+02, threshold=4.764e+02, percent-clipped=0.0 2024-09-18 23:30:55,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=833623.3333333334, ans=0.125 2024-09-18 23:30:57,693 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:30:58,759 INFO [train.py:1198] (0/2) Epoch 47, batch 250, loss[loss=0.2559, ctc_loss=0.1713, cr_loss=0.423, over 18210.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1412, cr_loss=0.3646, over 2921386.41 frames. 
], batch size: 108, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:31:03,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=833651.6666666666, ans=0.07 2024-09-18 23:31:15,914 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:31:30,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=833708.3333333334, ans=0.125 2024-09-18 23:31:33,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=833708.3333333334, ans=0.2 2024-09-18 23:31:44,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=833736.6666666666, ans=0.0 2024-09-18 23:32:11,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.25 vs. limit=15.0 2024-09-18 23:32:13,962 INFO [train.py:1198] (0/2) Epoch 47, batch 300, loss[loss=0.2565, ctc_loss=0.1714, cr_loss=0.4256, over 20665.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.141, cr_loss=0.3657, over 3194999.78 frames. ], batch size: 66, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:32:20,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=833793.3333333334, ans=0.0 2024-09-18 23:33:11,436 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.230e+02 2.339e+02 2.499e+02 8.658e+02, threshold=4.679e+02, percent-clipped=0.0 2024-09-18 23:33:19,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=833906.6666666666, ans=0.1 2024-09-18 23:33:29,880 INFO [train.py:1198] (0/2) Epoch 47, batch 350, loss[loss=0.1971, ctc_loss=0.1256, cr_loss=0.3573, over 20961.00 frames. 
], tot_loss[loss=0.2133, ctc_loss=0.1404, cr_loss=0.3645, over 3390174.69 frames. ], batch size: 51, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:33:37,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2024-09-18 23:34:20,406 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.51 vs. limit=15.0 2024-09-18 23:34:43,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=834048.3333333334, ans=0.125 2024-09-18 23:34:45,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834048.3333333334, ans=0.1 2024-09-18 23:34:47,994 INFO [train.py:1198] (0/2) Epoch 47, batch 400, loss[loss=0.2004, ctc_loss=0.1321, cr_loss=0.3414, over 21059.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1402, cr_loss=0.3641, over 3550413.85 frames. 
], batch size: 53, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:35:06,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=834105.0, ans=0.0 2024-09-18 23:35:09,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=834105.0, ans=0.125 2024-09-18 23:35:44,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=834161.6666666666, ans=0.2 2024-09-18 23:35:45,492 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.055e+02 2.236e+02 2.401e+02 2.534e+02 4.806e+02, threshold=4.802e+02, percent-clipped=2.0 2024-09-18 23:35:45,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=834161.6666666666, ans=0.5 2024-09-18 23:36:06,471 INFO [train.py:1198] (0/2) Epoch 47, batch 450, loss[loss=0.2132, ctc_loss=0.1391, cr_loss=0.3702, over 20939.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1412, cr_loss=0.3656, over 3668951.14 frames. ], batch size: 60, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:36:27,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=834246.6666666666, ans=0.125 2024-09-18 23:36:29,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. 
limit=15.0 2024-09-18 23:36:53,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=834303.3333333334, ans=0.1 2024-09-18 23:37:00,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=834303.3333333334, ans=0.0 2024-09-18 23:37:06,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.84 vs. limit=15.0 2024-09-18 23:37:21,199 INFO [train.py:1198] (0/2) Epoch 47, batch 500, loss[loss=0.1869, ctc_loss=0.1199, cr_loss=0.3347, over 20985.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1417, cr_loss=0.3667, over 3766404.31 frames. ], batch size: 48, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:37:44,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=834388.3333333334, ans=0.0 2024-09-18 23:37:54,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=834416.6666666666, ans=0.0 2024-09-18 23:37:57,183 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.08 vs. 
limit=15.0 2024-09-18 23:38:08,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=834445.0, ans=0.05 2024-09-18 23:38:13,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=834445.0, ans=0.0 2024-09-18 23:38:18,915 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.250e+02 2.368e+02 2.551e+02 3.301e+02, threshold=4.737e+02, percent-clipped=0.0 2024-09-18 23:38:22,918 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.36 vs. limit=15.0 2024-09-18 23:38:24,221 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0 2024-09-18 23:38:28,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=834473.3333333334, ans=0.125 2024-09-18 23:38:37,106 INFO [train.py:1198] (0/2) Epoch 47, batch 550, loss[loss=0.2389, ctc_loss=0.1583, cr_loss=0.4031, over 20628.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1422, cr_loss=0.3681, over 3841980.92 frames. ], batch size: 68, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:38:44,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=834501.6666666666, ans=0.0 2024-09-18 23:39:42,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=22.5 2024-09-18 23:39:54,868 INFO [train.py:1198] (0/2) Epoch 47, batch 600, loss[loss=0.2434, ctc_loss=0.1645, cr_loss=0.3945, over 20856.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1427, cr_loss=0.3687, over 3890856.35 frames. 
], batch size: 65, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:40:10,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=834671.6666666666, ans=0.125 2024-09-18 23:40:52,381 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.241e+02 2.402e+02 2.571e+02 5.153e+02, threshold=4.804e+02, percent-clipped=1.0 2024-09-18 23:40:52,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=834728.3333333334, ans=0.0 2024-09-18 23:41:10,540 INFO [train.py:1198] (0/2) Epoch 47, batch 650, loss[loss=0.1715, ctc_loss=0.1113, cr_loss=0.3013, over 20957.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.3667, over 3941752.73 frames. ], batch size: 50, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:41:10,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834785.0, ans=0.1 2024-09-18 23:41:21,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=834785.0, ans=0.2 2024-09-18 23:41:51,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834841.6666666666, ans=0.1 2024-09-18 23:41:58,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=834870.0, ans=0.1 2024-09-18 23:42:05,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=834870.0, ans=0.025 2024-09-18 23:42:29,017 INFO [train.py:1198] (0/2) Epoch 47, batch 700, loss[loss=0.2234, ctc_loss=0.1449, cr_loss=0.3926, over 21054.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1423, cr_loss=0.368, over 3970814.50 frames. 
], batch size: 56, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:43:25,882 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.323e+02 2.458e+02 2.598e+02 5.259e+02, threshold=4.917e+02, percent-clipped=1.0 2024-09-18 23:43:43,786 INFO [train.py:1198] (0/2) Epoch 47, batch 750, loss[loss=0.2249, ctc_loss=0.1468, cr_loss=0.3907, over 21034.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.3691, over 4006971.77 frames. ], batch size: 62, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:43:47,981 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.23 vs. limit=15.0 2024-09-18 23:44:16,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=835125.0, ans=0.125 2024-09-18 23:44:53,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=835181.6666666666, ans=0.2 2024-09-18 23:44:53,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=835181.6666666666, ans=0.125 2024-09-18 23:44:59,552 INFO [train.py:1198] (0/2) Epoch 47, batch 800, loss[loss=0.2457, ctc_loss=0.1614, cr_loss=0.4215, over 20776.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1438, cr_loss=0.3708, over 4020162.45 frames. 
], batch size: 56, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:45:54,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835295.0, ans=0.1 2024-09-18 23:45:59,703 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.234e+02 2.340e+02 2.471e+02 3.114e+02, threshold=4.680e+02, percent-clipped=0.0 2024-09-18 23:46:06,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=835323.3333333334, ans=0.125 2024-09-18 23:46:09,323 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2024-09-18 23:46:17,396 INFO [train.py:1198] (0/2) Epoch 47, batch 850, loss[loss=0.1778, ctc_loss=0.1158, cr_loss=0.31, over 20990.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1439, cr_loss=0.3707, over 4043409.77 frames. ], batch size: 50, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:46:22,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=835351.6666666666, ans=0.0 2024-09-18 23:46:23,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=835351.6666666666, ans=0.125 2024-09-18 23:46:28,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=835351.6666666666, ans=0.0 2024-09-18 23:46:58,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=835408.3333333334, ans=0.125 2024-09-18 23:47:20,146 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.27 vs. 
limit=10.0 2024-09-18 23:47:35,851 INFO [train.py:1198] (0/2) Epoch 47, batch 900, loss[loss=0.2446, ctc_loss=0.1619, cr_loss=0.4136, over 20670.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1434, cr_loss=0.3701, over 4068073.19 frames. ], batch size: 68, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:48:25,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=835578.3333333334, ans=0.125 2024-09-18 23:48:33,152 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.249e+02 2.397e+02 2.521e+02 3.038e+02, threshold=4.794e+02, percent-clipped=0.0 2024-09-18 23:48:51,174 INFO [train.py:1198] (0/2) Epoch 47, batch 950, loss[loss=0.23, ctc_loss=0.1522, cr_loss=0.3888, over 21016.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.144, cr_loss=0.3712, over 4072957.64 frames. ], batch size: 61, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:48:53,435 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-18 23:49:06,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835663.3333333334, ans=0.1 2024-09-18 23:49:34,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=835720.0, ans=0.0 2024-09-18 23:49:57,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=835748.3333333334, ans=0.0 2024-09-18 23:50:06,088 INFO [train.py:1198] (0/2) Epoch 47, batch 1000, loss[loss=0.2013, ctc_loss=0.1349, cr_loss=0.3322, over 20263.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1434, cr_loss=0.3701, over 4077763.22 frames. 
], batch size: 74, lr: 1.76e-03, grad_scale: 16.0 2024-09-18 23:50:08,223 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0 2024-09-18 23:50:45,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=835833.3333333334, ans=0.0 2024-09-18 23:50:45,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=835833.3333333334, ans=0.0 2024-09-18 23:50:47,575 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0 2024-09-18 23:51:02,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=835861.6666666666, ans=0.0 2024-09-18 23:51:07,964 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.275e+02 2.405e+02 2.540e+02 4.645e+02, threshold=4.809e+02, percent-clipped=0.0 2024-09-18 23:51:08,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=835890.0, ans=0.125 2024-09-18 23:51:20,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=835890.0, ans=0.2 2024-09-18 23:51:24,654 INFO [train.py:1198] (0/2) Epoch 47, batch 1050, loss[loss=0.205, ctc_loss=0.1348, cr_loss=0.3508, over 20943.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3688, over 4071693.21 frames. 
], batch size: 48, lr: 1.76e-03, grad_scale: 16.0 2024-09-18 23:51:38,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=835946.6666666666, ans=0.125 2024-09-18 23:51:50,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=835946.6666666666, ans=0.125 2024-09-18 23:52:06,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=835975.0, ans=0.125 2024-09-18 23:52:39,737 INFO [train.py:1198] (0/2) Epoch 47, batch 1100, loss[loss=0.1955, ctc_loss=0.1281, cr_loss=0.3367, over 20945.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1432, cr_loss=0.3693, over 4071972.50 frames. ], batch size: 50, lr: 1.76e-03, grad_scale: 16.0 2024-09-18 23:52:48,001 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.90 vs. limit=22.5 2024-09-18 23:52:51,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=836060.0, ans=0.05 2024-09-18 23:53:08,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. 
limit=15.0 2024-09-18 23:53:11,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=836116.6666666666, ans=0.1 2024-09-18 23:53:41,085 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.275e+02 2.364e+02 2.538e+02 5.104e+02, threshold=4.729e+02, percent-clipped=1.0 2024-09-18 23:53:53,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=836173.3333333334, ans=0.025 2024-09-18 23:53:57,448 INFO [train.py:1198] (0/2) Epoch 47, batch 1150, loss[loss=0.2621, ctc_loss=0.176, cr_loss=0.4305, over 20958.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1421, cr_loss=0.3674, over 4086674.96 frames. ], batch size: 64, lr: 1.76e-03, grad_scale: 16.0 2024-09-18 23:53:57,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=836201.6666666666, ans=0.025 2024-09-18 23:54:11,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=836230.0, ans=0.1 2024-09-18 23:54:16,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=12.0 2024-09-18 23:54:17,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=836230.0, ans=0.125 2024-09-18 23:54:32,366 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2024-09-18 23:54:45,418 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:55:13,751 INFO [train.py:1198] (0/2) Epoch 47, batch 1200, loss[loss=0.231, ctc_loss=0.1555, cr_loss=0.3776, over 20968.00 frames. 
], tot_loss[loss=0.2155, ctc_loss=0.1421, cr_loss=0.367, over 4082973.55 frames. ], batch size: 58, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:55:14,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=12.0 2024-09-18 23:55:16,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=836343.3333333334, ans=0.0 2024-09-18 23:55:51,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=836400.0, ans=0.125 2024-09-18 23:55:53,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=836400.0, ans=0.0 2024-09-18 23:55:54,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=836400.0, ans=0.125 2024-09-18 23:56:08,818 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:56:12,935 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.073e+02 2.286e+02 2.448e+02 2.685e+02 6.865e+02, threshold=4.896e+02, percent-clipped=1.0 2024-09-18 23:56:32,880 INFO [train.py:1198] (0/2) Epoch 47, batch 1250, loss[loss=0.1863, ctc_loss=0.1206, cr_loss=0.3283, over 21000.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3683, over 4084888.90 frames. 
], batch size: 52, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:56:33,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=836485.0, ans=0.125 2024-09-18 23:56:34,887 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:57:23,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=836570.0, ans=0.1 2024-09-18 23:57:49,054 INFO [train.py:1198] (0/2) Epoch 47, batch 1300, loss[loss=0.2313, ctc_loss=0.1533, cr_loss=0.3898, over 20835.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3692, over 4077612.93 frames. ], batch size: 65, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:57:53,889 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:58:14,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=836655.0, ans=0.0 2024-09-18 23:58:24,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=836683.3333333334, ans=0.0 2024-09-18 23:58:50,524 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.253e+02 2.367e+02 2.483e+02 4.662e+02, threshold=4.734e+02, percent-clipped=0.0 2024-09-18 23:58:56,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=836740.0, ans=0.2 2024-09-18 23:59:07,232 INFO [train.py:1198] (0/2) Epoch 47, batch 1350, loss[loss=0.2351, ctc_loss=0.1541, cr_loss=0.4052, over 20688.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3676, over 4077505.27 frames. 
], batch size: 71, lr: 1.76e-03, grad_scale: 32.0 2024-09-18 23:59:44,405 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.31 vs. limit=10.0 2024-09-18 23:59:46,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=22.5 2024-09-18 23:59:48,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=22.5 2024-09-19 00:00:23,287 INFO [train.py:1198] (0/2) Epoch 47, batch 1400, loss[loss=0.1911, ctc_loss=0.1217, cr_loss=0.3471, over 20972.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3684, over 4075213.41 frames. ], batch size: 55, lr: 1.76e-03, grad_scale: 32.0 2024-09-19 00:01:04,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=836966.6666666666, ans=0.2 2024-09-19 00:01:21,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=836995.0, ans=0.125 2024-09-19 00:01:22,985 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.296e+02 2.414e+02 2.581e+02 3.367e+02, threshold=4.828e+02, percent-clipped=0.0 2024-09-19 00:01:26,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=837023.3333333334, ans=0.125 2024-09-19 00:01:39,754 INFO [train.py:1198] (0/2) Epoch 47, batch 1450, loss[loss=0.1808, ctc_loss=0.1167, cr_loss=0.3203, over 20934.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.3687, over 4076913.00 frames. 
], batch size: 49, lr: 1.76e-03, grad_scale: 32.0 2024-09-19 00:01:54,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-09-19 00:02:08,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=837080.0, ans=0.0 2024-09-19 00:02:11,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=837108.3333333334, ans=0.125 2024-09-19 00:02:28,087 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:02:57,527 INFO [train.py:1198] (0/2) Epoch 47, batch 1500, loss[loss=0.2439, ctc_loss=0.1634, cr_loss=0.4022, over 20976.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.369, over 4078515.51 frames. ], batch size: 64, lr: 1.76e-03, grad_scale: 32.0 2024-09-19 00:03:11,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=837221.6666666666, ans=0.125 2024-09-19 00:03:13,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=837221.6666666666, ans=0.0 2024-09-19 00:03:56,612 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.253e+02 2.430e+02 2.579e+02 3.421e+02, threshold=4.859e+02, percent-clipped=0.0 2024-09-19 00:04:01,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=837306.6666666666, ans=0.1 2024-09-19 00:04:04,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=837306.6666666666, ans=0.125 2024-09-19 00:04:13,209 INFO [train.py:1198] (0/2) Epoch 47, batch 1550, loss[loss=0.2106, ctc_loss=0.1393, cr_loss=0.3562, over 
21020.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.368, over 4089914.36 frames. ], batch size: 61, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:04:16,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=837335.0, ans=0.1 2024-09-19 00:05:02,356 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-19 00:05:07,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=837420.0, ans=0.125 2024-09-19 00:05:22,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=837448.3333333334, ans=0.125 2024-09-19 00:05:30,809 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2024-09-19 00:05:31,510 INFO [train.py:1198] (0/2) Epoch 47, batch 1600, loss[loss=0.2357, ctc_loss=0.1553, cr_loss=0.4019, over 20962.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1427, cr_loss=0.3684, over 4089254.75 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:06:00,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=837533.3333333334, ans=0.0 2024-09-19 00:06:30,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.033e+02 2.295e+02 2.387e+02 2.585e+02 3.288e+02, threshold=4.775e+02, percent-clipped=0.0 2024-09-19 00:06:39,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=837590.0, ans=0.07 2024-09-19 00:06:47,138 INFO [train.py:1198] (0/2) Epoch 47, batch 1650, loss[loss=0.222, ctc_loss=0.1441, cr_loss=0.3892, over 20994.00 frames. 
], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3681, over 4088648.44 frames. ], batch size: 63, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:06:50,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=837618.3333333334, ans=0.125 2024-09-19 00:06:50,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.74 vs. limit=22.5 2024-09-19 00:06:56,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=837618.3333333334, ans=0.125 2024-09-19 00:06:58,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=837618.3333333334, ans=0.02 2024-09-19 00:07:03,030 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.30 vs. limit=15.0 2024-09-19 00:07:38,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=837703.3333333334, ans=0.0 2024-09-19 00:08:03,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=837731.6666666666, ans=0.125 2024-09-19 00:08:06,082 INFO [train.py:1198] (0/2) Epoch 47, batch 1700, loss[loss=0.1667, ctc_loss=0.1089, cr_loss=0.2888, over 20934.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3684, over 4093165.20 frames. 
], batch size: 49, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:08:12,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=837760.0, ans=0.125 2024-09-19 00:08:18,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=837760.0, ans=0.05 2024-09-19 00:08:24,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=837788.3333333334, ans=0.07 2024-09-19 00:09:00,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=837845.0, ans=0.125 2024-09-19 00:09:04,359 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.220e+02 2.390e+02 2.523e+02 4.047e+02, threshold=4.780e+02, percent-clipped=0.0 2024-09-19 00:09:04,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=837873.3333333334, ans=0.125 2024-09-19 00:09:20,849 INFO [train.py:1198] (0/2) Epoch 47, batch 1750, loss[loss=0.2157, ctc_loss=0.1439, cr_loss=0.3586, over 20969.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.368, over 4087276.85 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:09:24,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=837901.6666666666, ans=0.2 2024-09-19 00:09:41,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=15.0 2024-09-19 00:10:14,934 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.10 vs. 
limit=12.0 2024-09-19 00:10:16,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=837986.6666666666, ans=0.1 2024-09-19 00:10:38,013 INFO [train.py:1198] (0/2) Epoch 47, batch 1800, loss[loss=0.2022, ctc_loss=0.1283, cr_loss=0.3693, over 20996.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3689, over 4076682.12 frames. ], batch size: 48, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:10:50,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=22.5 2024-09-19 00:11:07,458 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.21 vs. limit=10.0 2024-09-19 00:11:09,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.09 vs. limit=6.0 2024-09-19 00:11:16,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=838100.0, ans=0.125 2024-09-19 00:11:27,191 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=12.0 2024-09-19 00:11:33,949 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.36 vs. limit=8.0 2024-09-19 00:11:37,360 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.253e+02 2.391e+02 2.616e+02 3.539e+02, threshold=4.781e+02, percent-clipped=0.0 2024-09-19 00:11:53,850 INFO [train.py:1198] (0/2) Epoch 47, batch 1850, loss[loss=0.2209, ctc_loss=0.1444, cr_loss=0.3827, over 21058.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.368, over 4086823.19 frames. 
], batch size: 56, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:12:01,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=838185.0, ans=0.0 2024-09-19 00:13:09,720 INFO [train.py:1198] (0/2) Epoch 47, batch 1900, loss[loss=0.1616, ctc_loss=0.1028, cr_loss=0.2939, over 19886.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1416, cr_loss=0.3669, over 4093941.51 frames. ], batch size: 44, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:13:19,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=838326.6666666666, ans=0.0 2024-09-19 00:13:46,314 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=12.0 2024-09-19 00:13:50,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=838383.3333333334, ans=0.125 2024-09-19 00:14:09,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=838411.6666666666, ans=0.1 2024-09-19 00:14:10,749 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.235e+02 2.407e+02 2.574e+02 4.651e+02, threshold=4.814e+02, percent-clipped=0.0 2024-09-19 00:14:20,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=838440.0, ans=0.025 2024-09-19 00:14:27,653 INFO [train.py:1198] (0/2) Epoch 47, batch 1950, loss[loss=0.1798, ctc_loss=0.115, cr_loss=0.3243, over 21061.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1415, cr_loss=0.367, over 4089511.35 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:15:42,981 INFO [train.py:1198] (0/2) Epoch 47, batch 2000, loss[loss=0.1836, ctc_loss=0.1173, cr_loss=0.3318, over 20293.00 frames. 
], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.3679, over 4077102.54 frames. ], batch size: 45, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:15:47,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=838610.0, ans=0.125 2024-09-19 00:16:05,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=838638.3333333334, ans=0.125 2024-09-19 00:16:12,110 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=12.0 2024-09-19 00:16:13,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=838638.3333333334, ans=0.0 2024-09-19 00:16:14,354 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-296000.pt 2024-09-19 00:16:20,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=838666.6666666666, ans=0.125 2024-09-19 00:16:29,809 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0 2024-09-19 00:16:46,992 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.253e+02 2.394e+02 2.559e+02 3.952e+02, threshold=4.788e+02, percent-clipped=0.0 2024-09-19 00:16:53,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=838723.3333333334, ans=0.125 2024-09-19 00:17:01,840 INFO [train.py:1198] (0/2) Epoch 47, batch 2050, loss[loss=0.2306, ctc_loss=0.1535, cr_loss=0.3858, over 20749.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1421, cr_loss=0.3675, over 4083268.76 frames. 
], batch size: 71, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:17:41,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=838808.3333333334, ans=0.125 2024-09-19 00:18:15,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=838893.3333333334, ans=0.125 2024-09-19 00:18:17,161 INFO [train.py:1198] (0/2) Epoch 47, batch 2100, loss[loss=0.1938, ctc_loss=0.128, cr_loss=0.3289, over 21001.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.142, cr_loss=0.3673, over 4096317.86 frames. ], batch size: 51, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:19:06,314 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0 2024-09-19 00:19:07,653 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=12.0 2024-09-19 00:19:09,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.24 vs. limit=15.0 2024-09-19 00:19:20,628 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.293e+02 2.407e+02 2.603e+02 3.621e+02, threshold=4.813e+02, percent-clipped=0.0 2024-09-19 00:19:35,726 INFO [train.py:1198] (0/2) Epoch 47, batch 2150, loss[loss=0.2302, ctc_loss=0.152, cr_loss=0.3911, over 21014.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1414, cr_loss=0.3667, over 4104684.82 frames. 
], batch size: 63, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:20:03,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=839063.3333333334, ans=0.125 2024-09-19 00:20:51,792 INFO [train.py:1198] (0/2) Epoch 47, batch 2200, loss[loss=0.1681, ctc_loss=0.1074, cr_loss=0.3035, over 20980.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.367, over 4109823.69 frames. ], batch size: 51, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:21:02,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=839176.6666666666, ans=0.2 2024-09-19 00:21:54,509 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.250e+02 2.374e+02 2.571e+02 4.795e+02, threshold=4.748e+02, percent-clipped=0.0 2024-09-19 00:22:09,687 INFO [train.py:1198] (0/2) Epoch 47, batch 2250, loss[loss=0.2073, ctc_loss=0.1355, cr_loss=0.359, over 20790.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3675, over 4109526.49 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:22:43,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=839375.0, ans=0.125 2024-09-19 00:22:54,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=839403.3333333334, ans=0.2 2024-09-19 00:23:03,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=839403.3333333334, ans=0.125 2024-09-19 00:23:19,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=839431.6666666666, ans=0.125 2024-09-19 00:23:25,287 INFO [train.py:1198] (0/2) Epoch 47, batch 2300, loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3681, over 21059.00 frames. 
], tot_loss[loss=0.2156, ctc_loss=0.1421, cr_loss=0.3678, over 4104905.12 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:23:55,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=839516.6666666666, ans=0.0 2024-09-19 00:24:03,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2024-09-19 00:24:21,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=839545.0, ans=0.0 2024-09-19 00:24:25,354 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.258e+02 2.391e+02 2.525e+02 3.714e+02, threshold=4.782e+02, percent-clipped=0.0 2024-09-19 00:24:34,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=839573.3333333334, ans=0.2 2024-09-19 00:24:40,165 INFO [train.py:1198] (0/2) Epoch 47, batch 2350, loss[loss=0.2521, ctc_loss=0.17, cr_loss=0.4109, over 20690.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1423, cr_loss=0.3684, over 4107533.30 frames. ], batch size: 68, lr: 1.75e-03, grad_scale: 16.0 2024-09-19 00:25:29,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=839686.6666666666, ans=0.2 2024-09-19 00:25:56,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=839743.3333333334, ans=0.125 2024-09-19 00:25:57,793 INFO [train.py:1198] (0/2) Epoch 47, batch 2400, loss[loss=0.1912, ctc_loss=0.1232, cr_loss=0.3398, over 20958.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1422, cr_loss=0.3687, over 4112157.52 frames. 
], batch size: 51, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:25:58,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=839743.3333333334, ans=0.0 2024-09-19 00:26:17,927 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:26:51,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=839828.3333333334, ans=0.0 2024-09-19 00:26:58,587 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.222e+02 2.397e+02 2.581e+02 3.490e+02, threshold=4.794e+02, percent-clipped=0.0 2024-09-19 00:26:58,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839856.6666666666, ans=0.1 2024-09-19 00:27:08,160 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:27:13,781 INFO [train.py:1198] (0/2) Epoch 47, batch 2450, loss[loss=0.2509, ctc_loss=0.1686, cr_loss=0.4118, over 19998.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1424, cr_loss=0.3687, over 4106362.59 frames. ], batch size: 80, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 00:27:33,855 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:28:07,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=839970.0, ans=0.125 2024-09-19 00:28:18,132 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2024-09-19 00:28:32,582 INFO [train.py:1198] (0/2) Epoch 47, batch 2500, loss[loss=0.2211, ctc_loss=0.1463, cr_loss=0.3738, over 20959.00 frames. 
], tot_loss[loss=0.2157, ctc_loss=0.142, cr_loss=0.3687, over 4109513.09 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:28:35,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=840026.6666666666, ans=0.025
2024-09-19 00:28:36,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=22.5
2024-09-19 00:28:56,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=840055.0, ans=0.2
2024-09-19 00:28:59,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=840055.0, ans=0.0
2024-09-19 00:29:32,926 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.260e+02 2.361e+02 2.506e+02 3.338e+02, threshold=4.722e+02, percent-clipped=0.0
2024-09-19 00:29:37,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=840140.0, ans=0.0
2024-09-19 00:29:47,906 INFO [train.py:1198] (0/2) Epoch 47, batch 2550, loss[loss=0.2009, ctc_loss=0.1303, cr_loss=0.3531, over 21057.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1424, cr_loss=0.369, over 4115436.37 frames. ], batch size: 62, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:29:49,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840168.3333333334, ans=0.1
2024-09-19 00:30:16,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=840225.0, ans=0.0
2024-09-19 00:30:19,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=840225.0, ans=0.0
2024-09-19 00:30:36,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=840253.3333333334, ans=0.0
2024-09-19 00:30:39,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0
2024-09-19 00:30:42,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840253.3333333334, ans=0.1
2024-09-19 00:30:42,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=15.0
2024-09-19 00:30:43,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=840253.3333333334, ans=0.025
2024-09-19 00:31:06,125 INFO [train.py:1198] (0/2) Epoch 47, batch 2600, loss[loss=0.2342, ctc_loss=0.156, cr_loss=0.3912, over 19296.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1415, cr_loss=0.3673, over 4110744.58 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:31:06,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840310.0, ans=0.1
2024-09-19 00:31:09,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840310.0, ans=0.1
2024-09-19 00:31:09,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=840310.0, ans=0.125
2024-09-19 00:31:22,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.80 vs. limit=10.0
2024-09-19 00:31:41,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=840366.6666666666, ans=0.0
2024-09-19 00:31:50,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840395.0, ans=0.1
2024-09-19 00:32:06,799 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.293e+02 2.446e+02 2.585e+02 3.438e+02, threshold=4.891e+02, percent-clipped=0.0
2024-09-19 00:32:22,168 INFO [train.py:1198] (0/2) Epoch 47, batch 2650, loss[loss=0.2413, ctc_loss=0.1608, cr_loss=0.4026, over 20731.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1426, cr_loss=0.3684, over 4091576.21 frames. ], batch size: 71, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:33:04,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=840508.3333333334, ans=0.125
2024-09-19 00:33:23,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=840536.6666666666, ans=0.125
2024-09-19 00:33:38,231 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.87 vs. limit=12.0
2024-09-19 00:33:40,704 INFO [train.py:1198] (0/2) Epoch 47, batch 2700, loss[loss=0.2018, ctc_loss=0.131, cr_loss=0.3538, over 20796.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1423, cr_loss=0.3681, over 4103181.67 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:33:48,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=840593.3333333334, ans=0.125
2024-09-19 00:34:22,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=840650.0, ans=0.0
2024-09-19 00:34:41,323 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.246e+02 2.332e+02 2.535e+02 3.753e+02, threshold=4.664e+02, percent-clipped=0.0
2024-09-19 00:34:56,464 INFO [train.py:1198] (0/2) Epoch 47, batch 2750, loss[loss=0.1852, ctc_loss=0.1185, cr_loss=0.3336, over 20413.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3686, over 4090172.35 frames. ], batch size: 45, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:35:28,919 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0
2024-09-19 00:35:44,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=840820.0, ans=0.125
2024-09-19 00:36:14,603 INFO [train.py:1198] (0/2) Epoch 47, batch 2800, loss[loss=0.2151, ctc_loss=0.1427, cr_loss=0.3621, over 20603.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.144, cr_loss=0.3702, over 4094244.80 frames. ], batch size: 68, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:36:25,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=840876.6666666666, ans=0.125
2024-09-19 00:36:37,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=840905.0, ans=0.0
2024-09-19 00:36:56,550 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.38 vs. limit=22.5
2024-09-19 00:37:15,264 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.263e+02 2.383e+02 2.536e+02 3.961e+02, threshold=4.767e+02, percent-clipped=0.0
2024-09-19 00:37:27,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=840990.0, ans=0.0
2024-09-19 00:37:30,434 INFO [train.py:1198] (0/2) Epoch 47, batch 2850, loss[loss=0.2099, ctc_loss=0.138, cr_loss=0.3592, over 20834.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1435, cr_loss=0.3694, over 4094358.60 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:38:05,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=841075.0, ans=0.09899494936611666
2024-09-19 00:38:05,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0
2024-09-19 00:38:14,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=841103.3333333334, ans=0.04949747468305833
2024-09-19 00:38:44,930 INFO [train.py:1198] (0/2) Epoch 47, batch 2900, loss[loss=0.2072, ctc_loss=0.1333, cr_loss=0.3694, over 20960.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1444, cr_loss=0.3707, over 4069244.49 frames. ], batch size: 51, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:39:10,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=841188.3333333334, ans=0.125
2024-09-19 00:39:17,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=841216.6666666666, ans=0.1
2024-09-19 00:39:33,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=841245.0, ans=0.125
2024-09-19 00:39:48,368 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.287e+02 2.434e+02 2.608e+02 4.093e+02, threshold=4.867e+02, percent-clipped=0.0
2024-09-19 00:40:02,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=841301.6666666666, ans=0.125
2024-09-19 00:40:03,671 INFO [train.py:1198] (0/2) Epoch 47, batch 2950, loss[loss=0.2679, ctc_loss=0.1822, cr_loss=0.4287, over 20675.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1441, cr_loss=0.3706, over 4075426.91 frames. ], batch size: 68, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:40:26,863 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=12.0
2024-09-19 00:40:31,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=841330.0, ans=0.125
2024-09-19 00:40:56,818 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0
2024-09-19 00:41:07,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=841415.0, ans=0.95
2024-09-19 00:41:16,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0
2024-09-19 00:41:17,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=841443.3333333334, ans=0.125
2024-09-19 00:41:19,068 INFO [train.py:1198] (0/2) Epoch 47, batch 3000, loss[loss=0.2244, ctc_loss=0.1505, cr_loss=0.3695, over 20930.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1433, cr_loss=0.3692, over 4079687.84 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:41:19,069 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-19 00:41:38,016 INFO [train.py:1230] (0/2) Epoch 47, validation: loss=0.03886, ctc_loss=0.03886, cr_loss=1.566e-14, over 944034.00 frames.
2024-09-19 00:41:38,016 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-19 00:42:04,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=841471.6666666666, ans=0.0
2024-09-19 00:42:23,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=841500.0, ans=0.125
2024-09-19 00:42:41,484 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.079e+02 2.289e+02 2.426e+02 2.578e+02 4.191e+02, threshold=4.852e+02, percent-clipped=0.0
2024-09-19 00:42:44,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=841556.6666666666, ans=0.0
2024-09-19 00:42:53,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=841556.6666666666, ans=0.125
2024-09-19 00:42:53,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=841556.6666666666, ans=0.0
2024-09-19 00:42:56,372 INFO [train.py:1198] (0/2) Epoch 47, batch 3050, loss[loss=0.2042, ctc_loss=0.1333, cr_loss=0.3544, over 20988.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3686, over 4085586.53 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:43:16,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.20 vs. limit=15.0
2024-09-19 00:43:34,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=841641.6666666666, ans=0.0
2024-09-19 00:43:40,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=841670.0, ans=0.2
2024-09-19 00:43:53,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=841670.0, ans=0.1
2024-09-19 00:44:11,683 INFO [train.py:1198] (0/2) Epoch 47, batch 3100, loss[loss=0.2231, ctc_loss=0.1453, cr_loss=0.3888, over 20803.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1418, cr_loss=0.3668, over 4092966.28 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:44:16,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=12.0
2024-09-19 00:45:14,947 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.937e+02 2.246e+02 2.409e+02 2.625e+02 4.109e+02, threshold=4.817e+02, percent-clipped=0.0
2024-09-19 00:45:27,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=841840.0, ans=0.125
2024-09-19 00:45:30,131 INFO [train.py:1198] (0/2) Epoch 47, batch 3150, loss[loss=0.2169, ctc_loss=0.1437, cr_loss=0.366, over 20943.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.3667, over 4097545.32 frames. ], batch size: 49, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:46:03,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=841925.0, ans=0.125
2024-09-19 00:46:45,679 INFO [train.py:1198] (0/2) Epoch 47, batch 3200, loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.3691, over 20678.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1418, cr_loss=0.3674, over 4109225.25 frames. ], batch size: 68, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:46:55,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=842010.0, ans=0.05
2024-09-19 00:47:42,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=842095.0, ans=0.125
2024-09-19 00:47:49,492 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.237e+02 2.388e+02 2.562e+02 3.736e+02, threshold=4.775e+02, percent-clipped=0.0
2024-09-19 00:47:53,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=842123.3333333334, ans=0.125
2024-09-19 00:48:02,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842123.3333333334, ans=0.1
2024-09-19 00:48:04,948 INFO [train.py:1198] (0/2) Epoch 47, batch 3250, loss[loss=0.1992, ctc_loss=0.1296, cr_loss=0.3479, over 20981.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1415, cr_loss=0.3663, over 4098173.90 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:48:34,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=842208.3333333334, ans=0.035
2024-09-19 00:48:40,422 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 00:49:15,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.97 vs. limit=22.5
2024-09-19 00:49:20,610 INFO [train.py:1198] (0/2) Epoch 47, batch 3300, loss[loss=0.2023, ctc_loss=0.1301, cr_loss=0.3607, over 20952.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1419, cr_loss=0.3672, over 4095093.81 frames. ], batch size: 50, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:49:30,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=842293.3333333334, ans=0.0
2024-09-19 00:49:38,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=22.5
2024-09-19 00:49:39,829 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.17 vs. limit=10.0
2024-09-19 00:49:54,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=842350.0, ans=0.0
2024-09-19 00:50:12,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842378.3333333334, ans=0.1
2024-09-19 00:50:24,652 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.249e+02 2.399e+02 2.589e+02 3.401e+02, threshold=4.799e+02, percent-clipped=0.0
2024-09-19 00:50:29,955 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0
2024-09-19 00:50:31,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=842406.6666666666, ans=0.125
2024-09-19 00:50:37,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=842406.6666666666, ans=0.0
2024-09-19 00:50:39,710 INFO [train.py:1198] (0/2) Epoch 47, batch 3350, loss[loss=0.2507, ctc_loss=0.1707, cr_loss=0.3997, over 18182.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1423, cr_loss=0.3685, over 4101511.51 frames. ], batch size: 108, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:50:41,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=842435.0, ans=0.0
2024-09-19 00:51:09,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=842491.6666666666, ans=0.2
2024-09-19 00:51:31,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=842520.0, ans=0.0
2024-09-19 00:51:36,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=842520.0, ans=0.025
2024-09-19 00:51:40,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842548.3333333334, ans=0.1
2024-09-19 00:51:52,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=842548.3333333334, ans=0.0
2024-09-19 00:51:55,390 INFO [train.py:1198] (0/2) Epoch 47, batch 3400, loss[loss=0.1924, ctc_loss=0.1261, cr_loss=0.3316, over 21025.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1424, cr_loss=0.3685, over 4103818.59 frames. ], batch size: 63, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:52:10,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=842605.0, ans=0.025
2024-09-19 00:52:39,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=842661.6666666666, ans=0.125
2024-09-19 00:52:48,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=842661.6666666666, ans=0.0
2024-09-19 00:52:56,043 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.239e+02 2.371e+02 2.561e+02 3.572e+02, threshold=4.742e+02, percent-clipped=0.0
2024-09-19 00:53:11,274 INFO [train.py:1198] (0/2) Epoch 47, batch 3450, loss[loss=0.1988, ctc_loss=0.1282, cr_loss=0.3531, over 20962.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3675, over 4101324.62 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:53:25,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=842718.3333333334, ans=0.0
2024-09-19 00:53:32,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=842746.6666666666, ans=10.0
2024-09-19 00:53:49,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=842775.0, ans=0.05
2024-09-19 00:53:58,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=842803.3333333334, ans=0.2
2024-09-19 00:54:29,737 INFO [train.py:1198] (0/2) Epoch 47, batch 3500, loss[loss=0.2301, ctc_loss=0.1523, cr_loss=0.3888, over 21006.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1423, cr_loss=0.3674, over 4091815.16 frames. ], batch size: 63, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:54:35,373 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.36 vs. limit=8.0
2024-09-19 00:54:51,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842888.3333333334, ans=0.1
2024-09-19 00:55:15,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=842945.0, ans=0.0
2024-09-19 00:55:24,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=842945.0, ans=0.125
2024-09-19 00:55:29,948 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.060e+02 2.306e+02 2.433e+02 2.594e+02 5.603e+02, threshold=4.866e+02, percent-clipped=1.0
2024-09-19 00:55:45,193 INFO [train.py:1198] (0/2) Epoch 47, batch 3550, loss[loss=0.2253, ctc_loss=0.1486, cr_loss=0.3837, over 19889.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1423, cr_loss=0.3681, over 4098147.64 frames. ], batch size: 80, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:55:45,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=843001.6666666666, ans=0.0
2024-09-19 00:56:07,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=843030.0, ans=0.2
2024-09-19 00:56:10,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=843030.0, ans=0.125
2024-09-19 00:56:27,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=843058.3333333334, ans=0.125
2024-09-19 00:56:41,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=843086.6666666666, ans=0.1
2024-09-19 00:56:43,178 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.31 vs. limit=15.0
2024-09-19 00:56:52,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0
2024-09-19 00:56:53,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=843115.0, ans=0.125
2024-09-19 00:57:03,824 INFO [train.py:1198] (0/2) Epoch 47, batch 3600, loss[loss=0.2177, ctc_loss=0.144, cr_loss=0.3684, over 20822.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1424, cr_loss=0.3689, over 4111327.62 frames. ], batch size: 65, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:57:13,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=843143.3333333334, ans=0.0
2024-09-19 00:57:14,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=843143.3333333334, ans=0.125
2024-09-19 00:57:26,007 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.91 vs. limit=10.0
2024-09-19 00:57:36,570 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0
2024-09-19 00:58:04,351 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.279e+02 2.386e+02 2.544e+02 3.394e+02, threshold=4.772e+02, percent-clipped=0.0
2024-09-19 00:58:15,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=843256.6666666666, ans=0.0
2024-09-19 00:58:18,961 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.99 vs. limit=12.0
2024-09-19 00:58:19,349 INFO [train.py:1198] (0/2) Epoch 47, batch 3650, loss[loss=0.2277, ctc_loss=0.1532, cr_loss=0.3721, over 21051.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1428, cr_loss=0.3694, over 4099009.61 frames. ], batch size: 62, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:58:53,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=843341.6666666666, ans=0.125
2024-09-19 00:59:38,732 INFO [train.py:1198] (0/2) Epoch 47, batch 3700, loss[loss=0.2008, ctc_loss=0.1303, cr_loss=0.3526, over 20780.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.3683, over 4087368.33 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 00:59:40,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843426.6666666666, ans=0.1
2024-09-19 01:00:16,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=843483.3333333334, ans=0.125
2024-09-19 01:00:16,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=843483.3333333334, ans=0.0
2024-09-19 01:00:28,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=843511.6666666666, ans=0.035
2024-09-19 01:00:34,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=843511.6666666666, ans=0.2
2024-09-19 01:00:39,006 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.298e+02 2.401e+02 2.614e+02 4.058e+02, threshold=4.801e+02, percent-clipped=0.0
2024-09-19 01:00:39,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=22.5
2024-09-19 01:00:48,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=843540.0, ans=0.125
2024-09-19 01:00:54,019 INFO [train.py:1198] (0/2) Epoch 47, batch 3750, loss[loss=0.2119, ctc_loss=0.1401, cr_loss=0.359, over 20941.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.368, over 4083245.59 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 01:01:26,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=843625.0, ans=0.125
2024-09-19 01:01:46,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=843653.3333333334, ans=0.2
2024-09-19 01:01:48,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=843653.3333333334, ans=0.025
2024-09-19 01:01:57,573 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 01:02:09,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=843681.6666666666, ans=0.125
2024-09-19 01:02:11,801 INFO [train.py:1198] (0/2) Epoch 47, batch 3800, loss[loss=0.2218, ctc_loss=0.1473, cr_loss=0.3729, over 21060.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3689, over 4090502.37 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 01:02:15,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=843710.0, ans=0.125
2024-09-19 01:02:22,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=843710.0, ans=0.0
2024-09-19 01:02:24,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=843710.0, ans=0.2
2024-09-19 01:02:41,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843766.6666666666, ans=0.1
2024-09-19 01:02:41,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0
2024-09-19 01:02:50,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=843766.6666666666, ans=0.125
2024-09-19 01:03:12,119 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.050e+02 2.250e+02 2.380e+02 2.563e+02 3.540e+02, threshold=4.760e+02, percent-clipped=0.0
2024-09-19 01:03:13,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=843823.3333333334, ans=0.125
2024-09-19 01:03:21,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=843823.3333333334, ans=0.125
2024-09-19 01:03:24,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=15.0
2024-09-19 01:03:26,887 INFO [train.py:1198] (0/2) Epoch 47, batch 3850, loss[loss=0.1871, ctc_loss=0.1207, cr_loss=0.3321, over 20952.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1434, cr_loss=0.3692, over 4086701.48 frames. ], batch size: 49, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 01:03:39,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=843851.6666666666, ans=0.125
2024-09-19 01:03:43,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.53 vs. limit=15.0
2024-09-19 01:04:45,667 INFO [train.py:1198] (0/2) Epoch 47, batch 3900, loss[loss=0.1769, ctc_loss=0.113, cr_loss=0.3197, over 20956.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1429, cr_loss=0.3677, over 4085751.05 frames. ], batch size: 48, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 01:04:46,306 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0
2024-09-19 01:05:45,668 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 2.300e+02 2.390e+02 2.560e+02 4.831e+02, threshold=4.780e+02, percent-clipped=1.0
2024-09-19 01:05:50,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=844106.6666666666, ans=0.0
2024-09-19 01:05:55,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=844106.6666666666, ans=0.125
2024-09-19 01:06:00,680 INFO [train.py:1198] (0/2) Epoch 47, batch 3950, loss[loss=0.2216, ctc_loss=0.1483, cr_loss=0.3665, over 20977.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1434, cr_loss=0.369, over 4087201.79 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 01:06:01,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=844135.0, ans=0.0
2024-09-19 01:06:02,918 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.61 vs. limit=10.0
2024-09-19 01:06:17,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=844163.3333333334, ans=0.0
2024-09-19 01:06:26,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=844163.3333333334, ans=0.125
2024-09-19 01:06:44,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=844220.0, ans=0.125
2024-09-19 01:07:05,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=844248.3333333334, ans=0.125
2024-09-19 01:07:15,620 INFO [train.py:1198] (0/2) Epoch 47, batch 4000, loss[loss=0.1976, ctc_loss=0.1301, cr_loss=0.3374, over 20814.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3689, over 4098195.44 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 01:07:35,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=844305.0, ans=0.0
2024-09-19 01:08:06,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=844361.6666666666, ans=0.0
2024-09-19 01:08:10,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=844361.6666666666, ans=0.1
2024-09-19 01:08:13,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=844361.6666666666, ans=0.05
2024-09-19 01:08:19,278 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 2.285e+02 2.372e+02 2.493e+02 3.081e+02, threshold=4.743e+02, percent-clipped=0.0
2024-09-19 01:08:31,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=844390.0, ans=0.0
2024-09-19 01:08:34,374 INFO [train.py:1198] (0/2) Epoch 47, batch 4050, loss[loss=0.2209, ctc_loss=0.1461, cr_loss=0.3744, over 20835.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.3681, over 4091685.51 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 64.0
2024-09-19 01:09:01,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0
2024-09-19 01:09:05,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=844475.0, ans=0.0
2024-09-19 01:09:19,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=844503.3333333334, ans=0.125
2024-09-19 01:09:37,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=844531.6666666666, ans=0.1
2024-09-19 01:09:49,389 INFO [train.py:1198] (0/2) Epoch 47, batch 4100, loss[loss=0.225, ctc_loss=0.1477, cr_loss=0.3867, over 20756.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.368, over 4095237.11 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 01:10:01,771 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 01:10:13,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=844588.3333333334, ans=0.04949747468305833
2024-09-19 01:10:21,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=844616.6666666666, ans=0.125
2024-09-19 01:10:48,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=844645.0, ans=0.0
2024-09-19 01:10:54,385 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 2.254e+02 2.350e+02 2.481e+02 3.420e+02, threshold=4.699e+02, percent-clipped=0.0
2024-09-19 01:11:07,751 INFO [train.py:1198] (0/2) Epoch 47, batch 4150, loss[loss=0.2069, ctc_loss=0.1331, cr_loss=0.369, over 20760.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3677, over 4096383.72 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
2024-09-19 01:11:09,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=844701.6666666666, ans=0.025
2024-09-19 01:12:05,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=844786.6666666666, ans=0.0
2024-09-19 01:12:09,037 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.80 vs. limit=10.0
2024-09-19 01:12:23,357 INFO [train.py:1198] (0/2) Epoch 47, batch 4200, loss[loss=0.2318, ctc_loss=0.152, cr_loss=0.3992, over 20954.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.3677, over 4100733.09 frames.
], batch size: 58, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:12:49,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=844871.6666666666, ans=0.125 2024-09-19 01:12:55,852 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=22.5 2024-09-19 01:13:00,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=844900.0, ans=0.125 2024-09-19 01:13:07,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=844900.0, ans=0.025 2024-09-19 01:13:13,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=844928.3333333334, ans=0.125 2024-09-19 01:13:28,199 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.299e+02 2.454e+02 2.631e+02 3.444e+02, threshold=4.907e+02, percent-clipped=0.0 2024-09-19 01:13:41,571 INFO [train.py:1198] (0/2) Epoch 47, batch 4250, loss[loss=0.197, ctc_loss=0.1287, cr_loss=0.3414, over 20998.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3677, over 4091635.99 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:13:55,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=845013.3333333334, ans=0.1 2024-09-19 01:14:15,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=845041.6666666666, ans=0.125 2024-09-19 01:14:24,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.17 vs. 
limit=12.0 2024-09-19 01:14:57,677 INFO [train.py:1198] (0/2) Epoch 47, batch 4300, loss[loss=0.203, ctc_loss=0.1331, cr_loss=0.3495, over 20966.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3676, over 4094465.12 frames. ], batch size: 51, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:15:11,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=845155.0, ans=0.1 2024-09-19 01:15:12,356 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.49 vs. limit=12.0 2024-09-19 01:15:23,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=845155.0, ans=0.0 2024-09-19 01:15:34,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=845183.3333333334, ans=0.125 2024-09-19 01:15:46,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=845211.6666666666, ans=0.0 2024-09-19 01:15:47,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=845211.6666666666, ans=0.0 2024-09-19 01:15:53,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=845211.6666666666, ans=0.035 2024-09-19 01:15:54,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=845211.6666666666, ans=0.1 2024-09-19 01:16:02,029 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.329e+02 2.471e+02 2.593e+02 4.269e+02, threshold=4.941e+02, percent-clipped=0.0 2024-09-19 01:16:13,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=845240.0, ans=0.2 
2024-09-19 01:16:15,565 INFO [train.py:1198] (0/2) Epoch 47, batch 4350, loss[loss=0.268, ctc_loss=0.1821, cr_loss=0.4291, over 18235.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1437, cr_loss=0.3691, over 4090802.45 frames. ], batch size: 108, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:16:32,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=845296.6666666666, ans=0.0 2024-09-19 01:16:43,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=845296.6666666666, ans=15.0 2024-09-19 01:16:53,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=845325.0, ans=0.0 2024-09-19 01:17:06,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.17 vs. limit=10.0 2024-09-19 01:17:13,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=845353.3333333334, ans=0.0 2024-09-19 01:17:19,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=845381.6666666666, ans=0.07 2024-09-19 01:17:19,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=845381.6666666666, ans=0.125 2024-09-19 01:17:31,439 INFO [train.py:1198] (0/2) Epoch 47, batch 4400, loss[loss=0.2634, ctc_loss=0.178, cr_loss=0.4273, over 18608.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1442, cr_loss=0.3704, over 4077509.24 frames. 
], batch size: 108, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:17:36,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=845410.0, ans=0.0 2024-09-19 01:17:44,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.49 vs. limit=15.0 2024-09-19 01:17:54,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=845438.3333333334, ans=0.125 2024-09-19 01:17:56,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=845438.3333333334, ans=0.09899494936611666 2024-09-19 01:18:24,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=845495.0, ans=0.125 2024-09-19 01:18:26,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=845495.0, ans=0.0 2024-09-19 01:18:33,596 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.273e+02 2.423e+02 2.576e+02 6.541e+02, threshold=4.846e+02, percent-clipped=2.0 2024-09-19 01:18:40,206 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0 2024-09-19 01:18:50,294 INFO [train.py:1198] (0/2) Epoch 47, batch 4450, loss[loss=0.1948, ctc_loss=0.1241, cr_loss=0.3534, over 20766.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1432, cr_loss=0.3694, over 4093907.10 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:18:52,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. 
limit=15.0 2024-09-19 01:19:41,376 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.50 vs. limit=15.0 2024-09-19 01:20:05,594 INFO [train.py:1198] (0/2) Epoch 47, batch 4500, loss[loss=0.207, ctc_loss=0.1351, cr_loss=0.3595, over 20873.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1425, cr_loss=0.3678, over 4082758.59 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:20:10,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=845693.3333333334, ans=0.125 2024-09-19 01:20:31,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=845721.6666666666, ans=0.125 2024-09-19 01:21:07,445 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.280e+02 2.435e+02 2.608e+02 3.118e+02, threshold=4.870e+02, percent-clipped=0.0 2024-09-19 01:21:21,241 INFO [train.py:1198] (0/2) Epoch 47, batch 4550, loss[loss=0.1848, ctc_loss=0.1197, cr_loss=0.3258, over 21001.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1418, cr_loss=0.3663, over 4081804.42 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:21:50,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=845863.3333333334, ans=0.125 2024-09-19 01:22:31,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.25 vs. limit=15.0 2024-09-19 01:22:39,659 INFO [train.py:1198] (0/2) Epoch 47, batch 4600, loss[loss=0.2111, ctc_loss=0.1397, cr_loss=0.3573, over 21013.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1419, cr_loss=0.3659, over 4077760.24 frames. 
], batch size: 61, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:22:56,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=846005.0, ans=0.125 2024-09-19 01:23:07,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=846005.0, ans=0.025 2024-09-19 01:23:12,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=846033.3333333334, ans=0.125 2024-09-19 01:23:24,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=846061.6666666666, ans=0.125 2024-09-19 01:23:42,303 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.276e+02 2.409e+02 2.600e+02 4.253e+02, threshold=4.818e+02, percent-clipped=0.0 2024-09-19 01:23:55,987 INFO [train.py:1198] (0/2) Epoch 47, batch 4650, loss[loss=0.2409, ctc_loss=0.1616, cr_loss=0.3963, over 20670.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1414, cr_loss=0.3656, over 4085227.52 frames. ], batch size: 68, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:24:11,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=846146.6666666666, ans=0.0 2024-09-19 01:24:13,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. 
limit=15.0 2024-09-19 01:24:26,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=846146.6666666666, ans=0.2 2024-09-19 01:24:36,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=846175.0, ans=0.125 2024-09-19 01:25:14,015 INFO [train.py:1198] (0/2) Epoch 47, batch 4700, loss[loss=0.1989, ctc_loss=0.1315, cr_loss=0.337, over 20967.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1409, cr_loss=0.3652, over 4096015.19 frames. ], batch size: 48, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:25:34,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=846288.3333333334, ans=0.125 2024-09-19 01:25:48,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=846316.6666666666, ans=0.0 2024-09-19 01:25:52,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=846316.6666666666, ans=0.0 2024-09-19 01:26:15,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=846373.3333333334, ans=0.125 2024-09-19 01:26:16,883 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.278e+02 2.423e+02 2.538e+02 3.189e+02, threshold=4.845e+02, percent-clipped=0.0 2024-09-19 01:26:30,222 INFO [train.py:1198] (0/2) Epoch 47, batch 4750, loss[loss=0.2474, ctc_loss=0.1639, cr_loss=0.4176, over 20965.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1408, cr_loss=0.3651, over 4099583.45 frames. 
], batch size: 64, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:26:31,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=846401.6666666666, ans=0.125 2024-09-19 01:27:47,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=846543.3333333334, ans=0.125 2024-09-19 01:27:48,515 INFO [train.py:1198] (0/2) Epoch 47, batch 4800, loss[loss=0.2006, ctc_loss=0.1305, cr_loss=0.3509, over 21030.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1414, cr_loss=0.3662, over 4090201.28 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:27:52,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-09-19 01:28:41,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=846628.3333333334, ans=0.2 2024-09-19 01:28:50,662 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.264e+02 2.373e+02 2.525e+02 3.424e+02, threshold=4.745e+02, percent-clipped=0.0 2024-09-19 01:29:04,103 INFO [train.py:1198] (0/2) Epoch 47, batch 4850, loss[loss=0.2193, ctc_loss=0.1439, cr_loss=0.3774, over 20954.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1418, cr_loss=0.3669, over 4090976.76 frames. 
], batch size: 50, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:29:06,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=846685.0, ans=0.2 2024-09-19 01:29:12,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=846685.0, ans=0.0 2024-09-19 01:29:33,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=846741.6666666666, ans=0.95 2024-09-19 01:30:22,376 INFO [train.py:1198] (0/2) Epoch 47, batch 4900, loss[loss=0.2018, ctc_loss=0.1306, cr_loss=0.3557, over 20935.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.3679, over 4088676.58 frames. ], batch size: 49, lr: 1.75e-03, grad_scale: 32.0 2024-09-19 01:30:51,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=846883.3333333334, ans=0.0 2024-09-19 01:31:13,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=846911.6666666666, ans=0.125 2024-09-19 01:31:23,344 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.275e+02 2.390e+02 2.530e+02 4.993e+02, threshold=4.779e+02, percent-clipped=1.0 2024-09-19 01:31:36,833 INFO [train.py:1198] (0/2) Epoch 47, batch 4950, loss[loss=0.2517, ctc_loss=0.1645, cr_loss=0.436, over 20959.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1427, cr_loss=0.3685, over 4096261.90 frames. 
], batch size: 64, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:31:38,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=846968.3333333334, ans=0.125 2024-09-19 01:31:38,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=846968.3333333334, ans=0.0 2024-09-19 01:32:07,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2024-09-19 01:32:08,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=847025.0, ans=0.125 2024-09-19 01:32:21,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=847053.3333333334, ans=0.2 2024-09-19 01:32:50,710 INFO [train.py:1198] (0/2) Epoch 47, batch 5000, loss[loss=0.2165, ctc_loss=0.1411, cr_loss=0.377, over 20972.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.3682, over 4086678.13 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:32:58,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=847110.0, ans=0.125 2024-09-19 01:33:14,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2024-09-19 01:33:51,609 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.274e+02 2.407e+02 2.593e+02 3.251e+02, threshold=4.813e+02, percent-clipped=0.0 2024-09-19 01:34:04,902 INFO [train.py:1198] (0/2) Epoch 47, batch 5050, loss[loss=0.2432, ctc_loss=0.1652, cr_loss=0.3899, over 19414.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1422, cr_loss=0.3678, over 4091526.63 frames. 
], batch size: 90, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:34:14,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=847251.6666666666, ans=0.0 2024-09-19 01:34:31,684 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:34:47,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=847336.6666666666, ans=0.0 2024-09-19 01:34:55,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=847336.6666666666, ans=0.125 2024-09-19 01:34:59,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=847336.6666666666, ans=0.0 2024-09-19 01:35:17,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0 2024-09-19 01:35:21,091 INFO [train.py:1198] (0/2) Epoch 47, batch 5100, loss[loss=0.2484, ctc_loss=0.1615, cr_loss=0.4346, over 20665.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.3683, over 4092252.73 frames. ], batch size: 66, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:35:40,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=847421.6666666666, ans=0.0 2024-09-19 01:35:45,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=847421.6666666666, ans=0.05 2024-09-19 01:36:21,746 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.033e+02 2.293e+02 2.411e+02 2.627e+02 4.162e+02, threshold=4.822e+02, percent-clipped=0.0 2024-09-19 01:36:35,169 INFO [train.py:1198] (0/2) Epoch 47, batch 5150, loss[loss=0.2345, ctc_loss=0.1585, cr_loss=0.3797, over 20848.00 frames. 
], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3687, over 4083324.39 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:36:45,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=847535.0, ans=0.125 2024-09-19 01:36:56,919 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.65 vs. limit=6.0 2024-09-19 01:37:23,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=847620.0, ans=0.125 2024-09-19 01:37:49,773 INFO [train.py:1198] (0/2) Epoch 47, batch 5200, loss[loss=0.219, ctc_loss=0.1447, cr_loss=0.3715, over 20935.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1426, cr_loss=0.3686, over 4091197.43 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:38:20,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=847733.3333333334, ans=0.125 2024-09-19 01:38:27,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=847733.3333333334, ans=0.0 2024-09-19 01:38:32,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=847733.3333333334, ans=0.125 2024-09-19 01:38:34,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.22 vs. 
limit=15.0 2024-09-19 01:38:51,259 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.265e+02 2.390e+02 2.554e+02 3.857e+02, threshold=4.779e+02, percent-clipped=0.0 2024-09-19 01:39:03,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.92 vs. limit=15.0 2024-09-19 01:39:04,374 INFO [train.py:1198] (0/2) Epoch 47, batch 5250, loss[loss=0.2435, ctc_loss=0.1634, cr_loss=0.4002, over 19646.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3674, over 4089603.52 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:39:07,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=847818.3333333334, ans=0.125 2024-09-19 01:39:18,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=847846.6666666666, ans=0.1 2024-09-19 01:39:28,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=847846.6666666666, ans=0.2 2024-09-19 01:39:41,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=847875.0, ans=0.0 2024-09-19 01:40:02,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0 2024-09-19 01:40:20,743 INFO [train.py:1198] (0/2) Epoch 47, batch 5300, loss[loss=0.2402, ctc_loss=0.1598, cr_loss=0.4016, over 20681.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1414, cr_loss=0.3661, over 4089296.40 frames. 
], batch size: 66, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:40:31,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=847960.0, ans=0.125 2024-09-19 01:41:00,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=848016.6666666666, ans=0.125 2024-09-19 01:41:22,478 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.281e+02 2.400e+02 2.592e+02 4.852e+02, threshold=4.800e+02, percent-clipped=1.0 2024-09-19 01:41:31,783 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:41:36,064 INFO [train.py:1198] (0/2) Epoch 47, batch 5350, loss[loss=0.2312, ctc_loss=0.1528, cr_loss=0.3919, over 20867.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1406, cr_loss=0.3656, over 4105364.09 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:41:46,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=848101.6666666666, ans=0.125 2024-09-19 01:41:50,497 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2024-09-19 01:42:23,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=848186.6666666666, ans=0.1 2024-09-19 01:42:34,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=848215.0, ans=0.0 2024-09-19 01:42:47,670 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.20 vs. 
limit=12.0 2024-09-19 01:42:49,843 INFO [train.py:1198] (0/2) Epoch 47, batch 5400, loss[loss=0.2492, ctc_loss=0.1669, cr_loss=0.4114, over 20442.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1415, cr_loss=0.367, over 4102639.42 frames. ], batch size: 74, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:42:54,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=848243.3333333334, ans=0.2 2024-09-19 01:43:12,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=848271.6666666666, ans=0.95 2024-09-19 01:43:36,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=848328.3333333334, ans=0.025 2024-09-19 01:43:46,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=848328.3333333334, ans=10.0 2024-09-19 01:43:48,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=848356.6666666666, ans=0.125 2024-09-19 01:43:50,972 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.247e+02 2.380e+02 2.538e+02 4.427e+02, threshold=4.760e+02, percent-clipped=0.0 2024-09-19 01:44:02,092 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=22.5 2024-09-19 01:44:04,423 INFO [train.py:1198] (0/2) Epoch 47, batch 5450, loss[loss=0.2255, ctc_loss=0.1532, cr_loss=0.3614, over 20675.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1429, cr_loss=0.3688, over 4093820.14 frames. 
], batch size: 66, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:44:16,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=848385.0, ans=0.05 2024-09-19 01:44:41,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.56 vs. limit=10.0 2024-09-19 01:45:20,910 INFO [train.py:1198] (0/2) Epoch 47, batch 5500, loss[loss=0.2362, ctc_loss=0.1572, cr_loss=0.3948, over 20980.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1428, cr_loss=0.3686, over 4090010.19 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2024-09-19 01:45:21,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=848526.6666666666, ans=0.125 2024-09-19 01:45:22,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=848526.6666666666, ans=0.125 2024-09-19 01:46:05,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=848611.6666666666, ans=0.0 2024-09-19 01:46:08,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=848611.6666666666, ans=0.125 2024-09-19 01:46:12,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. 
limit=15.0 2024-09-19 01:46:22,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=848640.0, ans=0.2 2024-09-19 01:46:23,384 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.326e+02 2.476e+02 2.674e+02 5.105e+02, threshold=4.951e+02, percent-clipped=2.0 2024-09-19 01:46:30,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=848640.0, ans=0.125 2024-09-19 01:46:32,415 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:46:35,185 INFO [train.py:1198] (0/2) Epoch 47, batch 5550, loss[loss=0.2167, ctc_loss=0.1424, cr_loss=0.3711, over 21026.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1425, cr_loss=0.3681, over 4085535.57 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0 2024-09-19 01:46:41,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=848668.3333333334, ans=0.2 2024-09-19 01:46:51,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=848696.6666666666, ans=0.125 2024-09-19 01:47:09,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848725.0, ans=0.1 2024-09-19 01:47:25,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=848753.3333333334, ans=0.0 2024-09-19 01:47:29,344 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.83 vs. 
limit=22.5 2024-09-19 01:47:33,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=848781.6666666666, ans=0.125 2024-09-19 01:47:37,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=848781.6666666666, ans=0.2 2024-09-19 01:47:49,446 INFO [train.py:1198] (0/2) Epoch 47, batch 5600, loss[loss=0.216, ctc_loss=0.1408, cr_loss=0.3763, over 21067.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1418, cr_loss=0.3673, over 4092023.52 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:48:00,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=848810.0, ans=0.1 2024-09-19 01:48:10,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848838.3333333334, ans=0.1 2024-09-19 01:48:10,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=848838.3333333334, ans=0.1 2024-09-19 01:48:40,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=848895.0, ans=0.0 2024-09-19 01:48:51,840 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.281e+02 2.404e+02 2.553e+02 6.736e+02, threshold=4.807e+02, percent-clipped=1.0 2024-09-19 01:49:05,835 INFO [train.py:1198] (0/2) Epoch 47, batch 5650, loss[loss=0.2155, ctc_loss=0.1438, cr_loss=0.3585, over 20664.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1418, cr_loss=0.3679, over 4084074.15 frames. ], batch size: 68, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:49:09,459 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.21 vs. 
limit=22.5 2024-09-19 01:49:45,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=22.5 2024-09-19 01:50:13,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=849065.0, ans=0.125 2024-09-19 01:50:19,628 INFO [train.py:1198] (0/2) Epoch 47, batch 5700, loss[loss=0.2295, ctc_loss=0.1512, cr_loss=0.3912, over 20655.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1417, cr_loss=0.3675, over 4092352.70 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:50:22,066 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.65 vs. limit=10.0 2024-09-19 01:50:33,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=849121.6666666666, ans=0.0 2024-09-19 01:50:52,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=849150.0, ans=0.125 2024-09-19 01:50:59,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=849150.0, ans=0.125 2024-09-19 01:51:11,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=849178.3333333334, ans=0.2 2024-09-19 01:51:16,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=849178.3333333334, ans=0.125 2024-09-19 01:51:21,375 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.257e+02 2.398e+02 2.543e+02 4.347e+02, threshold=4.797e+02, percent-clipped=0.0 2024-09-19 01:51:33,125 INFO [train.py:1198] (0/2) Epoch 47, batch 5750, loss[loss=0.216, ctc_loss=0.1451, cr_loss=0.3546, over 20826.00 frames. 
], tot_loss[loss=0.2167, ctc_loss=0.1429, cr_loss=0.369, over 4087046.88 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:52:47,317 INFO [train.py:1198] (0/2) Epoch 47, batch 5800, loss[loss=0.2162, ctc_loss=0.1392, cr_loss=0.3851, over 21038.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.3683, over 4098132.76 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:53:53,493 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.270e+02 2.397e+02 2.536e+02 3.007e+02, threshold=4.793e+02, percent-clipped=0.0 2024-09-19 01:54:03,704 INFO [train.py:1198] (0/2) Epoch 47, batch 5850, loss[loss=0.2057, ctc_loss=0.1361, cr_loss=0.3479, over 20882.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1428, cr_loss=0.369, over 4102781.46 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2024-09-19 01:54:14,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=849518.3333333334, ans=0.125 2024-09-19 01:54:21,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=849546.6666666666, ans=0.0 2024-09-19 01:54:33,088 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.55 vs. limit=5.0 2024-09-19 01:54:35,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=849575.0, ans=0.2 2024-09-19 01:55:05,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=849631.6666666666, ans=0.125 2024-09-19 01:55:18,629 INFO [train.py:1198] (0/2) Epoch 47, batch 5900, loss[loss=0.2218, ctc_loss=0.1478, cr_loss=0.3704, over 20861.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1423, cr_loss=0.3674, over 4084421.42 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2024-09-19 01:55:48,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=849716.6666666666, ans=0.125 2024-09-19 01:55:51,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=849716.6666666666, ans=0.125 2024-09-19 01:55:55,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0 2024-09-19 01:56:02,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=849745.0, ans=0.125 2024-09-19 01:56:22,870 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.054e+02 2.249e+02 2.392e+02 2.565e+02 5.866e+02, threshold=4.784e+02, percent-clipped=1.0 2024-09-19 01:56:33,248 INFO [train.py:1198] (0/2) Epoch 47, batch 5950, loss[loss=0.2178, ctc_loss=0.1407, cr_loss=0.3851, over 20767.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1422, cr_loss=0.3673, over 4094564.32 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2024-09-19 01:56:37,043 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.74 vs. 
limit=22.5 2024-09-19 01:56:40,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=849801.6666666666, ans=0.0 2024-09-19 01:57:00,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=849830.0, ans=0.125 2024-09-19 01:57:20,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=849886.6666666666, ans=0.0 2024-09-19 01:57:48,921 INFO [train.py:1198] (0/2) Epoch 47, batch 6000, loss[loss=0.2367, ctc_loss=0.1581, cr_loss=0.393, over 21009.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1419, cr_loss=0.3666, over 4102749.33 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:57:48,922 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 01:58:08,160 INFO [train.py:1230] (0/2) Epoch 47, validation: loss=0.03884, ctc_loss=0.03884, cr_loss=1.564e-14, over 944034.00 frames. 2024-09-19 01:58:08,160 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-19 01:58:36,095 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-300000.pt 2024-09-19 01:59:12,437 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.238e+02 2.392e+02 2.505e+02 3.423e+02, threshold=4.784e+02, percent-clipped=0.0 2024-09-19 01:59:13,140 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-09-19 01:59:23,725 INFO [train.py:1198] (0/2) Epoch 47, batch 6050, loss[loss=0.2213, ctc_loss=0.1457, cr_loss=0.3778, over 21016.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1417, cr_loss=0.3664, over 4110113.91 frames. 
], batch size: 61, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 01:59:28,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=850085.0, ans=0.125 2024-09-19 01:59:34,385 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:59:51,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=850141.6666666666, ans=0.125 2024-09-19 02:00:05,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=850141.6666666666, ans=0.125 2024-09-19 02:00:35,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=850198.3333333334, ans=0.025 2024-09-19 02:00:37,807 INFO [train.py:1198] (0/2) Epoch 47, batch 6100, loss[loss=0.2461, ctc_loss=0.1705, cr_loss=0.3778, over 14574.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1417, cr_loss=0.3663, over 4088466.77 frames. ], batch size: 149, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 02:01:07,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=850283.3333333334, ans=0.0 2024-09-19 02:01:42,912 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.021e+02 2.282e+02 2.395e+02 2.596e+02 4.212e+02, threshold=4.790e+02, percent-clipped=0.0 2024-09-19 02:01:53,492 INFO [train.py:1198] (0/2) Epoch 47, batch 6150, loss[loss=0.2199, ctc_loss=0.1441, cr_loss=0.3789, over 20986.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1418, cr_loss=0.3663, over 4069547.07 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 02:02:04,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=850368.3333333334, ans=0.0 2024-09-19 02:02:13,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0 2024-09-19 02:02:32,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=850425.0, ans=0.0 2024-09-19 02:03:08,091 INFO [train.py:1198] (0/2) Epoch 47, batch 6200, loss[loss=0.238, ctc_loss=0.1628, cr_loss=0.3758, over 14700.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1422, cr_loss=0.3676, over 4066352.54 frames. ], batch size: 150, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 02:03:13,582 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=15.0 2024-09-19 02:03:23,485 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:03:34,328 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2024-09-19 02:03:40,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.37 vs. 
limit=15.0 2024-09-19 02:03:56,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=850595.0, ans=0.125 2024-09-19 02:04:02,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=850595.0, ans=0.95 2024-09-19 02:04:12,284 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.267e+02 2.460e+02 2.715e+02 5.797e+02, threshold=4.919e+02, percent-clipped=2.0 2024-09-19 02:04:22,866 INFO [train.py:1198] (0/2) Epoch 47, batch 6250, loss[loss=0.2103, ctc_loss=0.1354, cr_loss=0.3744, over 20974.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1431, cr_loss=0.3687, over 4032533.44 frames. ], batch size: 50, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 02:05:27,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.81 vs. limit=12.0 2024-09-19 02:05:37,912 INFO [train.py:1198] (0/2) Epoch 47, batch 6300, loss[loss=0.2138, ctc_loss=0.1395, cr_loss=0.3716, over 20958.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1432, cr_loss=0.3679, over 4006306.61 frames. 
], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 02:05:50,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=850793.3333333334, ans=0.04949747468305833 2024-09-19 02:06:11,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=850850.0, ans=10.0 2024-09-19 02:06:13,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=850850.0, ans=0.125 2024-09-19 02:06:39,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=850906.6666666666, ans=0.2 2024-09-19 02:06:40,707 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.042e+02 2.351e+02 2.619e+02 2.877e+02 4.452e+02, threshold=5.237e+02, percent-clipped=0.0 2024-09-19 02:06:50,787 INFO [train.py:1198] (0/2) Epoch 47, batch 6350, loss[loss=0.2606, ctc_loss=0.1787, cr_loss=0.4096, over 14396.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1467, cr_loss=0.3704, over 3837295.28 frames. ], batch size: 149, lr: 1.74e-03, grad_scale: 32.0 2024-09-19 02:06:58,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=850935.0, ans=0.125 2024-09-19 02:07:12,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=850963.3333333334, ans=0.0 2024-09-19 02:07:21,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=850991.6666666666, ans=0.125 2024-09-19 02:07:34,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.79 vs. 
limit=15.0 2024-09-19 02:07:39,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=851020.0, ans=0.1 2024-09-19 02:07:49,110 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-47.pt 2024-09-19 02:08:38,364 INFO [train.py:1198] (0/2) Epoch 48, batch 0, loss[loss=0.2383, ctc_loss=0.1594, cr_loss=0.3945, over 21050.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1594, cr_loss=0.3945, over 21050.00 frames. ], batch size: 62, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:08:38,365 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 02:08:56,382 INFO [train.py:1230] (0/2) Epoch 48, validation: loss=0.03874, ctc_loss=0.03874, cr_loss=1.596e-14, over 944034.00 frames. 2024-09-19 02:08:56,383 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-19 02:09:20,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=851079.5, ans=0.125 2024-09-19 02:09:35,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=851107.8333333334, ans=0.125 2024-09-19 02:10:09,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=851164.5, ans=0.125 2024-09-19 02:10:12,175 INFO [train.py:1198] (0/2) Epoch 48, batch 50, loss[loss=0.1934, ctc_loss=0.1274, cr_loss=0.3297, over 20973.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1423, cr_loss=0.369, over 916772.11 frames. 
], batch size: 64, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:10:16,836 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.313e+02 2.640e+02 2.936e+02 3.540e+02, threshold=5.280e+02, percent-clipped=0.0 2024-09-19 02:10:21,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=851192.8333333334, ans=0.2 2024-09-19 02:11:00,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=851277.8333333334, ans=0.025 2024-09-19 02:11:05,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.30 vs. limit=15.0 2024-09-19 02:11:06,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=851277.8333333334, ans=0.125 2024-09-19 02:11:27,138 INFO [train.py:1198] (0/2) Epoch 48, batch 100, loss[loss=0.1683, ctc_loss=0.1074, cr_loss=0.3045, over 21009.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1424, cr_loss=0.3699, over 1618429.39 frames. ], batch size: 52, lr: 1.72e-03, grad_scale: 16.0 2024-09-19 02:11:28,398 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-09-19 02:11:37,032 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.04 vs. 
limit=15.0 2024-09-19 02:12:06,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=851391.1666666666, ans=0.1 2024-09-19 02:12:23,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=851419.5, ans=0.0 2024-09-19 02:12:23,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=851419.5, ans=0.125 2024-09-19 02:12:42,244 INFO [train.py:1198] (0/2) Epoch 48, batch 150, loss[loss=0.2217, ctc_loss=0.1469, cr_loss=0.3742, over 20838.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1419, cr_loss=0.3678, over 2172991.09 frames. ], batch size: 59, lr: 1.72e-03, grad_scale: 16.0 2024-09-19 02:12:48,476 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.249e+02 2.369e+02 2.580e+02 1.034e+03, threshold=4.738e+02, percent-clipped=1.0 2024-09-19 02:12:50,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=851476.1666666666, ans=0.0 2024-09-19 02:12:58,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.91 vs. limit=15.0 2024-09-19 02:13:11,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=851504.5, ans=0.125 2024-09-19 02:13:39,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=851561.1666666666, ans=0.025 2024-09-19 02:14:03,671 INFO [train.py:1198] (0/2) Epoch 48, batch 200, loss[loss=0.2087, ctc_loss=0.1379, cr_loss=0.3538, over 20966.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3665, over 2594946.47 frames. 
], batch size: 58, lr: 1.72e-03, grad_scale: 16.0 2024-09-19 02:14:25,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=851646.1666666666, ans=0.125 2024-09-19 02:14:39,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.92 vs. limit=10.0 2024-09-19 02:14:41,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.06 vs. limit=6.0 2024-09-19 02:14:48,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=851702.8333333334, ans=0.0 2024-09-19 02:14:55,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=851702.8333333334, ans=0.125 2024-09-19 02:15:19,756 INFO [train.py:1198] (0/2) Epoch 48, batch 250, loss[loss=0.2237, ctc_loss=0.1479, cr_loss=0.3792, over 20978.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3675, over 2932399.01 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 16.0 2024-09-19 02:15:25,828 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.256e+02 2.366e+02 2.529e+02 4.135e+02, threshold=4.732e+02, percent-clipped=0.0 2024-09-19 02:15:26,532 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.02 vs. 
limit=15.0 2024-09-19 02:15:32,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=851759.5, ans=0.2 2024-09-19 02:15:47,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=851787.8333333334, ans=0.0 2024-09-19 02:16:01,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=851816.1666666666, ans=0.125 2024-09-19 02:16:04,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.30 vs. limit=15.0 2024-09-19 02:16:28,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=851872.8333333334, ans=0.0 2024-09-19 02:16:35,313 INFO [train.py:1198] (0/2) Epoch 48, batch 300, loss[loss=0.2284, ctc_loss=0.15, cr_loss=0.3923, over 20127.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.3693, over 3199825.76 frames. ], batch size: 80, lr: 1.72e-03, grad_scale: 16.0 2024-09-19 02:16:41,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=851901.1666666666, ans=0.125 2024-09-19 02:16:41,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=851901.1666666666, ans=0.2 2024-09-19 02:17:05,657 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:17:50,604 INFO [train.py:1198] (0/2) Epoch 48, batch 350, loss[loss=0.2065, ctc_loss=0.1355, cr_loss=0.3551, over 21044.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1406, cr_loss=0.3656, over 3392785.72 frames. 
], batch size: 62, lr: 1.72e-03, grad_scale: 16.0 2024-09-19 02:17:56,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.073e+02 2.240e+02 2.381e+02 2.503e+02 4.082e+02, threshold=4.763e+02, percent-clipped=0.0 2024-09-19 02:18:00,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=22.5 2024-09-19 02:19:04,937 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:19:09,108 INFO [train.py:1198] (0/2) Epoch 48, batch 400, loss[loss=0.1991, ctc_loss=0.1309, cr_loss=0.3411, over 21032.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.14, cr_loss=0.3644, over 3553120.24 frames. ], batch size: 62, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:19:23,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=852212.8333333334, ans=0.125 2024-09-19 02:19:50,510 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-09-19 02:19:59,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=852269.5, ans=0.1 2024-09-19 02:20:18,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=852297.8333333334, ans=0.025 2024-09-19 02:20:27,569 INFO [train.py:1198] (0/2) Epoch 48, batch 450, loss[loss=0.2395, ctc_loss=0.1594, cr_loss=0.4006, over 20988.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1398, cr_loss=0.3639, over 3666263.27 frames. 
], batch size: 55, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:20:33,451 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.278e+02 2.381e+02 2.515e+02 3.550e+02, threshold=4.761e+02, percent-clipped=0.0 2024-09-19 02:21:19,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=852411.1666666666, ans=0.2 2024-09-19 02:21:40,887 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=22.5 2024-09-19 02:21:43,053 INFO [train.py:1198] (0/2) Epoch 48, batch 500, loss[loss=0.189, ctc_loss=0.1255, cr_loss=0.3176, over 20991.00 frames. ], tot_loss[loss=0.2112, ctc_loss=0.1388, cr_loss=0.3623, over 3767306.31 frames. ], batch size: 48, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:21:59,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=852496.1666666666, ans=0.1 2024-09-19 02:22:31,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=852552.8333333334, ans=0.0 2024-09-19 02:22:32,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=852552.8333333334, ans=0.125 2024-09-19 02:22:58,198 INFO [train.py:1198] (0/2) Epoch 48, batch 550, loss[loss=0.2129, ctc_loss=0.1392, cr_loss=0.3684, over 20874.00 frames. ], tot_loss[loss=0.2103, ctc_loss=0.1381, cr_loss=0.3609, over 3849546.31 frames. 
], batch size: 54, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:23:00,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=852609.5, ans=0.125 2024-09-19 02:23:04,145 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.882e+02 2.207e+02 2.361e+02 2.535e+02 5.293e+02, threshold=4.723e+02, percent-clipped=1.0 2024-09-19 02:23:04,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=852609.5, ans=0.2 2024-09-19 02:23:21,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=852637.8333333334, ans=0.0 2024-09-19 02:24:12,620 INFO [train.py:1198] (0/2) Epoch 48, batch 600, loss[loss=0.2549, ctc_loss=0.1676, cr_loss=0.4367, over 20665.00 frames. ], tot_loss[loss=0.2111, ctc_loss=0.1387, cr_loss=0.3621, over 3913886.44 frames. ], batch size: 66, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:24:26,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=852779.5, ans=0.0 2024-09-19 02:24:58,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=852807.8333333334, ans=0.0 2024-09-19 02:25:02,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=852836.1666666666, ans=0.0 2024-09-19 02:25:31,633 INFO [train.py:1198] (0/2) Epoch 48, batch 650, loss[loss=0.2225, ctc_loss=0.1485, cr_loss=0.3701, over 21027.00 frames. ], tot_loss[loss=0.2117, ctc_loss=0.1391, cr_loss=0.363, over 3959319.50 frames. 
], batch size: 63, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:25:40,618 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 2.271e+02 2.376e+02 2.518e+02 3.814e+02, threshold=4.751e+02, percent-clipped=0.0 2024-09-19 02:26:18,278 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.19 vs. limit=22.5 2024-09-19 02:26:50,938 INFO [train.py:1198] (0/2) Epoch 48, batch 700, loss[loss=0.234, ctc_loss=0.1519, cr_loss=0.4105, over 20805.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1414, cr_loss=0.3676, over 3989553.02 frames. ], batch size: 65, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:27:44,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=853119.5, ans=0.025 2024-09-19 02:27:50,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=12.0 2024-09-19 02:27:50,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=853147.8333333334, ans=22.5 2024-09-19 02:27:59,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=853147.8333333334, ans=0.125 2024-09-19 02:28:06,795 INFO [train.py:1198] (0/2) Epoch 48, batch 750, loss[loss=0.1984, ctc_loss=0.1286, cr_loss=0.3488, over 20956.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1408, cr_loss=0.3664, over 4023129.38 frames. 
], batch size: 49, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:28:11,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=853176.1666666666, ans=0.125 2024-09-19 02:28:12,754 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.312e+02 2.445e+02 2.580e+02 3.410e+02, threshold=4.891e+02, percent-clipped=0.0 2024-09-19 02:28:28,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=853204.5, ans=0.125 2024-09-19 02:28:41,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=853232.8333333334, ans=0.0 2024-09-19 02:29:18,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=853289.5, ans=0.125 2024-09-19 02:29:21,650 INFO [train.py:1198] (0/2) Epoch 48, batch 800, loss[loss=0.2212, ctc_loss=0.1446, cr_loss=0.3827, over 21019.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1412, cr_loss=0.3668, over 4037242.87 frames. ], batch size: 61, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:29:29,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=853317.8333333334, ans=0.2 2024-09-19 02:29:45,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=853346.1666666666, ans=0.125 2024-09-19 02:29:50,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=853374.5, ans=0.2 2024-09-19 02:30:13,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=853402.8333333334, ans=0.0 2024-09-19 02:30:39,930 INFO [train.py:1198] (0/2) Epoch 48, batch 850, loss[loss=0.2104, ctc_loss=0.1384, cr_loss=0.3601, over 20666.00 frames. 
], tot_loss[loss=0.2155, ctc_loss=0.1419, cr_loss=0.3677, over 4045298.97 frames. ], batch size: 66, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:30:47,414 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.256e+02 2.398e+02 2.561e+02 3.164e+02, threshold=4.796e+02, percent-clipped=0.0 2024-09-19 02:31:20,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.98 vs. limit=15.0 2024-09-19 02:31:58,691 INFO [train.py:1198] (0/2) Epoch 48, batch 900, loss[loss=0.2194, ctc_loss=0.1447, cr_loss=0.3733, over 20979.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3668, over 4064352.47 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:32:26,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853629.5, ans=0.1 2024-09-19 02:32:31,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=853657.8333333334, ans=0.04949747468305833 2024-09-19 02:32:39,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=853657.8333333334, ans=0.125 2024-09-19 02:32:47,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=853686.1666666666, ans=0.0 2024-09-19 02:32:54,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=853686.1666666666, ans=0.0 2024-09-19 02:33:14,335 INFO [train.py:1198] (0/2) Epoch 48, batch 950, loss[loss=0.2338, ctc_loss=0.156, cr_loss=0.3891, over 20854.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1416, cr_loss=0.3673, over 4080821.68 frames. 
], batch size: 65, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:33:18,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=15.0 2024-09-19 02:33:21,821 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.987e+02 2.247e+02 2.334e+02 2.480e+02 3.525e+02, threshold=4.669e+02, percent-clipped=0.0 2024-09-19 02:33:35,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=853771.1666666666, ans=0.125 2024-09-19 02:34:16,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=853856.1666666666, ans=0.125 2024-09-19 02:34:19,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=853856.1666666666, ans=0.125 2024-09-19 02:34:20,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=853856.1666666666, ans=0.125 2024-09-19 02:34:26,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=853856.1666666666, ans=0.0 2024-09-19 02:34:29,488 INFO [train.py:1198] (0/2) Epoch 48, batch 1000, loss[loss=0.1904, ctc_loss=0.1217, cr_loss=0.3435, over 20996.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1422, cr_loss=0.3681, over 4080304.97 frames. 
], batch size: 48, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:34:40,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=853884.5, ans=0.025 2024-09-19 02:34:41,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853884.5, ans=0.1 2024-09-19 02:35:12,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=853941.1666666666, ans=0.0 2024-09-19 02:35:33,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=853997.8333333334, ans=0.0 2024-09-19 02:35:42,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=853997.8333333334, ans=0.0 2024-09-19 02:35:42,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=853997.8333333334, ans=0.125 2024-09-19 02:35:45,330 INFO [train.py:1198] (0/2) Epoch 48, batch 1050, loss[loss=0.2438, ctc_loss=0.1636, cr_loss=0.4008, over 20674.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1414, cr_loss=0.3665, over 4090313.43 frames. 
], batch size: 71, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:35:51,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=854026.1666666666, ans=0.125 2024-09-19 02:35:52,813 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.278e+02 2.387e+02 2.529e+02 3.134e+02, threshold=4.774e+02, percent-clipped=0.0 2024-09-19 02:36:30,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=854082.8333333334, ans=0.025 2024-09-19 02:36:43,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0 2024-09-19 02:37:00,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=854139.5, ans=0.125 2024-09-19 02:37:05,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=854167.8333333334, ans=0.125 2024-09-19 02:37:06,960 INFO [train.py:1198] (0/2) Epoch 48, batch 1100, loss[loss=0.2024, ctc_loss=0.1319, cr_loss=0.3523, over 20988.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1417, cr_loss=0.3664, over 4080249.72 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:37:10,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=854167.8333333334, ans=0.0 2024-09-19 02:37:37,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=854224.5, ans=0.0 2024-09-19 02:37:37,991 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.82 vs. 
limit=8.0 2024-09-19 02:37:50,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=854252.8333333334, ans=0.125 2024-09-19 02:38:22,300 INFO [train.py:1198] (0/2) Epoch 48, batch 1150, loss[loss=0.1999, ctc_loss=0.133, cr_loss=0.3344, over 20890.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1421, cr_loss=0.3669, over 4072022.23 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:38:30,023 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.276e+02 2.373e+02 2.575e+02 3.097e+02, threshold=4.746e+02, percent-clipped=0.0 2024-09-19 02:38:39,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=854337.8333333334, ans=0.125 2024-09-19 02:38:42,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=854337.8333333334, ans=0.125 2024-09-19 02:38:52,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=854366.1666666666, ans=0.0 2024-09-19 02:39:21,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=854422.8333333334, ans=0.125 2024-09-19 02:39:37,980 INFO [train.py:1198] (0/2) Epoch 48, batch 1200, loss[loss=0.2685, ctc_loss=0.1868, cr_loss=0.4085, over 13837.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1416, cr_loss=0.3661, over 4083610.47 frames. ], batch size: 149, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:40:34,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=854536.1666666666, ans=0.125 2024-09-19 02:40:54,041 INFO [train.py:1198] (0/2) Epoch 48, batch 1250, loss[loss=0.2511, ctc_loss=0.1661, cr_loss=0.4251, over 21009.00 frames. 
], tot_loss[loss=0.2155, ctc_loss=0.1421, cr_loss=0.3671, over 4070563.88 frames. ], batch size: 63, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:41:01,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.279e+02 2.417e+02 2.564e+02 4.180e+02, threshold=4.834e+02, percent-clipped=0.0 2024-09-19 02:41:25,176 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2024-09-19 02:41:26,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=854649.5, ans=0.0 2024-09-19 02:41:26,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=854649.5, ans=0.0 2024-09-19 02:41:30,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=854649.5, ans=0.0 2024-09-19 02:41:34,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=854649.5, ans=15.0 2024-09-19 02:41:36,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=854649.5, ans=0.0 2024-09-19 02:42:12,464 INFO [train.py:1198] (0/2) Epoch 48, batch 1300, loss[loss=0.2107, ctc_loss=0.1359, cr_loss=0.3741, over 21020.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1413, cr_loss=0.3652, over 4067473.15 frames. 
], batch size: 63, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:42:20,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=854734.5, ans=0.125 2024-09-19 02:42:32,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=854762.8333333334, ans=0.125 2024-09-19 02:42:33,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=854762.8333333334, ans=0.0 2024-09-19 02:42:56,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=854791.1666666666, ans=0.2 2024-09-19 02:43:04,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=854819.5, ans=0.125 2024-09-19 02:43:19,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=854847.8333333334, ans=0.125 2024-09-19 02:43:30,809 INFO [train.py:1198] (0/2) Epoch 48, batch 1350, loss[loss=0.1848, ctc_loss=0.1194, cr_loss=0.3271, over 20249.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1411, cr_loss=0.3652, over 4076489.49 frames. 
], batch size: 45, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:43:35,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=854876.1666666666, ans=0.0 2024-09-19 02:43:38,443 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.258e+02 2.428e+02 2.599e+02 6.040e+02, threshold=4.857e+02, percent-clipped=1.0 2024-09-19 02:43:38,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=854876.1666666666, ans=0.09899494936611666 2024-09-19 02:43:40,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=854876.1666666666, ans=0.125 2024-09-19 02:44:09,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854932.8333333334, ans=0.1 2024-09-19 02:44:44,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0 2024-09-19 02:44:47,131 INFO [train.py:1198] (0/2) Epoch 48, batch 1400, loss[loss=0.2263, ctc_loss=0.1492, cr_loss=0.3857, over 20654.00 frames. ], tot_loss[loss=0.2124, ctc_loss=0.1398, cr_loss=0.3629, over 4091162.23 frames. 
], batch size: 66, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:44:47,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=855017.8333333334, ans=0.025 2024-09-19 02:44:56,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=855017.8333333334, ans=0.0 2024-09-19 02:45:08,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=855046.1666666666, ans=0.0 2024-09-19 02:45:15,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.62 vs. limit=22.5 2024-09-19 02:45:30,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=855102.8333333334, ans=0.0 2024-09-19 02:45:34,746 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.84 vs. limit=10.0 2024-09-19 02:45:42,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=855102.8333333334, ans=0.125 2024-09-19 02:45:42,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=855102.8333333334, ans=0.125 2024-09-19 02:46:02,223 INFO [train.py:1198] (0/2) Epoch 48, batch 1450, loss[loss=0.2378, ctc_loss=0.1585, cr_loss=0.3962, over 20733.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.1401, cr_loss=0.3636, over 4093690.19 frames. 
], batch size: 71, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:46:02,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=855159.5, ans=0.0 2024-09-19 02:46:04,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2024-09-19 02:46:05,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=855159.5, ans=0.035 2024-09-19 02:46:09,912 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.284e+02 2.416e+02 2.562e+02 3.091e+02, threshold=4.833e+02, percent-clipped=0.0 2024-09-19 02:46:32,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=855216.1666666666, ans=0.125 2024-09-19 02:47:07,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=855272.8333333334, ans=0.125 2024-09-19 02:47:17,317 INFO [train.py:1198] (0/2) Epoch 48, batch 1500, loss[loss=0.2115, ctc_loss=0.1383, cr_loss=0.3661, over 20986.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.1402, cr_loss=0.3636, over 4090608.81 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:47:28,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=855301.1666666666, ans=0.1 2024-09-19 02:48:31,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=855414.5, ans=0.2 2024-09-19 02:48:35,891 INFO [train.py:1198] (0/2) Epoch 48, batch 1550, loss[loss=0.2132, ctc_loss=0.141, cr_loss=0.3613, over 21029.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1409, cr_loss=0.3652, over 4098033.22 frames. 
], batch size: 62, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:48:45,374 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=22.5 2024-09-19 02:48:46,136 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.222e+02 2.350e+02 2.459e+02 8.607e+02, threshold=4.700e+02, percent-clipped=1.0 2024-09-19 02:49:16,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=855499.5, ans=10.0 2024-09-19 02:49:21,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=855499.5, ans=0.0 2024-09-19 02:49:49,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=855556.1666666666, ans=0.125 2024-09-19 02:49:53,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=855584.5, ans=0.0 2024-09-19 02:49:54,376 INFO [train.py:1198] (0/2) Epoch 48, batch 1600, loss[loss=0.1966, ctc_loss=0.1304, cr_loss=0.3308, over 21061.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1402, cr_loss=0.364, over 4106438.41 frames. ], batch size: 53, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:50:09,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=855612.8333333334, ans=0.0 2024-09-19 02:50:58,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=855697.8333333334, ans=0.125 2024-09-19 02:51:05,002 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.00 vs. 
limit=22.5 2024-09-19 02:51:10,356 INFO [train.py:1198] (0/2) Epoch 48, batch 1650, loss[loss=0.1856, ctc_loss=0.1177, cr_loss=0.3394, over 21010.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1405, cr_loss=0.3655, over 4103281.37 frames. ], batch size: 52, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:51:17,702 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.214e+02 2.364e+02 2.505e+02 4.946e+02, threshold=4.728e+02, percent-clipped=1.0 2024-09-19 02:51:22,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=855726.1666666666, ans=0.2 2024-09-19 02:51:36,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=855754.5, ans=0.0 2024-09-19 02:51:53,351 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=22.5 2024-09-19 02:51:57,625 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.65 vs. limit=15.0 2024-09-19 02:52:11,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855839.5, ans=0.1 2024-09-19 02:52:17,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=22.5 2024-09-19 02:52:25,290 INFO [train.py:1198] (0/2) Epoch 48, batch 1700, loss[loss=0.2194, ctc_loss=0.1455, cr_loss=0.3696, over 20986.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1404, cr_loss=0.3653, over 4102143.93 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:52:28,951 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. 
limit=6.0 2024-09-19 02:52:59,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=855924.5, ans=0.0 2024-09-19 02:53:43,974 INFO [train.py:1198] (0/2) Epoch 48, batch 1750, loss[loss=0.2189, ctc_loss=0.1417, cr_loss=0.3858, over 20771.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1404, cr_loss=0.3652, over 4097251.17 frames. ], batch size: 53, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:53:51,536 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.840e+02 2.260e+02 2.406e+02 2.570e+02 4.239e+02, threshold=4.812e+02, percent-clipped=0.0 2024-09-19 02:53:53,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=856009.5, ans=0.0 2024-09-19 02:53:56,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=15.0 2024-09-19 02:54:02,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=856037.8333333334, ans=0.1 2024-09-19 02:54:11,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=856037.8333333334, ans=0.125 2024-09-19 02:54:17,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=856066.1666666666, ans=0.1 2024-09-19 02:54:31,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.88 vs. 
limit=22.5 2024-09-19 02:54:40,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=856094.5, ans=0.125 2024-09-19 02:55:00,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=856122.8333333334, ans=0.1 2024-09-19 02:55:02,788 INFO [train.py:1198] (0/2) Epoch 48, batch 1800, loss[loss=0.2555, ctc_loss=0.175, cr_loss=0.4026, over 18234.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.14, cr_loss=0.3641, over 4102537.10 frames. ], batch size: 108, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:55:09,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=856151.1666666666, ans=0.0 2024-09-19 02:55:22,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=856179.5, ans=0.1 2024-09-19 02:56:00,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=856236.1666666666, ans=0.025 2024-09-19 02:56:18,190 INFO [train.py:1198] (0/2) Epoch 48, batch 1850, loss[loss=0.2598, ctc_loss=0.1821, cr_loss=0.3886, over 14067.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1408, cr_loss=0.3656, over 4090482.25 frames. 
], batch size: 150, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:56:18,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=856292.8333333334, ans=0.0 2024-09-19 02:56:25,739 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.285e+02 2.402e+02 2.576e+02 3.154e+02, threshold=4.804e+02, percent-clipped=0.0 2024-09-19 02:57:08,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=856377.8333333334, ans=0.0 2024-09-19 02:57:33,422 INFO [train.py:1198] (0/2) Epoch 48, batch 1900, loss[loss=0.2211, ctc_loss=0.1468, cr_loss=0.3714, over 20958.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3667, over 4095305.45 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:57:50,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=856462.8333333334, ans=0.125 2024-09-19 02:57:53,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=856462.8333333334, ans=0.0 2024-09-19 02:58:13,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=856491.1666666666, ans=0.125 2024-09-19 02:58:48,646 INFO [train.py:1198] (0/2) Epoch 48, batch 1950, loss[loss=0.1953, ctc_loss=0.1277, cr_loss=0.3379, over 20871.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1417, cr_loss=0.3667, over 4089435.75 frames. 
], batch size: 54, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 02:58:48,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=856576.1666666666, ans=0.2 2024-09-19 02:58:59,336 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.283e+02 2.415e+02 2.594e+02 7.163e+02, threshold=4.830e+02, percent-clipped=2.0 2024-09-19 02:59:28,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=12.0 2024-09-19 03:00:01,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=856689.5, ans=0.125 2024-09-19 03:00:10,229 INFO [train.py:1198] (0/2) Epoch 48, batch 2000, loss[loss=0.2054, ctc_loss=0.1328, cr_loss=0.3634, over 21069.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1424, cr_loss=0.3674, over 4079989.26 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:00:17,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=856717.8333333334, ans=0.0 2024-09-19 03:00:18,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=856717.8333333334, ans=0.1 2024-09-19 03:00:27,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=856746.1666666666, ans=0.0 2024-09-19 03:00:43,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=856774.5, ans=0.0 2024-09-19 03:01:25,380 INFO [train.py:1198] (0/2) Epoch 48, batch 2050, loss[loss=0.2277, ctc_loss=0.1514, cr_loss=0.3812, over 20835.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1426, cr_loss=0.3677, over 4080262.21 frames. 
], batch size: 59, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:01:32,850 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.302e+02 2.419e+02 2.649e+02 3.552e+02, threshold=4.838e+02, percent-clipped=0.0 2024-09-19 03:01:49,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=856887.8333333334, ans=0.1 2024-09-19 03:01:49,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=856887.8333333334, ans=0.125 2024-09-19 03:02:03,805 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.097e-03 2024-09-19 03:02:25,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=856972.8333333334, ans=0.025 2024-09-19 03:02:29,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=856972.8333333334, ans=0.2 2024-09-19 03:02:30,220 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-19 03:02:34,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=856972.8333333334, ans=0.0 2024-09-19 03:02:35,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=856972.8333333334, ans=0.0 2024-09-19 03:02:41,446 INFO [train.py:1198] (0/2) Epoch 48, batch 2100, loss[loss=0.2607, ctc_loss=0.1747, cr_loss=0.4303, over 20851.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1428, cr_loss=0.368, over 4079280.61 frames. 
], batch size: 65, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:02:42,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.19 vs. limit=15.0 2024-09-19 03:02:54,166 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.76 vs. limit=10.0 2024-09-19 03:02:58,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=857029.5, ans=0.0 2024-09-19 03:03:20,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=857057.8333333334, ans=0.035 2024-09-19 03:03:27,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=857086.1666666666, ans=0.2 2024-09-19 03:03:33,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=22.5 2024-09-19 03:03:56,888 INFO [train.py:1198] (0/2) Epoch 48, batch 2150, loss[loss=0.2219, ctc_loss=0.147, cr_loss=0.3748, over 21011.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.3681, over 4087794.99 frames. ], batch size: 63, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:03:58,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=857142.8333333334, ans=0.0 2024-09-19 03:03:58,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=857142.8333333334, ans=0.1 2024-09-19 03:04:00,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.30 vs. 
limit=22.5 2024-09-19 03:04:04,448 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.281e+02 2.415e+02 2.543e+02 3.843e+02, threshold=4.831e+02, percent-clipped=0.0 2024-09-19 03:04:04,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=857142.8333333334, ans=0.125 2024-09-19 03:04:39,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=857199.5, ans=0.0 2024-09-19 03:04:46,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.89 vs. limit=15.0 2024-09-19 03:04:57,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=857227.8333333334, ans=0.125 2024-09-19 03:05:11,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=857256.1666666666, ans=0.0 2024-09-19 03:05:15,657 INFO [train.py:1198] (0/2) Epoch 48, batch 2200, loss[loss=0.2229, ctc_loss=0.148, cr_loss=0.3749, over 21035.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.143, cr_loss=0.3686, over 4093098.32 frames. ], batch size: 63, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:05:55,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=857341.1666666666, ans=0.0 2024-09-19 03:06:10,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=857369.5, ans=0.0 2024-09-19 03:06:29,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=857397.8333333334, ans=0.125 2024-09-19 03:06:34,006 INFO [train.py:1198] (0/2) Epoch 48, batch 2250, loss[loss=0.2536, ctc_loss=0.1739, cr_loss=0.3985, over 14214.00 frames. 
], tot_loss[loss=0.2177, ctc_loss=0.1437, cr_loss=0.3698, over 4089618.17 frames. ], batch size: 150, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:06:37,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=857426.1666666666, ans=0.125 2024-09-19 03:06:41,559 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.294e+02 2.466e+02 2.611e+02 3.349e+02, threshold=4.932e+02, percent-clipped=0.0 2024-09-19 03:07:22,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=857511.1666666666, ans=0.0 2024-09-19 03:07:50,006 INFO [train.py:1198] (0/2) Epoch 48, batch 2300, loss[loss=0.2523, ctc_loss=0.1722, cr_loss=0.4005, over 20113.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.144, cr_loss=0.3703, over 4089792.01 frames. ], batch size: 80, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:08:03,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=857596.1666666666, ans=0.125 2024-09-19 03:08:14,852 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=22.5 2024-09-19 03:08:26,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=857624.5, ans=0.2 2024-09-19 03:08:43,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=22.5 2024-09-19 03:09:06,063 INFO [train.py:1198] (0/2) Epoch 48, batch 2350, loss[loss=0.2034, ctc_loss=0.1322, cr_loss=0.3556, over 20870.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1431, cr_loss=0.37, over 4086709.74 frames. 
], batch size: 54, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:09:07,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=857709.5, ans=0.0 2024-09-19 03:09:13,541 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.266e+02 2.430e+02 2.610e+02 3.250e+02, threshold=4.860e+02, percent-clipped=0.0 2024-09-19 03:09:14,296 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2024-09-19 03:09:19,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=857737.8333333334, ans=0.2 2024-09-19 03:09:26,506 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2024-09-19 03:09:29,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=857737.8333333334, ans=0.125 2024-09-19 03:09:56,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=857794.5, ans=0.125 2024-09-19 03:09:58,049 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-09-19 03:10:20,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=857851.1666666666, ans=0.0 2024-09-19 03:10:21,263 INFO [train.py:1198] (0/2) Epoch 48, batch 2400, loss[loss=0.1592, ctc_loss=0.1022, cr_loss=0.2851, over 20939.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1442, cr_loss=0.3716, over 4093013.28 frames. 
], batch size: 49, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:10:30,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=857851.1666666666, ans=0.09899494936611666 2024-09-19 03:11:14,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=857936.1666666666, ans=0.025 2024-09-19 03:11:17,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0 2024-09-19 03:11:25,327 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=12.0 2024-09-19 03:11:42,568 INFO [train.py:1198] (0/2) Epoch 48, batch 2450, loss[loss=0.2025, ctc_loss=0.1311, cr_loss=0.3573, over 21056.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1427, cr_loss=0.3697, over 4091380.18 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:11:50,005 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.253e+02 2.414e+02 2.569e+02 3.182e+02, threshold=4.827e+02, percent-clipped=0.0 2024-09-19 03:12:50,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.79 vs. limit=15.0 2024-09-19 03:12:57,453 INFO [train.py:1198] (0/2) Epoch 48, batch 2500, loss[loss=0.2516, ctc_loss=0.168, cr_loss=0.4182, over 18446.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1434, cr_loss=0.3711, over 4088024.88 frames. 
], batch size: 108, lr: 1.72e-03, grad_scale: 32.0 2024-09-19 03:13:52,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=858219.5, ans=0.09899494936611666 2024-09-19 03:14:04,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=858247.8333333334, ans=0.07 2024-09-19 03:14:12,618 INFO [train.py:1198] (0/2) Epoch 48, batch 2550, loss[loss=0.2321, ctc_loss=0.1518, cr_loss=0.4017, over 20694.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1437, cr_loss=0.3714, over 4097100.78 frames. ], batch size: 68, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:14:20,033 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.289e+02 2.379e+02 2.553e+02 3.056e+02, threshold=4.758e+02, percent-clipped=0.0 2024-09-19 03:14:50,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=858332.8333333334, ans=0.0 2024-09-19 03:15:02,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=858361.1666666666, ans=0.0 2024-09-19 03:15:27,907 INFO [train.py:1198] (0/2) Epoch 48, batch 2600, loss[loss=0.2452, ctc_loss=0.1699, cr_loss=0.3767, over 14203.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1427, cr_loss=0.3699, over 4098524.76 frames. ], batch size: 150, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:16:07,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.93 vs. 
limit=15.0 2024-09-19 03:16:39,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=858531.1666666666, ans=0.0 2024-09-19 03:16:45,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=22.5 2024-09-19 03:16:46,233 INFO [train.py:1198] (0/2) Epoch 48, batch 2650, loss[loss=0.2271, ctc_loss=0.1513, cr_loss=0.379, over 20629.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1432, cr_loss=0.3704, over 4099564.39 frames. ], batch size: 66, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:16:53,808 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.268e+02 2.414e+02 2.540e+02 6.222e+02, threshold=4.829e+02, percent-clipped=1.0 2024-09-19 03:17:02,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0 2024-09-19 03:17:33,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=858644.5, ans=0.2 2024-09-19 03:17:37,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.42 vs. limit=15.0 2024-09-19 03:18:04,335 INFO [train.py:1198] (0/2) Epoch 48, batch 2700, loss[loss=0.1835, ctc_loss=0.1196, cr_loss=0.3194, over 20950.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.142, cr_loss=0.3683, over 4107143.14 frames. 
], batch size: 50, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:18:15,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=858701.1666666666, ans=0.125 2024-09-19 03:18:25,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=858729.5, ans=0.125 2024-09-19 03:18:30,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=858729.5, ans=0.0 2024-09-19 03:18:33,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=858757.8333333334, ans=0.025 2024-09-19 03:18:33,563 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2024-09-19 03:18:41,232 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=22.5 2024-09-19 03:18:54,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=858786.1666666666, ans=0.04949747468305833 2024-09-19 03:19:19,562 INFO [train.py:1198] (0/2) Epoch 48, batch 2750, loss[loss=0.2177, ctc_loss=0.1424, cr_loss=0.3767, over 20266.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1413, cr_loss=0.367, over 4114408.59 frames. 
], batch size: 74, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:19:27,153 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.314e+02 2.423e+02 2.576e+02 5.267e+02, threshold=4.845e+02, percent-clipped=1.0 2024-09-19 03:19:29,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=858842.8333333334, ans=0.125 2024-09-19 03:19:37,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=858871.1666666666, ans=0.125 2024-09-19 03:20:00,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=858899.5, ans=0.125 2024-09-19 03:20:15,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=858927.8333333334, ans=0.09899494936611666 2024-09-19 03:20:15,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=858927.8333333334, ans=0.125 2024-09-19 03:20:19,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.12 vs. limit=10.0 2024-09-19 03:20:25,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2024-09-19 03:20:34,131 INFO [train.py:1198] (0/2) Epoch 48, batch 2800, loss[loss=0.1867, ctc_loss=0.1206, cr_loss=0.3305, over 20784.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1409, cr_loss=0.3659, over 4106796.43 frames. ], batch size: 56, lr: 1.71e-03, grad_scale: 64.0 2024-09-19 03:20:43,801 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.17 vs. 
limit=15.0 2024-09-19 03:21:05,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=859041.1666666666, ans=0.2 2024-09-19 03:21:09,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=859041.1666666666, ans=0.05 2024-09-19 03:21:12,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=859041.1666666666, ans=0.125 2024-09-19 03:21:49,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=859126.1666666666, ans=0.0 2024-09-19 03:21:50,209 INFO [train.py:1198] (0/2) Epoch 48, batch 2850, loss[loss=0.2094, ctc_loss=0.1379, cr_loss=0.3576, over 20777.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1407, cr_loss=0.3661, over 4113277.38 frames. ], batch size: 53, lr: 1.71e-03, grad_scale: 64.0 2024-09-19 03:22:00,559 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.254e+02 2.381e+02 2.510e+02 2.916e+02, threshold=4.762e+02, percent-clipped=0.0 2024-09-19 03:22:02,857 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.20 vs. 
limit=15.0 2024-09-19 03:22:32,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=859182.8333333334, ans=0.1 2024-09-19 03:22:34,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=859182.8333333334, ans=0.125 2024-09-19 03:22:38,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=859211.1666666666, ans=0.0 2024-09-19 03:22:43,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=859211.1666666666, ans=0.0 2024-09-19 03:23:01,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=859239.5, ans=0.0 2024-09-19 03:23:11,360 INFO [train.py:1198] (0/2) Epoch 48, batch 2900, loss[loss=0.1885, ctc_loss=0.1234, cr_loss=0.3257, over 20959.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1409, cr_loss=0.3661, over 4110323.71 frames. ], batch size: 51, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:23:19,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=859267.8333333334, ans=0.0 2024-09-19 03:23:41,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=859324.5, ans=0.125 2024-09-19 03:24:26,473 INFO [train.py:1198] (0/2) Epoch 48, batch 2950, loss[loss=0.2215, ctc_loss=0.1468, cr_loss=0.3734, over 20939.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1413, cr_loss=0.3676, over 4115631.24 frames. 
], batch size: 60, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:24:31,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=859409.5, ans=0.125 2024-09-19 03:24:35,531 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.306e+02 2.439e+02 2.642e+02 5.256e+02, threshold=4.877e+02, percent-clipped=1.0 2024-09-19 03:25:06,163 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:25:13,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=859494.5, ans=0.125 2024-09-19 03:25:13,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=859494.5, ans=0.5 2024-09-19 03:25:16,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=859494.5, ans=0.125 2024-09-19 03:25:18,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=859494.5, ans=0.1 2024-09-19 03:25:35,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2024-09-19 03:25:42,282 INFO [train.py:1198] (0/2) Epoch 48, batch 3000, loss[loss=0.2011, ctc_loss=0.1302, cr_loss=0.3543, over 20794.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1407, cr_loss=0.3662, over 4113254.83 frames. ], batch size: 53, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:25:42,283 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 03:26:00,289 INFO [train.py:1230] (0/2) Epoch 48, validation: loss=0.03871, ctc_loss=0.03871, cr_loss=1.57e-14, over 944034.00 frames. 
2024-09-19 03:26:00,289 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-19 03:26:03,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-19 03:26:11,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0 2024-09-19 03:26:24,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=859579.5, ans=0.125 2024-09-19 03:26:36,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=859607.8333333334, ans=0.1 2024-09-19 03:26:38,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-19 03:26:56,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=859636.1666666666, ans=0.125 2024-09-19 03:27:05,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=859664.5, ans=0.0 2024-09-19 03:27:16,189 INFO [train.py:1198] (0/2) Epoch 48, batch 3050, loss[loss=0.2196, ctc_loss=0.1442, cr_loss=0.3769, over 20978.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1412, cr_loss=0.3656, over 4077992.63 frames. 
], batch size: 55, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:27:29,147 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.256e+02 2.379e+02 2.592e+02 3.671e+02, threshold=4.757e+02, percent-clipped=0.0 2024-09-19 03:27:30,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=859692.8333333334, ans=0.2 2024-09-19 03:28:04,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=859777.8333333334, ans=10.0 2024-09-19 03:28:05,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=859777.8333333334, ans=0.2 2024-09-19 03:28:19,592 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2024-09-19 03:28:31,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=859806.1666666666, ans=0.125 2024-09-19 03:28:38,145 INFO [train.py:1198] (0/2) Epoch 48, batch 3100, loss[loss=0.197, ctc_loss=0.1287, cr_loss=0.3417, over 20986.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.14, cr_loss=0.364, over 4089716.55 frames. ], batch size: 52, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:29:40,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=859947.8333333334, ans=0.2 2024-09-19 03:29:43,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=859947.8333333334, ans=0.1 2024-09-19 03:29:53,589 INFO [train.py:1198] (0/2) Epoch 48, batch 3150, loss[loss=0.1923, ctc_loss=0.1285, cr_loss=0.3193, over 20986.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1405, cr_loss=0.3647, over 4095249.41 frames. 
], batch size: 55, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:30:02,662 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.276e+02 2.418e+02 2.611e+02 6.707e+02, threshold=4.835e+02, percent-clipped=1.0 2024-09-19 03:30:07,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860004.5, ans=0.1 2024-09-19 03:30:35,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=860032.8333333334, ans=0.125 2024-09-19 03:31:09,037 INFO [train.py:1198] (0/2) Epoch 48, batch 3200, loss[loss=0.1946, ctc_loss=0.1266, cr_loss=0.3397, over 21016.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.141, cr_loss=0.3658, over 4099425.51 frames. ], batch size: 62, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:31:19,108 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0 2024-09-19 03:31:58,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=860202.8333333334, ans=0.125 2024-09-19 03:32:25,140 INFO [train.py:1198] (0/2) Epoch 48, batch 3250, loss[loss=0.2041, ctc_loss=0.1349, cr_loss=0.3464, over 21079.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1421, cr_loss=0.3682, over 4090452.10 frames. 
], batch size: 59, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:32:34,241 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.309e+02 2.430e+02 2.595e+02 3.227e+02, threshold=4.860e+02, percent-clipped=0.0 2024-09-19 03:33:37,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=860372.8333333334, ans=0.025 2024-09-19 03:33:44,292 INFO [train.py:1198] (0/2) Epoch 48, batch 3300, loss[loss=0.2225, ctc_loss=0.1463, cr_loss=0.381, over 20975.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1417, cr_loss=0.3676, over 4094002.49 frames. ], batch size: 58, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:33:58,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=860429.5, ans=0.125 2024-09-19 03:35:00,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=860514.5, ans=0.125 2024-09-19 03:35:02,876 INFO [train.py:1198] (0/2) Epoch 48, batch 3350, loss[loss=0.2009, ctc_loss=0.1319, cr_loss=0.345, over 19452.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1421, cr_loss=0.368, over 4089244.28 frames. 
], batch size: 43, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:35:11,907 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.281e+02 2.389e+02 2.508e+02 3.043e+02, threshold=4.778e+02, percent-clipped=0.0 2024-09-19 03:35:42,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=860599.5, ans=0.125 2024-09-19 03:36:03,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=860656.1666666666, ans=0.125 2024-09-19 03:36:17,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=860684.5, ans=0.2 2024-09-19 03:36:18,986 INFO [train.py:1198] (0/2) Epoch 48, batch 3400, loss[loss=0.2204, ctc_loss=0.1434, cr_loss=0.3847, over 21027.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.142, cr_loss=0.3684, over 4094244.12 frames. ], batch size: 63, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:36:34,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=860712.8333333334, ans=0.125 2024-09-19 03:36:52,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=860741.1666666666, ans=0.125 2024-09-19 03:36:54,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=860741.1666666666, ans=0.125 2024-09-19 03:37:34,283 INFO [train.py:1198] (0/2) Epoch 48, batch 3450, loss[loss=0.1904, ctc_loss=0.1229, cr_loss=0.3375, over 20975.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1423, cr_loss=0.3686, over 4091501.06 frames. 
], batch size: 51, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:37:36,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=860826.1666666666, ans=0.0 2024-09-19 03:37:43,122 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.068e+02 2.300e+02 2.426e+02 2.554e+02 6.184e+02, threshold=4.852e+02, percent-clipped=1.0 2024-09-19 03:37:43,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=860826.1666666666, ans=0.0 2024-09-19 03:37:46,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=860826.1666666666, ans=0.0 2024-09-19 03:37:47,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=860854.5, ans=0.0 2024-09-19 03:38:09,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=860882.8333333334, ans=0.0 2024-09-19 03:38:23,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=860911.1666666666, ans=0.035 2024-09-19 03:38:30,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=860911.1666666666, ans=0.125 2024-09-19 03:38:49,504 INFO [train.py:1198] (0/2) Epoch 48, batch 3500, loss[loss=0.26, ctc_loss=0.1803, cr_loss=0.3986, over 14191.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1432, cr_loss=0.3698, over 4093898.33 frames. 
], batch size: 150, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:39:10,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=860996.1666666666, ans=0.125 2024-09-19 03:39:16,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=860996.1666666666, ans=0.125 2024-09-19 03:39:26,223 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. limit=10.0 2024-09-19 03:39:37,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=861052.8333333334, ans=0.0 2024-09-19 03:40:10,415 INFO [train.py:1198] (0/2) Epoch 48, batch 3550, loss[loss=0.1889, ctc_loss=0.1228, cr_loss=0.3307, over 20884.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1434, cr_loss=0.369, over 4069066.55 frames. ], batch size: 54, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:40:19,618 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.284e+02 2.398e+02 2.559e+02 3.127e+02, threshold=4.796e+02, percent-clipped=0.0 2024-09-19 03:40:24,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=861137.8333333334, ans=0.0 2024-09-19 03:41:00,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=861194.5, ans=0.025 2024-09-19 03:41:26,480 INFO [train.py:1198] (0/2) Epoch 48, batch 3600, loss[loss=0.1825, ctc_loss=0.1182, cr_loss=0.3218, over 20963.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1435, cr_loss=0.3693, over 4070580.43 frames. 
], batch size: 50, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:42:05,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=861307.8333333334, ans=0.2 2024-09-19 03:42:08,527 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-304000.pt 2024-09-19 03:42:18,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=861336.1666666666, ans=0.125 2024-09-19 03:42:42,415 INFO [train.py:1198] (0/2) Epoch 48, batch 3650, loss[loss=0.2489, ctc_loss=0.1655, cr_loss=0.4167, over 20673.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1438, cr_loss=0.3695, over 4061208.01 frames. ], batch size: 66, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:42:51,533 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.290e+02 2.415e+02 2.618e+02 3.744e+02, threshold=4.829e+02, percent-clipped=0.0 2024-09-19 03:43:06,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=861421.1666666666, ans=0.125 2024-09-19 03:43:25,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=861449.5, ans=10.0 2024-09-19 03:43:38,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=861477.8333333334, ans=0.025 2024-09-19 03:43:47,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861506.1666666666, ans=0.1 2024-09-19 03:43:58,426 INFO [train.py:1198] (0/2) Epoch 48, batch 3700, loss[loss=0.2284, ctc_loss=0.1486, cr_loss=0.399, over 20650.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.368, over 4072387.21 frames. 
], batch size: 66, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:44:06,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2024-09-19 03:44:30,430 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:45:00,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=861647.8333333334, ans=0.0 2024-09-19 03:45:16,833 INFO [train.py:1198] (0/2) Epoch 48, batch 3750, loss[loss=0.2291, ctc_loss=0.1497, cr_loss=0.3968, over 20847.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1431, cr_loss=0.3692, over 4077057.78 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:45:26,121 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.087e+02 2.284e+02 2.451e+02 2.639e+02 4.339e+02, threshold=4.902e+02, percent-clipped=0.0 2024-09-19 03:45:38,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=861704.5, ans=0.1 2024-09-19 03:46:08,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=861761.1666666666, ans=0.125 2024-09-19 03:46:13,907 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=15.0 2024-09-19 03:46:23,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861789.5, ans=0.1 2024-09-19 03:46:35,860 INFO [train.py:1198] (0/2) Epoch 48, batch 3800, loss[loss=0.2031, ctc_loss=0.1311, cr_loss=0.3597, over 20980.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1434, cr_loss=0.3687, over 4075705.81 frames. 
], batch size: 51, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:47:44,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=861931.1666666666, ans=0.1 2024-09-19 03:47:51,460 INFO [train.py:1198] (0/2) Epoch 48, batch 3850, loss[loss=0.2083, ctc_loss=0.1368, cr_loss=0.3573, over 20966.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.1432, cr_loss=0.3688, over 4081990.66 frames. ], batch size: 58, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:47:57,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=861959.5, ans=0.0 2024-09-19 03:48:00,556 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.004e+02 2.244e+02 2.366e+02 2.611e+02 5.786e+02, threshold=4.732e+02, percent-clipped=1.0 2024-09-19 03:48:19,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861987.8333333334, ans=0.1 2024-09-19 03:48:34,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=862016.1666666666, ans=0.1 2024-09-19 03:49:06,851 INFO [train.py:1198] (0/2) Epoch 48, batch 3900, loss[loss=0.2401, ctc_loss=0.1579, cr_loss=0.4109, over 20870.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1427, cr_loss=0.3681, over 4088173.36 frames. 
], batch size: 57, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:49:10,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=862101.1666666666, ans=0.125 2024-09-19 03:49:25,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862129.5, ans=0.1 2024-09-19 03:49:32,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=862129.5, ans=0.0 2024-09-19 03:49:33,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0 2024-09-19 03:50:03,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=862186.1666666666, ans=0.2 2024-09-19 03:50:13,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=862214.5, ans=0.125 2024-09-19 03:50:23,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=862214.5, ans=0.125 2024-09-19 03:50:25,887 INFO [train.py:1198] (0/2) Epoch 48, batch 3950, loss[loss=0.2597, ctc_loss=0.1778, cr_loss=0.4092, over 14851.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.3681, over 4085878.63 frames. 
], batch size: 149, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:50:35,012 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.028e+02 2.312e+02 2.455e+02 2.589e+02 4.106e+02, threshold=4.911e+02, percent-clipped=0.0 2024-09-19 03:50:41,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=862271.1666666666, ans=0.125 2024-09-19 03:50:49,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.09 vs. limit=10.0 2024-09-19 03:50:53,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=862271.1666666666, ans=0.125 2024-09-19 03:51:44,996 INFO [train.py:1198] (0/2) Epoch 48, batch 4000, loss[loss=0.2256, ctc_loss=0.1486, cr_loss=0.3851, over 19260.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1423, cr_loss=0.3674, over 4089485.25 frames. ], batch size: 90, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:52:21,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=862441.1666666666, ans=0.04949747468305833 2024-09-19 03:52:24,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=862441.1666666666, ans=0.1 2024-09-19 03:52:33,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=862469.5, ans=0.0 2024-09-19 03:53:00,252 INFO [train.py:1198] (0/2) Epoch 48, batch 4050, loss[loss=0.1929, ctc_loss=0.1288, cr_loss=0.3205, over 21013.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1426, cr_loss=0.3679, over 4091399.20 frames. 
], batch size: 62, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:53:08,982 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.240e+02 2.415e+02 2.538e+02 3.576e+02, threshold=4.829e+02, percent-clipped=0.0 2024-09-19 03:53:13,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862554.5, ans=0.1 2024-09-19 03:53:36,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=862582.8333333334, ans=0.025 2024-09-19 03:54:15,334 INFO [train.py:1198] (0/2) Epoch 48, batch 4100, loss[loss=0.2081, ctc_loss=0.1363, cr_loss=0.3587, over 20976.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.143, cr_loss=0.3691, over 4085946.26 frames. ], batch size: 58, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:54:35,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2024-09-19 03:54:41,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=862696.1666666666, ans=0.07 2024-09-19 03:55:30,726 INFO [train.py:1198] (0/2) Epoch 48, batch 4150, loss[loss=0.2168, ctc_loss=0.1415, cr_loss=0.3768, over 20908.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1427, cr_loss=0.3686, over 4088553.00 frames. 
], batch size: 54, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:55:35,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=862809.5, ans=0.1 2024-09-19 03:55:39,684 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.224e+02 2.365e+02 2.504e+02 3.395e+02, threshold=4.731e+02, percent-clipped=0.0 2024-09-19 03:55:41,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=862809.5, ans=0.125 2024-09-19 03:55:45,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=862837.8333333334, ans=0.125 2024-09-19 03:55:46,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=862837.8333333334, ans=0.2 2024-09-19 03:55:53,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=862837.8333333334, ans=0.0 2024-09-19 03:56:00,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=862837.8333333334, ans=10.0 2024-09-19 03:56:05,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=862866.1666666666, ans=22.5 2024-09-19 03:56:20,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=12.0 2024-09-19 03:56:34,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862922.8333333334, ans=0.1 2024-09-19 03:56:51,298 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.04 vs. 
limit=15.0 2024-09-19 03:56:51,800 INFO [train.py:1198] (0/2) Epoch 48, batch 4200, loss[loss=0.1849, ctc_loss=0.1193, cr_loss=0.328, over 20971.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1429, cr_loss=0.3696, over 4091512.70 frames. ], batch size: 50, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:57:29,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=863007.8333333334, ans=0.0 2024-09-19 03:57:37,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863036.1666666666, ans=0.1 2024-09-19 03:57:41,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.07 vs. limit=10.0 2024-09-19 03:57:44,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=863036.1666666666, ans=0.125 2024-09-19 03:58:01,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=863064.5, ans=0.2 2024-09-19 03:58:07,144 INFO [train.py:1198] (0/2) Epoch 48, batch 4250, loss[loss=0.2182, ctc_loss=0.1418, cr_loss=0.3816, over 20975.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.143, cr_loss=0.3699, over 4090068.61 frames. 
], batch size: 52, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:58:16,261 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.324e+02 2.431e+02 2.611e+02 4.083e+02, threshold=4.862e+02, percent-clipped=0.0 2024-09-19 03:58:28,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=863121.1666666666, ans=0.125 2024-09-19 03:58:38,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=863149.5, ans=0.125 2024-09-19 03:58:43,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=863149.5, ans=0.015 2024-09-19 03:58:44,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=863149.5, ans=0.0 2024-09-19 03:59:21,993 INFO [train.py:1198] (0/2) Epoch 48, batch 4300, loss[loss=0.242, ctc_loss=0.1644, cr_loss=0.388, over 21044.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1427, cr_loss=0.3697, over 4101539.33 frames. 
], batch size: 62, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 03:59:23,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=863234.5, ans=0.2 2024-09-19 03:59:52,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=863291.1666666666, ans=0.1 2024-09-19 03:59:54,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=863291.1666666666, ans=0.125 2024-09-19 04:00:01,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=863291.1666666666, ans=0.125 2024-09-19 04:00:06,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=863319.5, ans=0.125 2024-09-19 04:00:21,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=863347.8333333334, ans=0.0 2024-09-19 04:00:38,088 INFO [train.py:1198] (0/2) Epoch 48, batch 4350, loss[loss=0.2148, ctc_loss=0.1414, cr_loss=0.367, over 20992.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1424, cr_loss=0.3683, over 4099940.77 frames. 
], batch size: 55, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:00:47,138 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.255e+02 2.397e+02 2.540e+02 3.175e+02, threshold=4.793e+02, percent-clipped=0.0 2024-09-19 04:00:56,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=863404.5, ans=0.125 2024-09-19 04:01:00,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=863404.5, ans=0.125 2024-09-19 04:01:56,773 INFO [train.py:1198] (0/2) Epoch 48, batch 4400, loss[loss=0.219, ctc_loss=0.1453, cr_loss=0.3685, over 20828.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1424, cr_loss=0.3686, over 4104398.75 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:02:09,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=863517.8333333334, ans=0.0 2024-09-19 04:02:47,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=863602.8333333334, ans=0.125 2024-09-19 04:03:08,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=863631.1666666666, ans=0.2 2024-09-19 04:03:15,480 INFO [train.py:1198] (0/2) Epoch 48, batch 4450, loss[loss=0.1791, ctc_loss=0.1166, cr_loss=0.3124, over 20944.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1423, cr_loss=0.3681, over 4095034.95 frames. ], batch size: 48, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:03:24,450 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.278e+02 2.417e+02 2.577e+02 3.487e+02, threshold=4.834e+02, percent-clipped=0.0 2024-09-19 04:03:37,633 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.65 vs. 
limit=6.0 2024-09-19 04:03:50,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=863716.1666666666, ans=0.125 2024-09-19 04:03:56,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=863716.1666666666, ans=0.025 2024-09-19 04:04:07,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=863744.5, ans=0.1 2024-09-19 04:04:13,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=863744.5, ans=0.125 2024-09-19 04:04:20,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=863772.8333333334, ans=0.1 2024-09-19 04:04:31,269 INFO [train.py:1198] (0/2) Epoch 48, batch 4500, loss[loss=0.1762, ctc_loss=0.1139, cr_loss=0.3111, over 20979.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1417, cr_loss=0.3661, over 4085147.01 frames. ], batch size: 52, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:04:39,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=863801.1666666666, ans=0.1 2024-09-19 04:05:14,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=863857.8333333334, ans=0.125 2024-09-19 04:05:19,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. 
limit=6.0 2024-09-19 04:05:20,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=863886.1666666666, ans=0.125 2024-09-19 04:05:47,270 INFO [train.py:1198] (0/2) Epoch 48, batch 4550, loss[loss=0.1796, ctc_loss=0.1161, cr_loss=0.3174, over 19840.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1407, cr_loss=0.3647, over 4090374.40 frames. ], batch size: 44, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:05:56,193 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.314e+02 2.436e+02 2.564e+02 3.850e+02, threshold=4.872e+02, percent-clipped=0.0 2024-09-19 04:06:18,142 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2024-09-19 04:06:28,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=863999.5, ans=0.2 2024-09-19 04:07:02,848 INFO [train.py:1198] (0/2) Epoch 48, batch 4600, loss[loss=0.2305, ctc_loss=0.152, cr_loss=0.3924, over 20814.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.141, cr_loss=0.3658, over 4104202.69 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:07:07,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=864084.5, ans=0.125 2024-09-19 04:07:07,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=864084.5, ans=0.1 2024-09-19 04:07:33,586 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.99 vs. 
limit=15.0 2024-09-19 04:08:11,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=864197.8333333334, ans=0.125 2024-09-19 04:08:24,295 INFO [train.py:1198] (0/2) Epoch 48, batch 4650, loss[loss=0.2555, ctc_loss=0.1705, cr_loss=0.4251, over 20674.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1415, cr_loss=0.3662, over 4091819.35 frames. ], batch size: 66, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:08:33,496 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.287e+02 2.427e+02 2.595e+02 1.027e+03, threshold=4.854e+02, percent-clipped=1.0 2024-09-19 04:08:44,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=864254.5, ans=0.125 2024-09-19 04:09:39,783 INFO [train.py:1198] (0/2) Epoch 48, batch 4700, loss[loss=0.2279, ctc_loss=0.1484, cr_loss=0.3976, over 20755.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1408, cr_loss=0.3653, over 4103663.00 frames. ], batch size: 53, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:09:49,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=864367.8333333334, ans=0.1 2024-09-19 04:10:23,325 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.95 vs. limit=10.0 2024-09-19 04:10:55,106 INFO [train.py:1198] (0/2) Epoch 48, batch 4750, loss[loss=0.2097, ctc_loss=0.1394, cr_loss=0.3516, over 20953.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1417, cr_loss=0.3674, over 4101803.61 frames. 
], batch size: 58, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:11:04,293 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.276e+02 2.425e+02 2.615e+02 3.429e+02, threshold=4.850e+02, percent-clipped=0.0 2024-09-19 04:11:22,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=864537.8333333334, ans=0.125 2024-09-19 04:11:43,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=864594.5, ans=0.125 2024-09-19 04:12:03,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=864622.8333333334, ans=0.0 2024-09-19 04:12:10,791 INFO [train.py:1198] (0/2) Epoch 48, batch 4800, loss[loss=0.1874, ctc_loss=0.1226, cr_loss=0.3235, over 20843.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1415, cr_loss=0.3677, over 4103645.21 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:12:23,623 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=22.5 2024-09-19 04:12:43,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=864707.8333333334, ans=0.1 2024-09-19 04:12:47,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=864707.8333333334, ans=0.125 2024-09-19 04:12:51,041 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.15 vs. 
limit=12.0 2024-09-19 04:13:23,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=864764.5, ans=0.0 2024-09-19 04:13:29,679 INFO [train.py:1198] (0/2) Epoch 48, batch 4850, loss[loss=0.2141, ctc_loss=0.1405, cr_loss=0.368, over 20979.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1413, cr_loss=0.368, over 4102816.30 frames. ], batch size: 55, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:13:40,155 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.289e+02 2.411e+02 2.617e+02 4.852e+02, threshold=4.823e+02, percent-clipped=1.0 2024-09-19 04:13:55,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=864821.1666666666, ans=0.125 2024-09-19 04:14:00,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=864849.5, ans=0.125 2024-09-19 04:14:42,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=864906.1666666666, ans=0.125 2024-09-19 04:14:43,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=864906.1666666666, ans=0.125 2024-09-19 04:14:48,123 INFO [train.py:1198] (0/2) Epoch 48, batch 4900, loss[loss=0.1808, ctc_loss=0.1167, cr_loss=0.3208, over 20996.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1408, cr_loss=0.3664, over 4100966.57 frames. ], batch size: 48, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:15:08,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=864962.8333333334, ans=0.125 2024-09-19 04:15:52,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. 
limit=15.0 2024-09-19 04:16:03,323 INFO [train.py:1198] (0/2) Epoch 48, batch 4950, loss[loss=0.1759, ctc_loss=0.1127, cr_loss=0.3161, over 21002.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1404, cr_loss=0.3653, over 4100942.21 frames. ], batch size: 48, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:16:10,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=865076.1666666666, ans=0.125 2024-09-19 04:16:13,728 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.276e+02 2.404e+02 2.529e+02 3.480e+02, threshold=4.808e+02, percent-clipped=0.0 2024-09-19 04:16:44,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=12.0 2024-09-19 04:16:55,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=865161.1666666666, ans=0.125 2024-09-19 04:17:13,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=865189.5, ans=0.0 2024-09-19 04:17:17,539 INFO [train.py:1198] (0/2) Epoch 48, batch 5000, loss[loss=0.2071, ctc_loss=0.1365, cr_loss=0.3526, over 20971.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1406, cr_loss=0.3662, over 4101017.75 frames. ], batch size: 58, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:17:25,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=865217.8333333334, ans=0.125 2024-09-19 04:18:12,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=865302.8333333334, ans=0.1 2024-09-19 04:18:31,794 INFO [train.py:1198] (0/2) Epoch 48, batch 5050, loss[loss=0.221, ctc_loss=0.1451, cr_loss=0.3794, over 20981.00 frames. 
], tot_loss[loss=0.2142, ctc_loss=0.1408, cr_loss=0.3671, over 4111910.77 frames. ], batch size: 67, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:18:32,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=865359.5, ans=0.02 2024-09-19 04:18:42,334 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.062e+02 2.233e+02 2.415e+02 2.584e+02 4.916e+02, threshold=4.830e+02, percent-clipped=1.0 2024-09-19 04:18:44,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=865359.5, ans=0.0 2024-09-19 04:18:50,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.06 vs. limit=15.0 2024-09-19 04:19:19,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=865444.5, ans=0.0 2024-09-19 04:19:21,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=865444.5, ans=0.125 2024-09-19 04:19:46,458 INFO [train.py:1198] (0/2) Epoch 48, batch 5100, loss[loss=0.234, ctc_loss=0.1533, cr_loss=0.4036, over 20829.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1414, cr_loss=0.368, over 4111425.50 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:20:08,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.20 vs. limit=10.0 2024-09-19 04:20:13,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=865529.5, ans=0.125 2024-09-19 04:20:27,744 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. 
limit=15.0 2024-09-19 04:20:43,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=865586.1666666666, ans=0.025 2024-09-19 04:21:00,991 INFO [train.py:1198] (0/2) Epoch 48, batch 5150, loss[loss=0.2091, ctc_loss=0.1369, cr_loss=0.361, over 20825.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.141, cr_loss=0.3671, over 4114050.40 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 16.0 2024-09-19 04:21:02,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=865642.8333333334, ans=0.125 2024-09-19 04:21:06,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=865642.8333333334, ans=15.0 2024-09-19 04:21:12,775 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.986e+02 2.256e+02 2.401e+02 2.512e+02 4.424e+02, threshold=4.802e+02, percent-clipped=0.0 2024-09-19 04:21:22,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=865671.1666666666, ans=0.07 2024-09-19 04:21:25,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=865671.1666666666, ans=0.1 2024-09-19 04:21:29,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=865671.1666666666, ans=0.125 2024-09-19 04:21:40,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=865699.5, ans=0.035 2024-09-19 04:21:54,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=865727.8333333334, ans=0.0 2024-09-19 04:21:59,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=865727.8333333334, ans=15.0 
2024-09-19 04:22:04,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=865756.1666666666, ans=0.1 2024-09-19 04:22:06,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=865756.1666666666, ans=0.1 2024-09-19 04:22:13,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=865756.1666666666, ans=0.125 2024-09-19 04:22:17,484 INFO [train.py:1198] (0/2) Epoch 48, batch 5200, loss[loss=0.1975, ctc_loss=0.1264, cr_loss=0.3554, over 20901.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1411, cr_loss=0.3664, over 4098130.82 frames. ], batch size: 54, lr: 1.71e-03, grad_scale: 32.0 2024-09-19 04:22:20,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=865784.5, ans=0.125 2024-09-19 04:22:29,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=865784.5, ans=0.125 2024-09-19 04:22:41,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-09-19 04:22:51,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=865841.1666666666, ans=0.2 2024-09-19 04:23:05,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=865869.5, ans=0.125 2024-09-19 04:23:31,535 INFO [train.py:1198] (0/2) Epoch 48, batch 5250, loss[loss=0.1826, ctc_loss=0.1175, cr_loss=0.3255, over 20966.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1407, cr_loss=0.3663, over 4111453.73 frames. 
], batch size: 48, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:23:43,737 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.283e+02 2.408e+02 2.633e+02 5.231e+02, threshold=4.816e+02, percent-clipped=1.0
2024-09-19 04:24:48,728 INFO [train.py:1198] (0/2) Epoch 48, batch 5300, loss[loss=0.2495, ctc_loss=0.1643, cr_loss=0.4256, over 20955.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1411, cr_loss=0.3664, over 4098430.64 frames. ], batch size: 60, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:24:53,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0
2024-09-19 04:25:57,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=866181.1666666666, ans=0.1
2024-09-19 04:26:03,243 INFO [train.py:1198] (0/2) Epoch 48, batch 5350, loss[loss=0.2192, ctc_loss=0.1419, cr_loss=0.3862, over 20970.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1413, cr_loss=0.3672, over 4104856.96 frames. ], batch size: 55, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:26:08,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=866209.5, ans=0.125
2024-09-19 04:26:15,076 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.320e+02 2.437e+02 2.636e+02 3.633e+02, threshold=4.874e+02, percent-clipped=0.0
2024-09-19 04:26:36,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=866266.1666666666, ans=0.125
2024-09-19 04:26:43,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=866266.1666666666, ans=0.1
2024-09-19 04:26:57,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0
2024-09-19 04:27:00,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=866294.5, ans=0.1
2024-09-19 04:27:16,503 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 04:27:17,593 INFO [train.py:1198] (0/2) Epoch 48, batch 5400, loss[loss=0.1983, ctc_loss=0.1281, cr_loss=0.3512, over 20981.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1407, cr_loss=0.3663, over 4107103.92 frames. ], batch size: 52, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:27:34,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=866379.5, ans=0.125
2024-09-19 04:28:32,126 INFO [train.py:1198] (0/2) Epoch 48, batch 5450, loss[loss=0.1995, ctc_loss=0.1312, cr_loss=0.3415, over 20777.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1406, cr_loss=0.3666, over 4096542.67 frames. ], batch size: 53, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:28:41,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=866492.8333333334, ans=0.125
2024-09-19 04:28:44,270 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 04:28:45,318 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.254e+02 2.383e+02 2.550e+02 3.825e+02, threshold=4.765e+02, percent-clipped=0.0
2024-09-19 04:29:25,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=866577.8333333334, ans=0.0
2024-09-19 04:29:38,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=866606.1666666666, ans=0.0
2024-09-19 04:29:45,984 INFO [train.py:1198] (0/2) Epoch 48, batch 5500, loss[loss=0.2054, ctc_loss=0.1326, cr_loss=0.3641, over 21014.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1408, cr_loss=0.3656, over 4095559.77 frames. ], batch size: 61, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:29:52,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.78 vs. limit=6.0
2024-09-19 04:29:58,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.87 vs. limit=15.0
2024-09-19 04:30:14,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=866691.1666666666, ans=0.1
2024-09-19 04:30:36,249 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0
2024-09-19 04:30:46,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0
2024-09-19 04:31:01,757 INFO [train.py:1198] (0/2) Epoch 48, batch 5550, loss[loss=0.2113, ctc_loss=0.1416, cr_loss=0.3481, over 20883.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1416, cr_loss=0.3673, over 4097373.01 frames. ], batch size: 54, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:31:15,188 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.255e+02 2.403e+02 2.586e+02 4.346e+02, threshold=4.807e+02, percent-clipped=0.0
2024-09-19 04:31:49,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=866861.1666666666, ans=0.125
2024-09-19 04:32:16,350 INFO [train.py:1198] (0/2) Epoch 48, batch 5600, loss[loss=0.2303, ctc_loss=0.1512, cr_loss=0.3954, over 20948.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1418, cr_loss=0.3676, over 4113283.77 frames. ], batch size: 64, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:33:33,110 INFO [train.py:1198] (0/2) Epoch 48, batch 5650, loss[loss=0.2029, ctc_loss=0.1315, cr_loss=0.357, over 21076.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1415, cr_loss=0.3669, over 4096216.96 frames. ], batch size: 56, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:33:47,623 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.233e+02 2.390e+02 2.509e+02 3.656e+02, threshold=4.779e+02, percent-clipped=0.0
2024-09-19 04:34:00,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=867087.8333333334, ans=0.125
2024-09-19 04:34:26,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=867144.5, ans=0.0
2024-09-19 04:34:47,567 INFO [train.py:1198] (0/2) Epoch 48, batch 5700, loss[loss=0.2297, ctc_loss=0.1508, cr_loss=0.3947, over 20962.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1418, cr_loss=0.3677, over 4100375.33 frames. ], batch size: 64, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:34:48,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0
2024-09-19 04:34:49,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867201.1666666666, ans=0.1
2024-09-19 04:34:55,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=867201.1666666666, ans=0.0
2024-09-19 04:35:25,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=867257.8333333334, ans=0.125
2024-09-19 04:35:25,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=867257.8333333334, ans=0.0
2024-09-19 04:35:38,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=867286.1666666666, ans=0.0
2024-09-19 04:35:55,482 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=15.0
2024-09-19 04:35:59,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867314.5, ans=0.1
2024-09-19 04:36:02,072 INFO [train.py:1198] (0/2) Epoch 48, batch 5750, loss[loss=0.2286, ctc_loss=0.1501, cr_loss=0.3926, over 20953.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1415, cr_loss=0.3679, over 4098893.21 frames. ], batch size: 64, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:36:03,101 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=22.5
2024-09-19 04:36:16,884 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.236e+02 2.330e+02 2.492e+02 3.466e+02, threshold=4.661e+02, percent-clipped=0.0
2024-09-19 04:36:29,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=867371.1666666666, ans=0.125
2024-09-19 04:36:35,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=867399.5, ans=0.0
2024-09-19 04:36:39,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=867399.5, ans=0.2
2024-09-19 04:36:47,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.08 vs. limit=10.0
2024-09-19 04:37:01,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=22.5
2024-09-19 04:37:03,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=22.5
2024-09-19 04:37:16,496 INFO [train.py:1198] (0/2) Epoch 48, batch 5800, loss[loss=0.2316, ctc_loss=0.1502, cr_loss=0.4074, over 20650.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1409, cr_loss=0.366, over 4095365.72 frames. ], batch size: 68, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:38:03,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=867569.5, ans=0.125
2024-09-19 04:38:31,108 INFO [train.py:1198] (0/2) Epoch 48, batch 5850, loss[loss=0.1768, ctc_loss=0.1131, cr_loss=0.3186, over 20997.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.141, cr_loss=0.3664, over 4100505.80 frames. ], batch size: 48, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:38:39,304 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0
2024-09-19 04:38:46,313 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.317e+02 2.437e+02 2.557e+02 4.299e+02, threshold=4.874e+02, percent-clipped=0.0
2024-09-19 04:39:47,816 INFO [train.py:1198] (0/2) Epoch 48, batch 5900, loss[loss=0.2129, ctc_loss=0.1379, cr_loss=0.3751, over 20820.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1397, cr_loss=0.3644, over 4102697.29 frames. ], batch size: 59, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:39:51,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=867767.8333333334, ans=0.125
2024-09-19 04:40:10,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=867796.1666666666, ans=0.0
2024-09-19 04:40:38,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=867852.8333333334, ans=0.1
2024-09-19 04:40:48,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867881.1666666666, ans=0.1
2024-09-19 04:41:01,823 INFO [train.py:1198] (0/2) Epoch 48, batch 5950, loss[loss=0.216, ctc_loss=0.1416, cr_loss=0.372, over 21064.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1408, cr_loss=0.3663, over 4095396.72 frames. ], batch size: 56, lr: 1.71e-03, grad_scale: 16.0
2024-09-19 04:41:02,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=867909.5, ans=0.0
2024-09-19 04:41:16,699 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.293e+02 2.409e+02 2.557e+02 3.514e+02, threshold=4.817e+02, percent-clipped=0.0
2024-09-19 04:41:17,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=867937.8333333334, ans=0.125
2024-09-19 04:41:21,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867937.8333333334, ans=0.1
2024-09-19 04:41:23,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.20 vs. limit=22.5
2024-09-19 04:41:32,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=867966.1666666666, ans=0.2
2024-09-19 04:41:36,209 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.57 vs. limit=15.0
2024-09-19 04:41:47,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867994.5, ans=0.1
2024-09-19 04:41:58,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0
2024-09-19 04:42:18,323 INFO [train.py:1198] (0/2) Epoch 48, batch 6000, loss[loss=0.2284, ctc_loss=0.151, cr_loss=0.3866, over 20712.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3674, over 4078271.18 frames. ], batch size: 68, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:42:18,325 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-19 04:42:34,935 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6743, 4.2754, 4.2135, 4.2373], device='cuda:0')
2024-09-19 04:42:36,274 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9865, 4.6998, 4.5449, 4.1404], device='cuda:0')
2024-09-19 04:42:36,913 INFO [train.py:1230] (0/2) Epoch 48, validation: loss=0.03868, ctc_loss=0.03868, cr_loss=1.598e-14, over 944034.00 frames.
2024-09-19 04:42:36,914 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-19 04:43:01,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=868079.5, ans=0.07
2024-09-19 04:43:34,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=868136.1666666666, ans=0.125
2024-09-19 04:43:50,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=868192.8333333334, ans=0.1
2024-09-19 04:43:51,437 INFO [train.py:1198] (0/2) Epoch 48, batch 6050, loss[loss=0.2164, ctc_loss=0.1424, cr_loss=0.37, over 20777.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1418, cr_loss=0.3671, over 4087977.40 frames. ], batch size: 56, lr: 1.71e-03, grad_scale: 32.0
2024-09-19 04:43:51,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=868192.8333333334, ans=0.125
2024-09-19 04:43:56,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.05 vs. limit=6.0
2024-09-19 04:44:06,495 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.280e+02 2.421e+02 2.646e+02 3.730e+02, threshold=4.841e+02, percent-clipped=0.0
2024-09-19 04:44:10,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=868221.1666666666, ans=0.035
2024-09-19 04:44:34,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=868249.5, ans=0.0
2024-09-19 04:45:06,519 INFO [train.py:1198] (0/2) Epoch 48, batch 6100, loss[loss=0.2322, ctc_loss=0.1547, cr_loss=0.3874, over 20867.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1421, cr_loss=0.3675, over 4085587.70 frames. ], batch size: 65, lr: 1.70e-03, grad_scale: 32.0
2024-09-19 04:45:15,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=868334.5, ans=0.125
2024-09-19 04:45:52,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=22.5
2024-09-19 04:45:56,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=868419.5, ans=0.125
2024-09-19 04:46:10,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=868447.8333333334, ans=0.2
2024-09-19 04:46:20,309 INFO [train.py:1198] (0/2) Epoch 48, batch 6150, loss[loss=0.2096, ctc_loss=0.1386, cr_loss=0.3553, over 20990.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1428, cr_loss=0.3687, over 4076641.24 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0
2024-09-19 04:46:23,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=868476.1666666666, ans=0.0
2024-09-19 04:46:32,535 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 04:46:33,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=868504.5, ans=0.125
2024-09-19 04:46:35,027 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.274e+02 2.392e+02 2.563e+02 3.142e+02, threshold=4.783e+02, percent-clipped=0.0
2024-09-19 04:46:49,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=22.5
2024-09-19 04:46:53,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=868532.8333333334, ans=0.125
2024-09-19 04:46:59,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=22.5
2024-09-19 04:47:11,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=868561.1666666666, ans=0.0
2024-09-19 04:47:33,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=868617.8333333334, ans=0.125
2024-09-19 04:47:35,174 INFO [train.py:1198] (0/2) Epoch 48, batch 6200, loss[loss=0.2106, ctc_loss=0.1393, cr_loss=0.3568, over 21040.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1426, cr_loss=0.3674, over 4060202.02 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 32.0
2024-09-19 04:48:01,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=868646.1666666666, ans=0.125
2024-09-19 04:48:48,582 INFO [train.py:1198] (0/2) Epoch 48, batch 6250, loss[loss=0.2618, ctc_loss=0.176, cr_loss=0.4293, over 18343.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1435, cr_loss=0.369, over 4030370.30 frames. ], batch size: 108, lr: 1.70e-03, grad_scale: 32.0
2024-09-19 04:49:03,073 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.282e+02 2.464e+02 2.645e+02 5.276e+02, threshold=4.928e+02, percent-clipped=1.0
2024-09-19 04:49:03,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=868787.8333333334, ans=0.0
2024-09-19 04:49:13,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=868787.8333333334, ans=0.0
2024-09-19 04:49:17,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=868816.1666666666, ans=0.125
2024-09-19 04:49:35,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=868844.5, ans=0.1
2024-09-19 04:49:39,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=868844.5, ans=0.125
2024-09-19 04:50:03,014 INFO [train.py:1198] (0/2) Epoch 48, batch 6300, loss[loss=0.2332, ctc_loss=0.1583, cr_loss=0.3744, over 21037.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1453, cr_loss=0.3717, over 4012260.52 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 32.0
2024-09-19 04:50:54,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=868986.1666666666, ans=0.1
2024-09-19 04:50:56,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=868986.1666666666, ans=0.1
2024-09-19 04:51:16,103 INFO [train.py:1198] (0/2) Epoch 48, batch 6350, loss[loss=0.2405, ctc_loss=0.1629, cr_loss=0.3882, over 13812.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1494, cr_loss=0.3738, over 3806875.71 frames. ], batch size: 149, lr: 1.70e-03, grad_scale: 32.0
2024-09-19 04:51:30,935 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.505e+02 2.792e+02 3.033e+02 6.518e+02, threshold=5.583e+02, percent-clipped=1.0
2024-09-19 04:51:51,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.87 vs. limit=10.0
2024-09-19 04:52:14,487 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-48.pt
2024-09-19 04:53:04,393 INFO [train.py:1198] (0/2) Epoch 49, batch 0, loss[loss=0.2109, ctc_loss=0.1371, cr_loss=0.3692, over 20857.00 frames. ], tot_loss[loss=0.2109, ctc_loss=0.1371, cr_loss=0.3692, over 20857.00 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 04:53:04,394 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-19 04:53:22,698 INFO [train.py:1230] (0/2) Epoch 49, validation: loss=0.03846, ctc_loss=0.03846, cr_loss=1.688e-14, over 944034.00 frames.
2024-09-19 04:53:22,699 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB
2024-09-19 04:53:34,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0
2024-09-19 04:53:41,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=869187.3333333334, ans=0.025
2024-09-19 04:54:34,079 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 04:54:38,095 INFO [train.py:1198] (0/2) Epoch 49, batch 50, loss[loss=0.2334, ctc_loss=0.1573, cr_loss=0.3805, over 20927.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1425, cr_loss=0.3666, over 920943.09 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 04:55:07,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=869357.3333333334, ans=0.0
2024-09-19 04:55:08,573 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 2.336e+02 2.495e+02 2.828e+02 4.684e+02, threshold=4.990e+02, percent-clipped=0.0
2024-09-19 04:55:31,897 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0
2024-09-19 04:55:54,103 INFO [train.py:1198] (0/2) Epoch 49, batch 100, loss[loss=0.2161, ctc_loss=0.142, cr_loss=0.3706, over 20996.00 frames. ], tot_loss[loss=0.2107, ctc_loss=0.1386, cr_loss=0.3603, over 1621206.87 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 04:56:00,541 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 04:56:07,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=869470.6666666666, ans=0.125
2024-09-19 04:56:13,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=869470.6666666666, ans=0.0
2024-09-19 04:56:33,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=869499.0, ans=0.125
2024-09-19 04:56:50,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.52 vs. limit=5.0
2024-09-19 04:56:54,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=869555.6666666666, ans=0.0
2024-09-19 04:57:11,510 INFO [train.py:1198] (0/2) Epoch 49, batch 150, loss[loss=0.2166, ctc_loss=0.1405, cr_loss=0.3804, over 21011.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3664, over 2166467.22 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 04:57:41,806 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.969e+02 2.225e+02 2.348e+02 2.491e+02 3.092e+02, threshold=4.696e+02, percent-clipped=0.0
2024-09-19 04:57:46,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=869640.6666666666, ans=0.2
2024-09-19 04:58:27,421 INFO [train.py:1198] (0/2) Epoch 49, batch 200, loss[loss=0.2028, ctc_loss=0.1295, cr_loss=0.3668, over 20893.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1418, cr_loss=0.3688, over 2603867.37 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 04:58:32,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=869725.6666666666, ans=0.125
2024-09-19 04:59:15,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=869810.6666666666, ans=0.125
2024-09-19 04:59:46,469 INFO [train.py:1198] (0/2) Epoch 49, batch 250, loss[loss=0.2048, ctc_loss=0.1322, cr_loss=0.363, over 21037.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1402, cr_loss=0.3653, over 2940138.95 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 04:59:48,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=869867.3333333334, ans=0.95
2024-09-19 04:59:57,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.21 vs. limit=10.0
2024-09-19 05:00:16,646 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.266e+02 2.387e+02 2.568e+02 3.879e+02, threshold=4.775e+02, percent-clipped=0.0
2024-09-19 05:00:34,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=869952.3333333334, ans=0.2
2024-09-19 05:00:54,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=869980.6666666666, ans=0.025
2024-09-19 05:00:56,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=869980.6666666666, ans=0.0
2024-09-19 05:01:01,809 INFO [train.py:1198] (0/2) Epoch 49, batch 300, loss[loss=0.2136, ctc_loss=0.1398, cr_loss=0.3691, over 20859.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1414, cr_loss=0.3668, over 3188041.74 frames. ], batch size: 65, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 05:01:30,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=870065.6666666666, ans=0.1
2024-09-19 05:02:16,911 INFO [train.py:1198] (0/2) Epoch 49, batch 350, loss[loss=0.2496, ctc_loss=0.1654, cr_loss=0.4211, over 20051.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1418, cr_loss=0.3669, over 3381908.02 frames. ], batch size: 80, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 05:02:23,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=870150.6666666666, ans=0.07
2024-09-19 05:02:38,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=870179.0, ans=0.125
2024-09-19 05:02:50,369 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.257e+02 2.370e+02 2.569e+02 3.205e+02, threshold=4.740e+02, percent-clipped=0.0
2024-09-19 05:02:54,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=12.0
2024-09-19 05:03:15,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=870235.6666666666, ans=0.1
2024-09-19 05:03:35,926 INFO [train.py:1198] (0/2) Epoch 49, batch 400, loss[loss=0.1977, ctc_loss=0.1269, cr_loss=0.3538, over 20957.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1411, cr_loss=0.3656, over 3544038.19 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 05:04:12,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=870349.0, ans=0.0
2024-09-19 05:04:54,247 INFO [train.py:1198] (0/2) Epoch 49, batch 450, loss[loss=0.2152, ctc_loss=0.1444, cr_loss=0.3537, over 20967.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3674, over 3663977.71 frames. ], batch size: 64, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 05:05:18,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=870462.3333333334, ans=0.07
2024-09-19 05:05:24,589 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.285e+02 2.393e+02 2.528e+02 6.540e+02, threshold=4.786e+02, percent-clipped=1.0
2024-09-19 05:05:26,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=870490.6666666666, ans=0.125
2024-09-19 05:06:09,394 INFO [train.py:1198] (0/2) Epoch 49, batch 500, loss[loss=0.2129, ctc_loss=0.1365, cr_loss=0.3821, over 20794.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.367, over 3771001.53 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0
2024-09-19 05:06:11,678 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0
2024-09-19 05:06:44,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=870632.3333333334, ans=0.125
2024-09-19 05:06:49,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=870632.3333333334, ans=0.1
2024-09-19 05:06:59,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=870660.6666666666, ans=0.125
2024-09-19 05:07:04,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=22.5
2024-09-19 05:07:10,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=870689.0, ans=0.2
2024-09-19 05:07:24,734 INFO [train.py:1198] (0/2) Epoch 49, batch 550, loss[loss=0.245, ctc_loss=0.1602, cr_loss=0.4239, over 19376.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1416, cr_loss=0.3677, over 3847188.15 frames. ], batch size: 90, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:07:29,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=870717.3333333334, ans=0.125
2024-09-19 05:07:29,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=870717.3333333334, ans=10.0
2024-09-19 05:07:29,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=870717.3333333334, ans=0.125
2024-09-19 05:07:54,629 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.217e+02 2.376e+02 2.506e+02 3.935e+02, threshold=4.752e+02, percent-clipped=0.0
2024-09-19 05:08:09,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.05 vs. limit=15.0
2024-09-19 05:08:42,914 INFO [train.py:1198] (0/2) Epoch 49, batch 600, loss[loss=0.2216, ctc_loss=0.1484, cr_loss=0.3659, over 21016.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1411, cr_loss=0.3661, over 3903384.84 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:08:59,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=870887.3333333334, ans=0.0
2024-09-19 05:09:20,808 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.05 vs. limit=6.0
2024-09-19 05:09:36,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=870944.0, ans=0.1
2024-09-19 05:09:51,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=870972.3333333334, ans=0.125
2024-09-19 05:09:57,903 INFO [train.py:1198] (0/2) Epoch 49, batch 650, loss[loss=0.1675, ctc_loss=0.1099, cr_loss=0.288, over 20961.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1408, cr_loss=0.3643, over 3927584.45 frames. ], batch size: 49, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:09:58,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871000.6666666666, ans=0.1
2024-09-19 05:10:30,620 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.267e+02 2.419e+02 2.577e+02 3.151e+02, threshold=4.838e+02, percent-clipped=0.0
2024-09-19 05:10:30,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=871057.3333333334, ans=0.125
2024-09-19 05:11:02,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=871114.0, ans=0.125
2024-09-19 05:11:15,234 INFO [train.py:1198] (0/2) Epoch 49, batch 700, loss[loss=0.239, ctc_loss=0.1567, cr_loss=0.4116, over 20844.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1411, cr_loss=0.3659, over 3967980.58 frames. ], batch size: 65, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:11:20,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=871142.3333333334, ans=0.1
2024-09-19 05:11:42,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=871170.6666666666, ans=0.0
2024-09-19 05:12:30,431 INFO [train.py:1198] (0/2) Epoch 49, batch 750, loss[loss=0.1767, ctc_loss=0.1153, cr_loss=0.307, over 19917.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1412, cr_loss=0.3667, over 4007852.40 frames. ], batch size: 44, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:12:43,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=871284.0, ans=0.125
2024-09-19 05:13:00,667 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.271e+02 2.384e+02 2.575e+02 4.807e+02, threshold=4.767e+02, percent-clipped=0.0
2024-09-19 05:13:20,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=871369.0, ans=0.1
2024-09-19 05:13:46,022 INFO [train.py:1198] (0/2) Epoch 49, batch 800, loss[loss=0.2038, ctc_loss=0.1327, cr_loss=0.3553, over 21038.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1422, cr_loss=0.3682, over 4025435.64 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:14:01,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=871454.0, ans=0.0
2024-09-19 05:14:44,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=871510.6666666666, ans=0.0
2024-09-19 05:15:05,310 INFO [train.py:1198] (0/2) Epoch 49, batch 850, loss[loss=0.2344, ctc_loss=0.1563, cr_loss=0.3902, over 20840.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1421, cr_loss=0.3685, over 4056252.09 frames. ], batch size: 65, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:15:30,265 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=15.0
2024-09-19 05:15:35,551 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.360e+02 2.457e+02 2.650e+02 3.350e+02, threshold=4.914e+02, percent-clipped=0.0
2024-09-19 05:15:51,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0
2024-09-19 05:16:03,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0
2024-09-19 05:16:11,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=871680.6666666666, ans=0.0
2024-09-19 05:16:23,677 INFO [train.py:1198] (0/2) Epoch 49, batch 900, loss[loss=0.2337, ctc_loss=0.1557, cr_loss=0.3899, over 21012.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1426, cr_loss=0.3691, over 4061331.23 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 32.0
2024-09-19 05:16:27,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs.
limit=15.0 2024-09-19 05:16:58,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=871765.6666666666, ans=0.125 2024-09-19 05:17:12,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=871794.0, ans=0.125 2024-09-19 05:17:14,147 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=12.0 2024-09-19 05:17:19,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=871794.0, ans=0.0 2024-09-19 05:17:27,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=871822.3333333334, ans=0.125 2024-09-19 05:17:35,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=871822.3333333334, ans=0.2 2024-09-19 05:17:39,232 INFO [train.py:1198] (0/2) Epoch 49, batch 950, loss[loss=0.2518, ctc_loss=0.1692, cr_loss=0.4132, over 20937.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1429, cr_loss=0.3702, over 4072120.89 frames. 
], batch size: 64, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:18:09,303 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.259e+02 2.375e+02 2.568e+02 3.161e+02, threshold=4.751e+02, percent-clipped=0.0 2024-09-19 05:18:11,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=871907.3333333334, ans=0.0 2024-09-19 05:18:30,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=871935.6666666666, ans=0.0 2024-09-19 05:18:53,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=871992.3333333334, ans=0.035 2024-09-19 05:18:54,541 INFO [train.py:1198] (0/2) Epoch 49, batch 1000, loss[loss=0.2232, ctc_loss=0.1475, cr_loss=0.3785, over 20970.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.142, cr_loss=0.3686, over 4075105.80 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:19:08,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=872020.6666666666, ans=0.1 2024-09-19 05:19:10,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=872020.6666666666, ans=22.5 2024-09-19 05:19:15,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=872020.6666666666, ans=0.125 2024-09-19 05:20:13,324 INFO [train.py:1198] (0/2) Epoch 49, batch 1050, loss[loss=0.2109, ctc_loss=0.1336, cr_loss=0.3865, over 20891.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1422, cr_loss=0.3688, over 4085500.25 frames. 
], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:20:13,789 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:20:43,618 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.304e+02 2.454e+02 2.606e+02 3.776e+02, threshold=4.908e+02, percent-clipped=0.0 2024-09-19 05:20:58,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-19 05:21:21,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=872247.3333333334, ans=0.0 2024-09-19 05:21:26,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=872247.3333333334, ans=0.125 2024-09-19 05:21:29,598 INFO [train.py:1198] (0/2) Epoch 49, batch 1100, loss[loss=0.2037, ctc_loss=0.1346, cr_loss=0.3456, over 21025.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1419, cr_loss=0.3682, over 4090592.20 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:21:52,896 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2024-09-19 05:22:12,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=872332.3333333334, ans=0.5 2024-09-19 05:22:27,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=872360.6666666666, ans=0.1 2024-09-19 05:22:39,167 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.79 vs. 
limit=15.0 2024-09-19 05:22:42,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=872389.0, ans=0.0 2024-09-19 05:22:47,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=872417.3333333334, ans=0.125 2024-09-19 05:22:49,131 INFO [train.py:1198] (0/2) Epoch 49, batch 1150, loss[loss=0.2143, ctc_loss=0.1385, cr_loss=0.3791, over 20865.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1413, cr_loss=0.3674, over 4083965.85 frames. ], batch size: 65, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:22:54,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=872417.3333333334, ans=0.125 2024-09-19 05:23:10,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=872445.6666666666, ans=0.025 2024-09-19 05:23:19,417 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.289e+02 2.377e+02 2.497e+02 3.518e+02, threshold=4.753e+02, percent-clipped=0.0 2024-09-19 05:23:39,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=872502.3333333334, ans=0.1 2024-09-19 05:23:48,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=872530.6666666666, ans=0.125 2024-09-19 05:24:04,827 INFO [train.py:1198] (0/2) Epoch 49, batch 1200, loss[loss=0.2053, ctc_loss=0.1339, cr_loss=0.3572, over 20882.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.141, cr_loss=0.3669, over 4098860.16 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:24:09,933 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.22 vs. 
limit=15.0 2024-09-19 05:24:32,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=872587.3333333334, ans=0.125 2024-09-19 05:24:39,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=872615.6666666666, ans=0.125 2024-09-19 05:25:00,747 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-308000.pt 2024-09-19 05:25:06,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=872672.3333333334, ans=0.0 2024-09-19 05:25:18,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=872672.3333333334, ans=0.125 2024-09-19 05:25:21,352 INFO [train.py:1198] (0/2) Epoch 49, batch 1250, loss[loss=0.2354, ctc_loss=0.1569, cr_loss=0.3922, over 19506.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1415, cr_loss=0.3668, over 4081032.13 frames. 
], batch size: 90, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:25:21,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=872700.6666666666, ans=0.125 2024-09-19 05:25:38,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=872729.0, ans=0.025 2024-09-19 05:25:44,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=872729.0, ans=0.125 2024-09-19 05:25:54,874 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.274e+02 2.409e+02 2.576e+02 4.717e+02, threshold=4.817e+02, percent-clipped=0.0 2024-09-19 05:26:17,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=872785.6666666666, ans=0.0 2024-09-19 05:26:22,628 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:26:27,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=872814.0, ans=0.0 2024-09-19 05:26:31,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=872814.0, ans=0.2 2024-09-19 05:26:40,852 INFO [train.py:1198] (0/2) Epoch 49, batch 1300, loss[loss=0.2078, ctc_loss=0.1366, cr_loss=0.3557, over 20793.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1407, cr_loss=0.3653, over 4092820.23 frames. 
], batch size: 53, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:26:50,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=872842.3333333334, ans=0.0 2024-09-19 05:27:20,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=872899.0, ans=0.125 2024-09-19 05:27:27,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=872927.3333333334, ans=0.125 2024-09-19 05:27:52,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=872955.6666666666, ans=0.125 2024-09-19 05:27:55,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=872955.6666666666, ans=0.125 2024-09-19 05:27:58,675 INFO [train.py:1198] (0/2) Epoch 49, batch 1350, loss[loss=0.2184, ctc_loss=0.1443, cr_loss=0.3705, over 21044.00 frames. ], tot_loss[loss=0.2122, ctc_loss=0.1395, cr_loss=0.3632, over 4093759.20 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:28:15,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=873012.3333333334, ans=0.125 2024-09-19 05:28:29,233 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.209e+02 2.364e+02 2.594e+02 5.305e+02, threshold=4.728e+02, percent-clipped=1.0 2024-09-19 05:28:37,729 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. 
limit=15.0 2024-09-19 05:28:44,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=873069.0, ans=0.0 2024-09-19 05:28:49,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=873069.0, ans=0.125 2024-09-19 05:28:57,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=873069.0, ans=0.125 2024-09-19 05:29:14,307 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=22.5 2024-09-19 05:29:15,015 INFO [train.py:1198] (0/2) Epoch 49, batch 1400, loss[loss=0.2123, ctc_loss=0.1398, cr_loss=0.3627, over 20972.00 frames. ], tot_loss[loss=0.2127, ctc_loss=0.1399, cr_loss=0.3641, over 4091005.21 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:29:24,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=873125.6666666666, ans=0.125 2024-09-19 05:29:42,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=873154.0, ans=0.125 2024-09-19 05:30:02,451 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:30:11,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=873210.6666666666, ans=0.0 2024-09-19 05:30:19,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873239.0, ans=0.1 2024-09-19 05:30:25,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.74 vs. 
limit=12.0 2024-09-19 05:30:27,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=873239.0, ans=0.125 2024-09-19 05:30:30,649 INFO [train.py:1198] (0/2) Epoch 49, batch 1450, loss[loss=0.2011, ctc_loss=0.1324, cr_loss=0.3437, over 20368.00 frames. ], tot_loss[loss=0.2123, ctc_loss=0.1395, cr_loss=0.3637, over 4098948.35 frames. ], batch size: 74, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:30:37,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=873267.3333333334, ans=0.2 2024-09-19 05:30:44,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=873295.6666666666, ans=0.0 2024-09-19 05:31:01,728 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.243e+02 2.369e+02 2.539e+02 3.109e+02, threshold=4.738e+02, percent-clipped=0.0 2024-09-19 05:31:02,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=873324.0, ans=0.0 2024-09-19 05:31:50,063 INFO [train.py:1198] (0/2) Epoch 49, batch 1500, loss[loss=0.2031, ctc_loss=0.1333, cr_loss=0.349, over 20824.00 frames. ], tot_loss[loss=0.2125, ctc_loss=0.1397, cr_loss=0.3639, over 4105709.16 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:32:58,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=873522.3333333334, ans=0.125 2024-09-19 05:33:06,012 INFO [train.py:1198] (0/2) Epoch 49, batch 1550, loss[loss=0.1968, ctc_loss=0.126, cr_loss=0.354, over 21047.00 frames. ], tot_loss[loss=0.2118, ctc_loss=0.1392, cr_loss=0.3632, over 4097639.01 frames. 
], batch size: 62, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:33:14,192 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2024-09-19 05:33:39,113 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.256e+02 2.388e+02 2.544e+02 3.857e+02, threshold=4.776e+02, percent-clipped=0.0 2024-09-19 05:33:57,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=873635.6666666666, ans=0.0 2024-09-19 05:34:23,979 INFO [train.py:1198] (0/2) Epoch 49, batch 1600, loss[loss=0.2119, ctc_loss=0.1396, cr_loss=0.3613, over 21003.00 frames. ], tot_loss[loss=0.2124, ctc_loss=0.1397, cr_loss=0.3634, over 4093239.83 frames. ], batch size: 52, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:34:30,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=873692.3333333334, ans=0.125 2024-09-19 05:34:42,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=873720.6666666666, ans=0.0 2024-09-19 05:35:12,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=873777.3333333334, ans=0.125 2024-09-19 05:35:16,482 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=15.0 2024-09-19 05:35:25,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.71 vs. limit=10.0 2024-09-19 05:35:39,990 INFO [train.py:1198] (0/2) Epoch 49, batch 1650, loss[loss=0.2151, ctc_loss=0.1402, cr_loss=0.3744, over 20977.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1398, cr_loss=0.3639, over 4106939.18 frames. 
], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:35:49,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=873834.0, ans=0.0 2024-09-19 05:36:10,023 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.253e+02 2.363e+02 2.502e+02 3.911e+02, threshold=4.727e+02, percent-clipped=0.0 2024-09-19 05:36:25,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=873919.0, ans=0.0 2024-09-19 05:36:31,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873919.0, ans=0.1 2024-09-19 05:36:33,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=873919.0, ans=0.0 2024-09-19 05:36:35,295 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.79 vs. limit=15.0 2024-09-19 05:36:55,620 INFO [train.py:1198] (0/2) Epoch 49, batch 1700, loss[loss=0.2159, ctc_loss=0.1426, cr_loss=0.3665, over 20643.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1405, cr_loss=0.3644, over 4109524.79 frames. ], batch size: 68, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:38:12,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=874117.3333333334, ans=0.125 2024-09-19 05:38:12,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=874117.3333333334, ans=0.025 2024-09-19 05:38:13,851 INFO [train.py:1198] (0/2) Epoch 49, batch 1750, loss[loss=0.2528, ctc_loss=0.1724, cr_loss=0.4024, over 20868.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.1401, cr_loss=0.3637, over 4115395.77 frames. 
], batch size: 65, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:38:35,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=874145.6666666666, ans=0.125 2024-09-19 05:38:45,433 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.335e+02 2.473e+02 2.656e+02 3.671e+02, threshold=4.946e+02, percent-clipped=0.0 2024-09-19 05:38:50,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.91 vs. limit=22.5 2024-09-19 05:38:53,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=874174.0, ans=0.95 2024-09-19 05:39:08,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=874202.3333333334, ans=0.0 2024-09-19 05:39:33,119 INFO [train.py:1198] (0/2) Epoch 49, batch 1800, loss[loss=0.2136, ctc_loss=0.1396, cr_loss=0.3703, over 20989.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1399, cr_loss=0.3634, over 4094103.67 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 05:39:41,397 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.87 vs. 
limit=15.0 2024-09-19 05:39:59,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=874287.3333333334, ans=0.125 2024-09-19 05:40:12,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=874315.6666666666, ans=0.125 2024-09-19 05:40:47,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=874400.6666666666, ans=0.0 2024-09-19 05:40:48,688 INFO [train.py:1198] (0/2) Epoch 49, batch 1850, loss[loss=0.2018, ctc_loss=0.1303, cr_loss=0.3575, over 19950.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1408, cr_loss=0.3658, over 4094624.56 frames. ], batch size: 44, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 05:40:58,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=874400.6666666666, ans=0.125 2024-09-19 05:41:01,751 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.97 vs. 
limit=10.0 2024-09-19 05:41:13,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=874429.0, ans=0.1 2024-09-19 05:41:20,417 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.286e+02 2.400e+02 2.579e+02 4.218e+02, threshold=4.799e+02, percent-clipped=0.0 2024-09-19 05:41:28,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=874457.3333333334, ans=0.125 2024-09-19 05:41:53,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=874514.0, ans=0.125 2024-09-19 05:41:55,706 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-09-19 05:41:56,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=874514.0, ans=0.0 2024-09-19 05:42:04,194 INFO [train.py:1198] (0/2) Epoch 49, batch 1900, loss[loss=0.1796, ctc_loss=0.1147, cr_loss=0.3241, over 20955.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.141, cr_loss=0.3666, over 4100261.98 frames. ], batch size: 48, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 05:42:18,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.99 vs. 
limit=12.0 2024-09-19 05:42:30,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=874570.6666666666, ans=0.1 2024-09-19 05:42:33,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=874599.0, ans=0.015 2024-09-19 05:42:37,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=874599.0, ans=0.125 2024-09-19 05:42:55,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=874627.3333333334, ans=0.0 2024-09-19 05:43:22,697 INFO [train.py:1198] (0/2) Epoch 49, batch 1950, loss[loss=0.2084, ctc_loss=0.138, cr_loss=0.3519, over 20857.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.368, over 4088673.68 frames. ], batch size: 65, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 05:43:35,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=874684.0, ans=0.5 2024-09-19 05:43:38,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=874712.3333333334, ans=0.125 2024-09-19 05:43:44,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=874712.3333333334, ans=0.125 2024-09-19 05:43:54,904 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.253e+02 2.406e+02 2.618e+02 3.664e+02, threshold=4.813e+02, percent-clipped=0.0 2024-09-19 05:43:55,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=874740.6666666666, ans=0.1 2024-09-19 05:44:04,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, 
batch_count=874740.6666666666, ans=0.125 2024-09-19 05:44:15,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=874769.0, ans=0.125 2024-09-19 05:44:33,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=12.0 2024-09-19 05:44:41,326 INFO [train.py:1198] (0/2) Epoch 49, batch 2000, loss[loss=0.2034, ctc_loss=0.1325, cr_loss=0.3542, over 20966.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1421, cr_loss=0.3683, over 4087652.57 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:45:09,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=874854.0, ans=0.025 2024-09-19 05:45:15,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=874882.3333333334, ans=0.025 2024-09-19 05:45:36,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=874910.6666666666, ans=0.125 2024-09-19 05:45:57,235 INFO [train.py:1198] (0/2) Epoch 49, batch 2050, loss[loss=0.2134, ctc_loss=0.1408, cr_loss=0.3628, over 20771.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1422, cr_loss=0.3682, over 4081951.73 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:46:29,032 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.262e+02 2.437e+02 2.521e+02 3.820e+02, threshold=4.874e+02, percent-clipped=0.0 2024-09-19 05:46:30,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=875024.0, ans=0.04949747468305833 2024-09-19 05:47:05,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=875080.6666666666, ans=0.0 2024-09-19 05:47:06,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=875080.6666666666, ans=0.035 2024-09-19 05:47:12,675 INFO [train.py:1198] (0/2) Epoch 49, batch 2100, loss[loss=0.2364, ctc_loss=0.1568, cr_loss=0.3983, over 20948.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.3689, over 4083276.54 frames. ], batch size: 64, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:47:12,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=875109.0, ans=0.125 2024-09-19 05:47:44,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=875165.6666666666, ans=0.025 2024-09-19 05:48:28,560 INFO [train.py:1198] (0/2) Epoch 49, batch 2150, loss[loss=0.2419, ctc_loss=0.1572, cr_loss=0.4232, over 20858.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1421, cr_loss=0.3687, over 4089954.65 frames. 
], batch size: 65, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:48:28,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=875250.6666666666, ans=0.2 2024-09-19 05:49:01,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=875307.3333333334, ans=0.2 2024-09-19 05:49:03,026 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.237e+02 2.459e+02 2.612e+02 3.793e+02, threshold=4.918e+02, percent-clipped=0.0 2024-09-19 05:49:10,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=875307.3333333334, ans=0.1 2024-09-19 05:49:39,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=875364.0, ans=0.0 2024-09-19 05:49:46,298 INFO [train.py:1198] (0/2) Epoch 49, batch 2200, loss[loss=0.2438, ctc_loss=0.1622, cr_loss=0.4082, over 20670.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1426, cr_loss=0.3697, over 4088588.91 frames. 
], batch size: 68, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:49:57,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=875392.3333333334, ans=0.0 2024-09-19 05:49:59,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=875392.3333333334, ans=0.0 2024-09-19 05:50:45,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=875477.3333333334, ans=0.125 2024-09-19 05:50:52,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=875505.6666666666, ans=0.07 2024-09-19 05:51:04,096 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:51:05,259 INFO [train.py:1198] (0/2) Epoch 49, batch 2250, loss[loss=0.2231, ctc_loss=0.1508, cr_loss=0.3617, over 21062.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3677, over 4102326.53 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:51:37,156 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.976e+02 2.293e+02 2.440e+02 2.592e+02 4.236e+02, threshold=4.879e+02, percent-clipped=0.0 2024-09-19 05:51:47,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=875590.6666666666, ans=0.2 2024-09-19 05:52:01,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=875619.0, ans=0.125 2024-09-19 05:52:15,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=6.19 vs. limit=22.5 2024-09-19 05:52:20,911 INFO [train.py:1198] (0/2) Epoch 49, batch 2300, loss[loss=0.2196, ctc_loss=0.1427, cr_loss=0.3846, over 20955.00 frames. 
], tot_loss[loss=0.2145, ctc_loss=0.1412, cr_loss=0.3668, over 4105011.81 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:52:24,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=875675.6666666666, ans=0.125 2024-09-19 05:52:37,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=875704.0, ans=0.0 2024-09-19 05:52:40,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=15.0 2024-09-19 05:52:42,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=875704.0, ans=0.0 2024-09-19 05:52:50,542 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.88 vs. limit=10.0 2024-09-19 05:52:51,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=875732.3333333334, ans=0.0 2024-09-19 05:53:21,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-09-19 05:53:36,112 INFO [train.py:1198] (0/2) Epoch 49, batch 2350, loss[loss=0.2183, ctc_loss=0.1442, cr_loss=0.3703, over 20828.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1408, cr_loss=0.3661, over 4114868.23 frames. ], batch size: 65, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:54:07,566 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.283e+02 2.395e+02 2.536e+02 3.291e+02, threshold=4.790e+02, percent-clipped=0.0 2024-09-19 05:54:53,945 INFO [train.py:1198] (0/2) Epoch 49, batch 2400, loss[loss=0.2226, ctc_loss=0.1499, cr_loss=0.3631, over 19494.00 frames. 
], tot_loss[loss=0.2131, ctc_loss=0.1401, cr_loss=0.3651, over 4120786.40 frames. ], batch size: 90, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:55:33,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=876015.6666666666, ans=0.2 2024-09-19 05:55:33,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=876015.6666666666, ans=0.0 2024-09-19 05:55:45,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=876044.0, ans=0.125 2024-09-19 05:56:11,714 INFO [train.py:1198] (0/2) Epoch 49, batch 2450, loss[loss=0.2111, ctc_loss=0.1376, cr_loss=0.3673, over 20647.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1406, cr_loss=0.366, over 4120717.87 frames. ], batch size: 68, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:56:28,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=876129.0, ans=0.125 2024-09-19 05:56:43,555 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.206e+02 2.361e+02 2.528e+02 4.866e+02, threshold=4.722e+02, percent-clipped=1.0 2024-09-19 05:56:43,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=876157.3333333334, ans=0.1 2024-09-19 05:56:44,166 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=22.5 2024-09-19 05:57:00,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=876185.6666666666, ans=0.2 2024-09-19 05:57:27,402 INFO [train.py:1198] (0/2) Epoch 49, batch 2500, loss[loss=0.192, ctc_loss=0.123, cr_loss=0.3448, over 20946.00 frames. 
], tot_loss[loss=0.214, ctc_loss=0.1407, cr_loss=0.3665, over 4117658.42 frames. ], batch size: 50, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:58:02,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=876299.0, ans=0.025 2024-09-19 05:58:17,902 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=12.0 2024-09-19 05:58:42,628 INFO [train.py:1198] (0/2) Epoch 49, batch 2550, loss[loss=0.2497, ctc_loss=0.1705, cr_loss=0.3961, over 13993.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1402, cr_loss=0.3651, over 4102391.61 frames. ], batch size: 149, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 05:58:56,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876412.3333333334, ans=0.1 2024-09-19 05:59:10,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-09-19 05:59:14,362 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.041e+02 2.260e+02 2.421e+02 2.630e+02 4.143e+02, threshold=4.843e+02, percent-clipped=0.0 2024-09-19 05:59:49,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=876497.3333333334, ans=0.025 2024-09-19 05:59:55,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=876497.3333333334, ans=0.125 2024-09-19 05:59:58,004 INFO [train.py:1198] (0/2) Epoch 49, batch 2600, loss[loss=0.2577, ctc_loss=0.1729, cr_loss=0.4244, over 19425.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1412, cr_loss=0.3666, over 4106525.07 frames. 
], batch size: 90, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:00:05,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=876525.6666666666, ans=0.1 2024-09-19 06:00:13,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=876554.0, ans=0.0 2024-09-19 06:00:18,808 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.93 vs. limit=15.0 2024-09-19 06:00:26,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2024-09-19 06:00:34,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876582.3333333334, ans=0.1 2024-09-19 06:00:58,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=876610.6666666666, ans=0.125 2024-09-19 06:01:15,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=876667.3333333334, ans=0.125 2024-09-19 06:01:15,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=876667.3333333334, ans=0.05 2024-09-19 06:01:16,422 INFO [train.py:1198] (0/2) Epoch 49, batch 2650, loss[loss=0.1911, ctc_loss=0.1262, cr_loss=0.3244, over 20785.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1408, cr_loss=0.3657, over 4098937.52 frames. 
], batch size: 53, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:01:19,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=876667.3333333334, ans=0.125 2024-09-19 06:01:24,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=876667.3333333334, ans=0.0 2024-09-19 06:01:51,433 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.246e+02 2.382e+02 2.567e+02 4.659e+02, threshold=4.764e+02, percent-clipped=0.0 2024-09-19 06:01:53,798 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.20 vs. limit=6.0 2024-09-19 06:02:19,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=876780.6666666666, ans=0.2 2024-09-19 06:02:24,179 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0 2024-09-19 06:02:35,749 INFO [train.py:1198] (0/2) Epoch 49, batch 2700, loss[loss=0.2092, ctc_loss=0.1394, cr_loss=0.3489, over 20771.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1407, cr_loss=0.3652, over 4076247.86 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:02:43,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=876809.0, ans=0.1 2024-09-19 06:02:52,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=876837.3333333334, ans=0.125 2024-09-19 06:02:55,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=876837.3333333334, ans=0.1 2024-09-19 06:03:03,471 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.68 vs. limit=22.5 2024-09-19 06:03:04,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=876865.6666666666, ans=0.2 2024-09-19 06:03:13,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=876865.6666666666, ans=0.2 2024-09-19 06:03:19,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2024-09-19 06:03:37,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876922.3333333334, ans=0.1 2024-09-19 06:03:45,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=876922.3333333334, ans=0.0 2024-09-19 06:03:50,965 INFO [train.py:1198] (0/2) Epoch 49, batch 2750, loss[loss=0.2685, ctc_loss=0.1843, cr_loss=0.4211, over 14036.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1411, cr_loss=0.3662, over 4067505.99 frames. 
], batch size: 149, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:04:00,686 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.67 vs. limit=6.0 2024-09-19 06:04:06,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=876979.0, ans=0.1 2024-09-19 06:04:22,899 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.269e+02 2.435e+02 2.609e+02 3.166e+02, threshold=4.870e+02, percent-clipped=0.0 2024-09-19 06:04:38,598 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=22.5 2024-09-19 06:04:52,021 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:05:06,633 INFO [train.py:1198] (0/2) Epoch 49, batch 2800, loss[loss=0.2491, ctc_loss=0.1648, cr_loss=0.4217, over 20829.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1415, cr_loss=0.3674, over 4078189.78 frames. ], batch size: 65, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:05:53,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=877177.3333333334, ans=0.125 2024-09-19 06:06:25,208 INFO [train.py:1198] (0/2) Epoch 49, batch 2850, loss[loss=0.2158, ctc_loss=0.1413, cr_loss=0.3725, over 20953.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1414, cr_loss=0.3671, over 4080649.49 frames. 
], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:06:46,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=877262.3333333334, ans=0.125 2024-09-19 06:06:56,752 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.265e+02 2.353e+02 2.511e+02 6.871e+02, threshold=4.706e+02, percent-clipped=1.0 2024-09-19 06:07:12,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=877319.0, ans=0.125 2024-09-19 06:07:13,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=877319.0, ans=0.125 2024-09-19 06:07:14,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=877319.0, ans=0.125 2024-09-19 06:07:18,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=877319.0, ans=0.2 2024-09-19 06:07:40,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877347.3333333334, ans=0.1 2024-09-19 06:07:42,890 INFO [train.py:1198] (0/2) Epoch 49, batch 2900, loss[loss=0.1994, ctc_loss=0.1321, cr_loss=0.3366, over 21073.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1411, cr_loss=0.3669, over 4085219.00 frames. 
], batch size: 53, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:08:02,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=877404.0, ans=0.04949747468305833 2024-09-19 06:08:10,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=877404.0, ans=0.125 2024-09-19 06:08:27,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877460.6666666666, ans=0.1 2024-09-19 06:08:30,179 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:08:39,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=877460.6666666666, ans=0.1 2024-09-19 06:08:58,508 INFO [train.py:1198] (0/2) Epoch 49, batch 2950, loss[loss=0.2276, ctc_loss=0.1512, cr_loss=0.3822, over 21054.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1411, cr_loss=0.3663, over 4090025.85 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:09:19,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=877545.6666666666, ans=0.1 2024-09-19 06:09:29,884 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.260e+02 2.394e+02 2.539e+02 2.981e+02, threshold=4.788e+02, percent-clipped=0.0 2024-09-19 06:09:33,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=877574.0, ans=0.125 2024-09-19 06:09:45,693 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.02 vs. 
limit=15.0 2024-09-19 06:09:56,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877602.3333333334, ans=0.1 2024-09-19 06:10:00,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=877630.6666666666, ans=0.035 2024-09-19 06:10:00,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=877630.6666666666, ans=0.0 2024-09-19 06:10:11,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=877630.6666666666, ans=0.125 2024-09-19 06:10:13,865 INFO [train.py:1198] (0/2) Epoch 49, batch 3000, loss[loss=0.2121, ctc_loss=0.1394, cr_loss=0.3633, over 20810.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1412, cr_loss=0.3672, over 4104490.42 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:10:13,866 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 06:10:31,735 INFO [train.py:1230] (0/2) Epoch 49, validation: loss=0.03905, ctc_loss=0.03905, cr_loss=1.593e-14, over 944034.00 frames. 2024-09-19 06:10:31,735 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-19 06:10:41,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=877659.0, ans=0.1 2024-09-19 06:11:45,588 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.32 vs. limit=22.5 2024-09-19 06:11:50,767 INFO [train.py:1198] (0/2) Epoch 49, batch 3050, loss[loss=0.1831, ctc_loss=0.1212, cr_loss=0.3097, over 20947.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1415, cr_loss=0.3673, over 4098810.03 frames. 
], batch size: 51, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:12:09,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877829.0, ans=0.1 2024-09-19 06:12:22,939 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.331e+02 2.450e+02 2.618e+02 3.633e+02, threshold=4.901e+02, percent-clipped=0.0 2024-09-19 06:12:47,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=877885.6666666666, ans=0.125 2024-09-19 06:12:52,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=877914.0, ans=0.025 2024-09-19 06:13:10,105 INFO [train.py:1198] (0/2) Epoch 49, batch 3100, loss[loss=0.2272, ctc_loss=0.1493, cr_loss=0.3894, over 20660.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1418, cr_loss=0.3685, over 4107521.72 frames. ], batch size: 68, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:13:10,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877942.3333333334, ans=0.1 2024-09-19 06:13:13,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.84 vs. 
limit=22.5 2024-09-19 06:13:18,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=877942.3333333334, ans=0.2 2024-09-19 06:14:08,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=878027.3333333334, ans=0.125 2024-09-19 06:14:21,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=878055.6666666666, ans=0.125 2024-09-19 06:14:25,896 INFO [train.py:1198] (0/2) Epoch 49, batch 3150, loss[loss=0.1715, ctc_loss=0.1123, cr_loss=0.2958, over 21015.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1414, cr_loss=0.3676, over 4109554.52 frames. ], batch size: 52, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 06:14:35,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=878084.0, ans=0.125 2024-09-19 06:14:59,115 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.293e+02 2.399e+02 2.527e+02 8.432e+02, threshold=4.798e+02, percent-clipped=1.0 2024-09-19 06:15:03,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=878140.6666666666, ans=0.2 2024-09-19 06:15:14,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=878169.0, ans=0.125 2024-09-19 06:15:41,636 INFO [train.py:1198] (0/2) Epoch 49, batch 3200, loss[loss=0.2605, ctc_loss=0.1797, cr_loss=0.4041, over 18504.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1416, cr_loss=0.3677, over 4110174.83 frames. 
], batch size: 108, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:16:00,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=878254.0, ans=0.125 2024-09-19 06:16:10,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=878282.3333333334, ans=0.125 2024-09-19 06:16:41,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=878339.0, ans=0.1 2024-09-19 06:16:57,213 INFO [train.py:1198] (0/2) Epoch 49, batch 3250, loss[loss=0.2009, ctc_loss=0.1308, cr_loss=0.3501, over 20919.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1412, cr_loss=0.367, over 4110326.56 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:17:05,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=878367.3333333334, ans=0.0 2024-09-19 06:17:33,588 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.268e+02 2.398e+02 2.600e+02 3.640e+02, threshold=4.797e+02, percent-clipped=0.0 2024-09-19 06:17:54,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0 2024-09-19 06:17:55,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=878452.3333333334, ans=0.125 2024-09-19 06:18:16,013 INFO [train.py:1198] (0/2) Epoch 49, batch 3300, loss[loss=0.1871, ctc_loss=0.1184, cr_loss=0.3435, over 20987.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1423, cr_loss=0.3689, over 4095972.51 frames. 
], batch size: 51, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:18:22,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=878509.0, ans=0.125 2024-09-19 06:19:01,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=878565.6666666666, ans=0.125 2024-09-19 06:19:01,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=878565.6666666666, ans=0.125 2024-09-19 06:19:34,731 INFO [train.py:1198] (0/2) Epoch 49, batch 3350, loss[loss=0.2388, ctc_loss=0.1584, cr_loss=0.402, over 20314.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1429, cr_loss=0.3696, over 4098843.78 frames. ], batch size: 74, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:19:36,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=878650.6666666666, ans=0.125 2024-09-19 06:20:08,229 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.294e+02 2.391e+02 2.564e+02 4.774e+02, threshold=4.781e+02, percent-clipped=0.0 2024-09-19 06:20:13,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=878707.3333333334, ans=0.04949747468305833 2024-09-19 06:20:32,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=878735.6666666666, ans=0.125 2024-09-19 06:20:32,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=878735.6666666666, ans=0.125 2024-09-19 06:20:50,616 INFO [train.py:1198] (0/2) Epoch 49, batch 3400, loss[loss=0.1908, ctc_loss=0.1235, cr_loss=0.3365, over 20874.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1435, cr_loss=0.371, over 4100257.21 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:20:52,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=878792.3333333334, ans=0.125 2024-09-19 06:22:06,309 INFO [train.py:1198] (0/2) Epoch 49, batch 3450, loss[loss=0.2326, ctc_loss=0.1571, cr_loss=0.3773, over 18446.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1436, cr_loss=0.3711, over 4091331.80 frames. ], batch size: 108, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:22:20,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=878962.3333333334, ans=15.0 2024-09-19 06:22:39,477 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.033e+02 2.282e+02 2.423e+02 2.611e+02 3.875e+02, threshold=4.846e+02, percent-clipped=0.0 2024-09-19 06:22:50,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=879019.0, ans=10.0 2024-09-19 06:22:50,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=879019.0, ans=0.025 2024-09-19 06:23:16,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=879047.3333333334, ans=0.05 2024-09-19 06:23:24,837 INFO [train.py:1198] (0/2) Epoch 49, batch 3500, loss[loss=0.2103, ctc_loss=0.1369, cr_loss=0.3667, over 21023.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.143, cr_loss=0.37, over 4099905.68 frames. 
], batch size: 63, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:23:32,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=879075.6666666666, ans=0.025 2024-09-19 06:23:55,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=879132.3333333334, ans=0.125 2024-09-19 06:23:57,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=879132.3333333334, ans=0.04949747468305833 2024-09-19 06:23:58,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=879132.3333333334, ans=0.2 2024-09-19 06:24:13,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=879160.6666666666, ans=0.025 2024-09-19 06:24:43,625 INFO [train.py:1198] (0/2) Epoch 49, batch 3550, loss[loss=0.2228, ctc_loss=0.1472, cr_loss=0.378, over 19405.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1429, cr_loss=0.3698, over 4092794.13 frames. 
], batch size: 90, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:24:48,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=879217.3333333334, ans=0.2 2024-09-19 06:25:12,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=879274.0, ans=0.0 2024-09-19 06:25:16,678 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.268e+02 2.388e+02 2.557e+02 3.888e+02, threshold=4.776e+02, percent-clipped=0.0 2024-09-19 06:25:26,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=879274.0, ans=0.125 2024-09-19 06:25:58,669 INFO [train.py:1198] (0/2) Epoch 49, batch 3600, loss[loss=0.2331, ctc_loss=0.1538, cr_loss=0.3963, over 20063.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1424, cr_loss=0.3696, over 4106698.58 frames. ], batch size: 80, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:26:22,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=879387.3333333334, ans=0.1 2024-09-19 06:26:34,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2024-09-19 06:26:44,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=879444.0, ans=0.025 2024-09-19 06:26:48,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=879444.0, ans=10.0 2024-09-19 06:27:07,471 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.86 vs. 
limit=15.0 2024-09-19 06:27:10,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=879472.3333333334, ans=0.0 2024-09-19 06:27:14,428 INFO [train.py:1198] (0/2) Epoch 49, batch 3650, loss[loss=0.2172, ctc_loss=0.1416, cr_loss=0.378, over 21078.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1421, cr_loss=0.3686, over 4110522.51 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:27:38,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=879529.0, ans=0.125 2024-09-19 06:27:47,758 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.233e+02 2.351e+02 2.545e+02 5.315e+02, threshold=4.701e+02, percent-clipped=1.0 2024-09-19 06:27:54,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=879557.3333333334, ans=0.125 2024-09-19 06:28:30,075 INFO [train.py:1198] (0/2) Epoch 49, batch 3700, loss[loss=0.2236, ctc_loss=0.1471, cr_loss=0.3824, over 20861.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1409, cr_loss=0.367, over 4115391.49 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:28:55,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=879670.6666666666, ans=0.0 2024-09-19 06:29:27,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=879727.3333333334, ans=0.125 2024-09-19 06:29:42,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=879755.6666666666, ans=0.2 2024-09-19 06:29:48,135 INFO [train.py:1198] (0/2) Epoch 49, batch 3750, loss[loss=0.2526, ctc_loss=0.1685, cr_loss=0.4202, over 20853.00 frames. 
], tot_loss[loss=0.2144, ctc_loss=0.1409, cr_loss=0.3671, over 4116870.24 frames. ], batch size: 65, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:30:25,389 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.220e+02 2.318e+02 2.478e+02 2.994e+02, threshold=4.637e+02, percent-clipped=0.0 2024-09-19 06:31:02,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=879897.3333333334, ans=0.125 2024-09-19 06:31:06,102 INFO [train.py:1198] (0/2) Epoch 49, batch 3800, loss[loss=0.23, ctc_loss=0.1514, cr_loss=0.3927, over 20293.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.141, cr_loss=0.3672, over 4117043.41 frames. ], batch size: 74, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 06:32:02,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=880010.6666666666, ans=0.125 2024-09-19 06:32:21,719 INFO [train.py:1198] (0/2) Epoch 49, batch 3850, loss[loss=0.2152, ctc_loss=0.145, cr_loss=0.3506, over 21069.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.141, cr_loss=0.3665, over 4111652.20 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 06:32:43,809 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=22.5 2024-09-19 06:32:56,740 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.261e+02 2.383e+02 2.507e+02 3.054e+02, threshold=4.766e+02, percent-clipped=0.0 2024-09-19 06:32:58,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=880124.0, ans=10.0 2024-09-19 06:33:37,911 INFO [train.py:1198] (0/2) Epoch 49, batch 3900, loss[loss=0.2171, ctc_loss=0.1446, cr_loss=0.3621, over 20894.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1406, cr_loss=0.3652, over 4117267.88 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 06:33:52,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=880237.3333333334, ans=0.05 2024-09-19 06:34:34,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=22.5 2024-09-19 06:34:39,319 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.99 vs. limit=10.0 2024-09-19 06:34:56,584 INFO [train.py:1198] (0/2) Epoch 49, batch 3950, loss[loss=0.2302, ctc_loss=0.152, cr_loss=0.3907, over 20675.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1412, cr_loss=0.3661, over 4103041.34 frames. ], batch size: 71, lr: 1.68e-03, grad_scale: 16.0 2024-09-19 06:35:31,515 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.973e+02 2.290e+02 2.471e+02 2.656e+02 3.794e+02, threshold=4.941e+02, percent-clipped=0.0 2024-09-19 06:35:33,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=880407.3333333334, ans=0.0 2024-09-19 06:35:50,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=22.5 2024-09-19 06:35:56,639 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.15 vs. limit=15.0 2024-09-19 06:36:14,965 INFO [train.py:1198] (0/2) Epoch 49, batch 4000, loss[loss=0.2238, ctc_loss=0.1469, cr_loss=0.3843, over 20653.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1411, cr_loss=0.3665, over 4107451.38 frames. 
], batch size: 68, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:36:21,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=880492.3333333334, ans=0.1 2024-09-19 06:36:37,034 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-09-19 06:37:31,307 INFO [train.py:1198] (0/2) Epoch 49, batch 4050, loss[loss=0.1956, ctc_loss=0.1265, cr_loss=0.3452, over 20871.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1406, cr_loss=0.3654, over 4095410.35 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:37:39,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2024-09-19 06:38:06,192 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.291e+02 2.397e+02 2.530e+02 3.380e+02, threshold=4.794e+02, percent-clipped=0.0 2024-09-19 06:38:12,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=880690.6666666666, ans=10.0 2024-09-19 06:38:26,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=880719.0, ans=0.125 2024-09-19 06:38:45,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-09-19 06:38:47,373 INFO [train.py:1198] (0/2) Epoch 49, batch 4100, loss[loss=0.2219, ctc_loss=0.1446, cr_loss=0.3864, over 20971.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1409, cr_loss=0.3662, over 4099864.28 frames. 
], batch size: 55, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:39:02,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=880804.0, ans=0.0 2024-09-19 06:39:07,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=880804.0, ans=0.2 2024-09-19 06:39:21,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=880832.3333333334, ans=0.125 2024-09-19 06:39:38,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=16.91 vs. limit=15.0 2024-09-19 06:40:03,579 INFO [train.py:1198] (0/2) Epoch 49, batch 4150, loss[loss=0.2181, ctc_loss=0.1434, cr_loss=0.3732, over 20887.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.1401, cr_loss=0.3649, over 4089885.68 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2024-09-19 06:40:06,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=880917.3333333334, ans=0.125 2024-09-19 06:40:08,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=880917.3333333334, ans=0.125 2024-09-19 06:40:27,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=880945.6666666666, ans=0.0 2024-09-19 06:40:41,033 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.294e+02 2.413e+02 2.622e+02 3.774e+02, threshold=4.826e+02, percent-clipped=0.0 2024-09-19 06:41:00,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.89 vs. 
limit=10.0 2024-09-19 06:41:05,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=881030.6666666666, ans=0.125 2024-09-19 06:41:16,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=881030.6666666666, ans=0.125 2024-09-19 06:41:21,778 INFO [train.py:1198] (0/2) Epoch 49, batch 4200, loss[loss=0.2324, ctc_loss=0.1547, cr_loss=0.3885, over 19235.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1402, cr_loss=0.3649, over 4089387.21 frames. ], batch size: 90, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:41:46,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=881087.3333333334, ans=0.125 2024-09-19 06:42:06,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=881115.6666666666, ans=0.025 2024-09-19 06:42:41,250 INFO [train.py:1198] (0/2) Epoch 49, batch 4250, loss[loss=0.2199, ctc_loss=0.1442, cr_loss=0.3788, over 20933.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1409, cr_loss=0.3664, over 4077168.43 frames. 
], batch size: 60, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:42:50,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=881200.6666666666, ans=0.125 2024-09-19 06:42:52,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=881200.6666666666, ans=0.2 2024-09-19 06:43:03,402 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:43:16,512 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.285e+02 2.411e+02 2.605e+02 4.016e+02, threshold=4.822e+02, percent-clipped=0.0 2024-09-19 06:43:41,084 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=22.5 2024-09-19 06:43:57,204 INFO [train.py:1198] (0/2) Epoch 49, batch 4300, loss[loss=0.1775, ctc_loss=0.1138, cr_loss=0.3185, over 20972.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1416, cr_loss=0.3674, over 4076750.52 frames. ], batch size: 48, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:44:13,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=881370.6666666666, ans=0.125 2024-09-19 06:44:31,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.07 vs. limit=22.5 2024-09-19 06:44:39,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=881399.0, ans=0.125 2024-09-19 06:44:41,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. 
limit=15.0 2024-09-19 06:44:44,837 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2024-09-19 06:44:56,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=881455.6666666666, ans=0.1 2024-09-19 06:45:12,896 INFO [train.py:1198] (0/2) Epoch 49, batch 4350, loss[loss=0.1764, ctc_loss=0.1128, cr_loss=0.3177, over 20991.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1412, cr_loss=0.3662, over 4081023.00 frames. ], batch size: 51, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:45:20,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=881484.0, ans=0.2 2024-09-19 06:45:24,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=22.5 2024-09-19 06:45:48,019 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.276e+02 2.445e+02 2.604e+02 3.378e+02, threshold=4.890e+02, percent-clipped=0.0 2024-09-19 06:46:16,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=881597.3333333334, ans=0.2 2024-09-19 06:46:32,171 INFO [train.py:1198] (0/2) Epoch 49, batch 4400, loss[loss=0.2533, ctc_loss=0.1773, cr_loss=0.38, over 13907.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1412, cr_loss=0.3654, over 4084827.90 frames. 
], batch size: 149, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:47:25,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=881710.6666666666, ans=0.125 2024-09-19 06:47:45,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=881739.0, ans=0.125 2024-09-19 06:47:51,098 INFO [train.py:1198] (0/2) Epoch 49, batch 4450, loss[loss=0.2026, ctc_loss=0.134, cr_loss=0.3432, over 20902.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1411, cr_loss=0.3654, over 4089174.81 frames. ], batch size: 54, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:48:00,513 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:48:04,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=881795.6666666666, ans=0.0 2024-09-19 06:48:27,557 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.320e+02 2.495e+02 2.668e+02 3.983e+02, threshold=4.991e+02, percent-clipped=0.0 2024-09-19 06:48:29,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=881824.0, ans=0.125 2024-09-19 06:48:32,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=881824.0, ans=0.2 2024-09-19 06:48:38,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=881852.3333333334, ans=0.125 2024-09-19 06:49:07,097 INFO [train.py:1198] (0/2) Epoch 49, batch 4500, loss[loss=0.2125, ctc_loss=0.139, cr_loss=0.3675, over 19883.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1407, cr_loss=0.3651, over 4085054.40 frames. 
], batch size: 44, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:49:31,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=881937.3333333334, ans=0.125 2024-09-19 06:49:43,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=881965.6666666666, ans=0.1 2024-09-19 06:50:03,911 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=12.0 2024-09-19 06:50:07,099 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-09-19 06:50:23,131 INFO [train.py:1198] (0/2) Epoch 49, batch 4550, loss[loss=0.1862, ctc_loss=0.1177, cr_loss=0.3428, over 20961.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1401, cr_loss=0.3644, over 4082832.96 frames. ], batch size: 51, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:50:37,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0 2024-09-19 06:50:58,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=882107.3333333334, ans=0.2 2024-09-19 06:50:59,451 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.229e+02 2.333e+02 2.525e+02 5.159e+02, threshold=4.666e+02, percent-clipped=1.0 2024-09-19 06:51:30,796 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.84 vs. 
limit=15.0 2024-09-19 06:51:31,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=882164.0, ans=15.0 2024-09-19 06:51:38,773 INFO [train.py:1198] (0/2) Epoch 49, batch 4600, loss[loss=0.1912, ctc_loss=0.1255, cr_loss=0.3283, over 20941.00 frames. ], tot_loss[loss=0.2123, ctc_loss=0.1396, cr_loss=0.3635, over 4086946.91 frames. ], batch size: 50, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:51:54,892 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.31 vs. limit=15.0 2024-09-19 06:52:45,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=882305.6666666666, ans=0.0 2024-09-19 06:52:45,905 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=12.0 2024-09-19 06:52:57,231 INFO [train.py:1198] (0/2) Epoch 49, batch 4650, loss[loss=0.1977, ctc_loss=0.1269, cr_loss=0.3542, over 20971.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1406, cr_loss=0.3651, over 4079715.04 frames. ], batch size: 55, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:53:12,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=882334.0, ans=0.125 2024-09-19 06:53:36,309 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.308e+02 2.420e+02 2.560e+02 3.188e+02, threshold=4.840e+02, percent-clipped=0.0 2024-09-19 06:53:54,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=882419.0, ans=0.125 2024-09-19 06:54:15,450 INFO [train.py:1198] (0/2) Epoch 49, batch 4700, loss[loss=0.1955, ctc_loss=0.1279, cr_loss=0.3379, over 20999.00 frames. 
], tot_loss[loss=0.2133, ctc_loss=0.1403, cr_loss=0.365, over 4091821.87 frames. ], batch size: 63, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:54:44,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=882532.3333333334, ans=0.2 2024-09-19 06:54:57,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=882532.3333333334, ans=0.0 2024-09-19 06:55:05,127 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2024-09-19 06:55:28,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=882589.0, ans=0.05 2024-09-19 06:55:31,406 INFO [train.py:1198] (0/2) Epoch 49, batch 4750, loss[loss=0.2245, ctc_loss=0.1458, cr_loss=0.3937, over 21059.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1407, cr_loss=0.3657, over 4088741.54 frames. ], batch size: 59, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 06:56:04,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=882674.0, ans=0.125 2024-09-19 06:56:07,314 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.318e+02 2.432e+02 2.570e+02 3.803e+02, threshold=4.863e+02, percent-clipped=0.0 2024-09-19 06:56:22,956 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.41 vs. limit=10.0 2024-09-19 06:56:46,453 INFO [train.py:1198] (0/2) Epoch 49, batch 4800, loss[loss=0.2413, ctc_loss=0.1643, cr_loss=0.3847, over 14989.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1409, cr_loss=0.3661, over 4076277.84 frames. 
], batch size: 150, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:57:15,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=882815.6666666666, ans=0.0 2024-09-19 06:57:32,582 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=22.5 2024-09-19 06:57:44,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=882844.0, ans=0.07 2024-09-19 06:58:05,099 INFO [train.py:1198] (0/2) Epoch 49, batch 4850, loss[loss=0.1859, ctc_loss=0.1171, cr_loss=0.3439, over 20920.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1406, cr_loss=0.3658, over 4083809.35 frames. ], batch size: 48, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:58:05,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=15.0 2024-09-19 06:58:19,376 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.20 vs. limit=15.0 2024-09-19 06:58:22,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=882929.0, ans=0.125 2024-09-19 06:58:40,940 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.266e+02 2.402e+02 2.535e+02 8.779e+02, threshold=4.804e+02, percent-clipped=1.0 2024-09-19 06:59:04,735 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0 2024-09-19 06:59:23,969 INFO [train.py:1198] (0/2) Epoch 49, batch 4900, loss[loss=0.2238, ctc_loss=0.149, cr_loss=0.3738, over 20852.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.14, cr_loss=0.3643, over 4085775.82 frames. 
], batch size: 65, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 06:59:39,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=883070.6666666666, ans=0.0 2024-09-19 06:59:56,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=883099.0, ans=0.125 2024-09-19 07:00:02,295 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-09-19 07:00:03,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=883099.0, ans=0.1 2024-09-19 07:00:15,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=883127.3333333334, ans=0.125 2024-09-19 07:00:27,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=883155.6666666666, ans=0.125 2024-09-19 07:00:38,999 INFO [train.py:1198] (0/2) Epoch 49, batch 4950, loss[loss=0.2184, ctc_loss=0.1451, cr_loss=0.3664, over 20057.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1404, cr_loss=0.3649, over 4095575.95 frames. ], batch size: 80, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:00:49,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=883184.0, ans=0.125 2024-09-19 07:01:00,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=883212.3333333334, ans=0.125 2024-09-19 07:01:01,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.75 vs. 
limit=12.0 2024-09-19 07:01:14,320 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.060e+02 2.287e+02 2.498e+02 2.641e+02 4.233e+02, threshold=4.997e+02, percent-clipped=0.0 2024-09-19 07:01:31,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=883269.0, ans=0.1 2024-09-19 07:01:40,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=883297.3333333334, ans=0.025 2024-09-19 07:01:53,347 INFO [train.py:1198] (0/2) Epoch 49, batch 5000, loss[loss=0.2039, ctc_loss=0.1324, cr_loss=0.3576, over 20789.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1407, cr_loss=0.3653, over 4094813.79 frames. ], batch size: 53, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:02:09,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=883354.0, ans=0.1 2024-09-19 07:03:08,150 INFO [train.py:1198] (0/2) Epoch 49, batch 5050, loss[loss=0.2249, ctc_loss=0.1528, cr_loss=0.3609, over 19437.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1414, cr_loss=0.3665, over 4088051.29 frames. ], batch size: 90, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:03:43,937 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.281e+02 2.422e+02 2.560e+02 3.112e+02, threshold=4.845e+02, percent-clipped=0.0 2024-09-19 07:04:14,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2024-09-19 07:04:23,217 INFO [train.py:1198] (0/2) Epoch 49, batch 5100, loss[loss=0.2227, ctc_loss=0.1469, cr_loss=0.379, over 20882.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1408, cr_loss=0.3662, over 4101583.33 frames. 
], batch size: 57, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:05:17,505 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:05:23,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=883722.3333333334, ans=0.0 2024-09-19 07:05:38,104 INFO [train.py:1198] (0/2) Epoch 49, batch 5150, loss[loss=0.1867, ctc_loss=0.1214, cr_loss=0.3264, over 19850.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1407, cr_loss=0.3663, over 4104385.34 frames. ], batch size: 44, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:06:11,572 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=15.0 2024-09-19 07:06:14,280 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=12.0 2024-09-19 07:06:16,523 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.260e+02 2.417e+02 2.553e+02 3.484e+02, threshold=4.835e+02, percent-clipped=0.0 2024-09-19 07:06:37,995 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:06:55,549 INFO [train.py:1198] (0/2) Epoch 49, batch 5200, loss[loss=0.2201, ctc_loss=0.1451, cr_loss=0.3751, over 20703.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1408, cr_loss=0.3663, over 4099973.20 frames. ], batch size: 68, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:06:57,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=883892.3333333334, ans=0.07 2024-09-19 07:07:05,167 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.09 vs. 
limit=6.0 2024-09-19 07:07:32,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=883949.0, ans=0.0 2024-09-19 07:07:33,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=883949.0, ans=0.0 2024-09-19 07:07:50,108 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-312000.pt 2024-09-19 07:08:09,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=884034.0, ans=0.0 2024-09-19 07:08:09,561 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=15.0 2024-09-19 07:08:13,058 INFO [train.py:1198] (0/2) Epoch 49, batch 5250, loss[loss=0.2391, ctc_loss=0.1595, cr_loss=0.398, over 20868.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1406, cr_loss=0.3656, over 4096296.61 frames. ], batch size: 65, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:08:48,640 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.316e+02 2.504e+02 2.645e+02 6.124e+02, threshold=5.008e+02, percent-clipped=1.0 2024-09-19 07:09:06,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=884119.0, ans=0.0 2024-09-19 07:09:21,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=884147.3333333334, ans=0.125 2024-09-19 07:09:26,946 INFO [train.py:1198] (0/2) Epoch 49, batch 5300, loss[loss=0.2154, ctc_loss=0.1415, cr_loss=0.3695, over 21012.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1409, cr_loss=0.3665, over 4095712.00 frames. 
], batch size: 61, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:10:05,364 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0 2024-09-19 07:10:18,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884260.6666666666, ans=0.1 2024-09-19 07:10:19,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=884260.6666666666, ans=0.2 2024-09-19 07:10:23,651 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:10:41,395 INFO [train.py:1198] (0/2) Epoch 49, batch 5350, loss[loss=0.223, ctc_loss=0.1459, cr_loss=0.3855, over 20822.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.141, cr_loss=0.3661, over 4101024.06 frames. ], batch size: 59, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:10:41,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884317.3333333334, ans=0.1 2024-09-19 07:11:16,949 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.327e+02 2.505e+02 2.655e+02 3.238e+02, threshold=5.009e+02, percent-clipped=0.0 2024-09-19 07:11:30,841 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:11:56,032 INFO [train.py:1198] (0/2) Epoch 49, batch 5400, loss[loss=0.2043, ctc_loss=0.1312, cr_loss=0.3655, over 20899.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1409, cr_loss=0.3658, over 4086192.92 frames. 
], batch size: 54, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:12:07,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=884459.0, ans=0.125 2024-09-19 07:12:19,643 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.61 vs. limit=15.0 2024-09-19 07:12:42,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=884544.0, ans=0.125 2024-09-19 07:13:10,985 INFO [train.py:1198] (0/2) Epoch 49, batch 5450, loss[loss=0.2357, ctc_loss=0.1553, cr_loss=0.4021, over 20677.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1406, cr_loss=0.3653, over 4094902.59 frames. ], batch size: 66, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:13:17,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=884600.6666666666, ans=0.2 2024-09-19 07:13:17,723 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.43 vs. 
limit=15.0 2024-09-19 07:13:46,451 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.275e+02 2.412e+02 2.577e+02 4.328e+02, threshold=4.823e+02, percent-clipped=0.0 2024-09-19 07:14:01,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=884685.6666666666, ans=0.2 2024-09-19 07:14:03,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=884685.6666666666, ans=0.02 2024-09-19 07:14:03,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=884685.6666666666, ans=0.0 2024-09-19 07:14:06,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=884685.6666666666, ans=0.125 2024-09-19 07:14:08,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0 2024-09-19 07:14:25,268 INFO [train.py:1198] (0/2) Epoch 49, batch 5500, loss[loss=0.2303, ctc_loss=0.1509, cr_loss=0.3968, over 20980.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1418, cr_loss=0.3678, over 4093166.74 frames. 
], batch size: 64, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:14:58,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=884799.0, ans=0.125 2024-09-19 07:15:02,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884799.0, ans=0.1 2024-09-19 07:15:15,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=884827.3333333334, ans=0.1 2024-09-19 07:15:15,931 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2024-09-19 07:15:40,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=884884.0, ans=0.125 2024-09-19 07:15:42,127 INFO [train.py:1198] (0/2) Epoch 49, batch 5550, loss[loss=0.2177, ctc_loss=0.1437, cr_loss=0.3703, over 20800.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1415, cr_loss=0.3673, over 4082854.88 frames. 
], batch size: 53, lr: 1.67e-03, grad_scale: 16.0 2024-09-19 07:15:45,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=884884.0, ans=0.2 2024-09-19 07:16:07,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=884912.3333333334, ans=0.125 2024-09-19 07:16:07,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=884912.3333333334, ans=0.125 2024-09-19 07:16:12,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=884940.6666666666, ans=0.0 2024-09-19 07:16:19,249 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.310e+02 2.440e+02 2.629e+02 5.251e+02, threshold=4.879e+02, percent-clipped=1.0 2024-09-19 07:16:47,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=884997.3333333334, ans=0.0 2024-09-19 07:16:53,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=884997.3333333334, ans=0.125 2024-09-19 07:16:56,583 INFO [train.py:1198] (0/2) Epoch 49, batch 5600, loss[loss=0.1943, ctc_loss=0.1257, cr_loss=0.3428, over 20286.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1414, cr_loss=0.3673, over 4091821.25 frames. 
], batch size: 45, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:17:11,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=885054.0, ans=0.0 2024-09-19 07:17:14,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885054.0, ans=0.1 2024-09-19 07:17:16,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=885054.0, ans=0.05 2024-09-19 07:17:30,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=885082.3333333334, ans=0.0 2024-09-19 07:18:01,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=885139.0, ans=0.1 2024-09-19 07:18:13,441 INFO [train.py:1198] (0/2) Epoch 49, batch 5650, loss[loss=0.2187, ctc_loss=0.1444, cr_loss=0.3718, over 21064.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1418, cr_loss=0.3679, over 4095267.69 frames. 
], batch size: 59, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:18:32,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=885195.6666666666, ans=0.2 2024-09-19 07:18:43,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=885224.0, ans=0.0 2024-09-19 07:18:50,372 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.923e+02 2.302e+02 2.418e+02 2.566e+02 3.557e+02, threshold=4.836e+02, percent-clipped=0.0 2024-09-19 07:19:14,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=885280.6666666666, ans=0.035 2024-09-19 07:19:14,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=885280.6666666666, ans=0.0 2024-09-19 07:19:19,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=885280.6666666666, ans=0.125 2024-09-19 07:19:27,594 INFO [train.py:1198] (0/2) Epoch 49, batch 5700, loss[loss=0.194, ctc_loss=0.1263, cr_loss=0.3388, over 20974.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3675, over 4089047.45 frames. ], batch size: 48, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:20:03,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=885365.6666666666, ans=0.125 2024-09-19 07:20:22,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=885394.0, ans=0.125 2024-09-19 07:20:30,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.51 vs. 
limit=15.0 2024-09-19 07:20:33,447 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.09 vs. limit=15.0 2024-09-19 07:20:35,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=885422.3333333334, ans=0.125 2024-09-19 07:20:41,435 INFO [train.py:1198] (0/2) Epoch 49, batch 5750, loss[loss=0.2487, ctc_loss=0.167, cr_loss=0.4084, over 20652.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1428, cr_loss=0.3692, over 4071803.95 frames. ], batch size: 66, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:20:44,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885450.6666666666, ans=0.1 2024-09-19 07:20:48,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.47 vs. limit=15.0 2024-09-19 07:21:04,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=885479.0, ans=0.07 2024-09-19 07:21:05,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=885479.0, ans=0.0 2024-09-19 07:21:14,515 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:21:18,603 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.306e+02 2.426e+02 2.589e+02 3.226e+02, threshold=4.851e+02, percent-clipped=0.0 2024-09-19 07:21:44,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=22.5 2024-09-19 07:21:56,105 INFO [train.py:1198] (0/2) Epoch 49, batch 5800, loss[loss=0.1896, ctc_loss=0.1221, cr_loss=0.3377, over 20785.00 frames. 
], tot_loss[loss=0.2162, ctc_loss=0.1425, cr_loss=0.3684, over 4061439.02 frames. ], batch size: 53, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:22:26,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=885649.0, ans=0.125 2024-09-19 07:22:30,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=885649.0, ans=0.035 2024-09-19 07:23:11,131 INFO [train.py:1198] (0/2) Epoch 49, batch 5850, loss[loss=0.1995, ctc_loss=0.1288, cr_loss=0.3535, over 20962.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1416, cr_loss=0.3676, over 4076986.72 frames. ], batch size: 58, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:23:24,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=885762.3333333334, ans=0.0 2024-09-19 07:23:29,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=885762.3333333334, ans=0.0 2024-09-19 07:23:50,807 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.257e+02 2.371e+02 2.513e+02 3.999e+02, threshold=4.743e+02, percent-clipped=0.0 2024-09-19 07:24:10,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=885819.0, ans=0.1 2024-09-19 07:24:28,049 INFO [train.py:1198] (0/2) Epoch 49, batch 5900, loss[loss=0.2309, ctc_loss=0.1511, cr_loss=0.399, over 20053.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1415, cr_loss=0.3667, over 4088553.99 frames. 
], batch size: 80, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:24:42,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=885904.0, ans=0.125 2024-09-19 07:25:28,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=885989.0, ans=0.125 2024-09-19 07:25:42,628 INFO [train.py:1198] (0/2) Epoch 49, batch 5950, loss[loss=0.2371, ctc_loss=0.1593, cr_loss=0.3889, over 19396.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1424, cr_loss=0.3685, over 4074410.46 frames. ], batch size: 90, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:25:49,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=886017.3333333334, ans=0.1 2024-09-19 07:26:07,760 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:26:22,460 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.336e+02 2.472e+02 2.589e+02 3.765e+02, threshold=4.944e+02, percent-clipped=0.0 2024-09-19 07:26:36,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=886102.3333333334, ans=0.125 2024-09-19 07:26:41,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=12.0 2024-09-19 07:26:54,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=12.0 2024-09-19 07:26:59,449 INFO [train.py:1198] (0/2) Epoch 49, batch 6000, loss[loss=0.2152, ctc_loss=0.1392, cr_loss=0.3799, over 20857.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.3671, over 4071236.56 frames. 
], batch size: 57, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:26:59,450 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 07:27:18,258 INFO [train.py:1230] (0/2) Epoch 49, validation: loss=0.03867, ctc_loss=0.03867, cr_loss=1.6e-14, over 944034.00 frames. 2024-09-19 07:27:18,259 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-19 07:27:43,073 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2024-09-19 07:27:49,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.73 vs. limit=15.0 2024-09-19 07:27:54,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=886215.6666666666, ans=0.2 2024-09-19 07:28:25,173 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.74 vs. limit=10.0 2024-09-19 07:28:33,456 INFO [train.py:1198] (0/2) Epoch 49, batch 6050, loss[loss=0.2244, ctc_loss=0.1474, cr_loss=0.3848, over 21017.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.141, cr_loss=0.3665, over 4093386.56 frames. ], batch size: 63, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:29:10,947 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.016e+02 2.265e+02 2.419e+02 2.591e+02 5.269e+02, threshold=4.838e+02, percent-clipped=1.0 2024-09-19 07:29:35,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=886414.0, ans=0.07 2024-09-19 07:29:48,148 INFO [train.py:1198] (0/2) Epoch 49, batch 6100, loss[loss=0.2287, ctc_loss=0.151, cr_loss=0.3885, over 21009.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1414, cr_loss=0.3673, over 4096963.05 frames. 
], batch size: 63, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:29:52,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=886442.3333333334, ans=0.125 2024-09-19 07:30:02,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=886470.6666666666, ans=0.125 2024-09-19 07:30:05,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=886470.6666666666, ans=0.2 2024-09-19 07:30:13,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=886470.6666666666, ans=0.125 2024-09-19 07:30:31,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=886527.3333333334, ans=0.125 2024-09-19 07:30:46,051 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0 2024-09-19 07:31:02,124 INFO [train.py:1198] (0/2) Epoch 49, batch 6150, loss[loss=0.2051, ctc_loss=0.1354, cr_loss=0.3488, over 21035.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.142, cr_loss=0.3685, over 4091888.81 frames. ], batch size: 62, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:31:12,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=886584.0, ans=0.125 2024-09-19 07:31:40,259 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.294e+02 2.443e+02 2.619e+02 4.038e+02, threshold=4.886e+02, percent-clipped=0.0 2024-09-19 07:31:48,133 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.00 vs. 
limit=15.0 2024-09-19 07:32:17,258 INFO [train.py:1198] (0/2) Epoch 49, batch 6200, loss[loss=0.222, ctc_loss=0.1439, cr_loss=0.3908, over 21040.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1437, cr_loss=0.3708, over 4070947.79 frames. ], batch size: 62, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:32:17,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=886725.6666666666, ans=0.2 2024-09-19 07:32:58,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=886782.3333333334, ans=0.125 2024-09-19 07:33:31,746 INFO [train.py:1198] (0/2) Epoch 49, batch 6250, loss[loss=0.2316, ctc_loss=0.1535, cr_loss=0.3904, over 21028.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1455, cr_loss=0.3738, over 4045357.54 frames. ], batch size: 62, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:34:08,716 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.182e+02 2.374e+02 2.493e+02 2.740e+02 6.672e+02, threshold=4.985e+02, percent-clipped=2.0 2024-09-19 07:34:22,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=886952.3333333334, ans=0.125 2024-09-19 07:34:26,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=886952.3333333334, ans=0.025 2024-09-19 07:34:44,933 INFO [train.py:1198] (0/2) Epoch 49, batch 6300, loss[loss=0.1839, ctc_loss=0.1183, cr_loss=0.3279, over 19992.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1467, cr_loss=0.374, over 3986106.74 frames. ], batch size: 44, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:35:03,723 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=5.23 vs. 
limit=22.5 2024-09-19 07:35:56,851 INFO [train.py:1198] (0/2) Epoch 49, batch 6350, loss[loss=0.2507, ctc_loss=0.1732, cr_loss=0.3876, over 14363.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1511, cr_loss=0.377, over 3791023.54 frames. ], batch size: 149, lr: 1.67e-03, grad_scale: 32.0 2024-09-19 07:36:02,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=887150.6666666666, ans=0.125 2024-09-19 07:36:12,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=887179.0, ans=0.125 2024-09-19 07:36:31,817 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.976e+02 2.659e+02 2.858e+02 3.070e+02 6.716e+02, threshold=5.716e+02, percent-clipped=1.0 2024-09-19 07:36:54,970 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-49.pt 2024-09-19 07:37:43,472 INFO [train.py:1198] (0/2) Epoch 50, batch 0, loss[loss=0.2363, ctc_loss=0.1572, cr_loss=0.3956, over 19431.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1572, cr_loss=0.3956, over 19431.00 frames. ], batch size: 90, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:37:43,473 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 07:37:57,979 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.0634, 4.8217, 3.7331, 4.2527], device='cuda:0') 2024-09-19 07:38:01,068 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3590, 4.8990, 4.7189, 5.0848], device='cuda:0') 2024-09-19 07:38:04,299 INFO [train.py:1230] (0/2) Epoch 50, validation: loss=0.03842, ctc_loss=0.03842, cr_loss=1.713e-14, over 944034.00 frames. 
2024-09-19 07:38:04,300 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-19 07:38:12,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=887266.8333333334, ans=0.0 2024-09-19 07:38:42,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=887323.5, ans=0.125 2024-09-19 07:38:46,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=887323.5, ans=0.125 2024-09-19 07:38:52,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=887351.8333333334, ans=0.0 2024-09-19 07:38:56,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=887351.8333333334, ans=0.1 2024-09-19 07:39:02,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=887380.1666666666, ans=0.1 2024-09-19 07:39:19,535 INFO [train.py:1198] (0/2) Epoch 50, batch 50, loss[loss=0.1761, ctc_loss=0.1147, cr_loss=0.3068, over 20766.00 frames. ], tot_loss[loss=0.212, ctc_loss=0.1394, cr_loss=0.3633, over 917285.35 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:40:00,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. 
limit=15.0 2024-09-19 07:40:10,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=887493.5, ans=0.125 2024-09-19 07:40:13,658 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.275e+02 2.418e+02 2.747e+02 4.190e+02, threshold=4.835e+02, percent-clipped=0.0 2024-09-19 07:40:26,339 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:40:37,937 INFO [train.py:1198] (0/2) Epoch 50, batch 100, loss[loss=0.2349, ctc_loss=0.1554, cr_loss=0.3977, over 20644.00 frames. ], tot_loss[loss=0.2122, ctc_loss=0.1395, cr_loss=0.3634, over 1620092.20 frames. ], batch size: 66, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:40:38,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=887550.1666666666, ans=0.2 2024-09-19 07:40:45,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=887550.1666666666, ans=0.1 2024-09-19 07:41:12,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=887606.8333333334, ans=0.125 2024-09-19 07:41:15,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=887606.8333333334, ans=0.2 2024-09-19 07:41:30,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=887635.1666666666, ans=0.5 2024-09-19 07:41:44,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=887663.5, ans=0.125 2024-09-19 07:41:53,070 INFO [train.py:1198] (0/2) Epoch 50, batch 150, loss[loss=0.243, ctc_loss=0.1629, cr_loss=0.4006, over 17981.00 frames. 
], tot_loss[loss=0.2128, ctc_loss=0.14, cr_loss=0.3641, over 2164636.46 frames. ], batch size: 108, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:42:23,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=887748.5, ans=0.025 2024-09-19 07:42:44,260 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.231e+02 2.391e+02 2.516e+02 3.152e+02, threshold=4.782e+02, percent-clipped=0.0 2024-09-19 07:43:08,319 INFO [train.py:1198] (0/2) Epoch 50, batch 200, loss[loss=0.2276, ctc_loss=0.1495, cr_loss=0.3902, over 20833.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1414, cr_loss=0.3662, over 2573654.26 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:43:48,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=887890.1666666666, ans=0.2 2024-09-19 07:44:16,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=887946.8333333334, ans=0.0 2024-09-19 07:44:23,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=887946.8333333334, ans=0.0 2024-09-19 07:44:27,309 INFO [train.py:1198] (0/2) Epoch 50, batch 250, loss[loss=0.2029, ctc_loss=0.1335, cr_loss=0.3473, over 20828.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1428, cr_loss=0.3693, over 2910071.65 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:44:43,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=888003.5, ans=0.125 2024-09-19 07:45:09,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=888031.8333333334, ans=0.125 2024-09-19 07:45:18,456 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.313e+02 2.452e+02 2.591e+02 3.246e+02, threshold=4.905e+02, percent-clipped=0.0 2024-09-19 07:45:23,813 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=22.5 2024-09-19 07:45:45,778 INFO [train.py:1198] (0/2) Epoch 50, batch 300, loss[loss=0.2191, ctc_loss=0.1466, cr_loss=0.3629, over 20999.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1435, cr_loss=0.3697, over 3175693.94 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:45:58,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=888116.8333333334, ans=0.0 2024-09-19 07:46:03,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=888145.1666666666, ans=0.1 2024-09-19 07:46:08,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. 
limit=15.0 2024-09-19 07:46:12,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=888145.1666666666, ans=0.2 2024-09-19 07:46:22,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=888173.5, ans=10.0 2024-09-19 07:46:24,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=22.5 2024-09-19 07:47:01,460 INFO [train.py:1198] (0/2) Epoch 50, batch 350, loss[loss=0.2538, ctc_loss=0.1679, cr_loss=0.4296, over 20974.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1428, cr_loss=0.3693, over 3376102.79 frames. ], batch size: 64, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:47:06,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=888258.5, ans=0.125 2024-09-19 07:47:39,059 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:47:43,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=888315.1666666666, ans=0.125 2024-09-19 07:47:52,137 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.282e+02 2.368e+02 2.527e+02 3.636e+02, threshold=4.735e+02, percent-clipped=0.0 2024-09-19 07:48:16,102 INFO [train.py:1198] (0/2) Epoch 50, batch 400, loss[loss=0.2052, ctc_loss=0.1342, cr_loss=0.3552, over 20880.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1426, cr_loss=0.3695, over 3539454.82 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:48:37,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=888428.5, ans=0.125 2024-09-19 07:49:34,593 INFO [train.py:1198] (0/2) Epoch 50, batch 450, loss[loss=0.199, ctc_loss=0.1303, cr_loss=0.3437, over 20873.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1422, cr_loss=0.3686, over 3664535.67 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:50:02,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=888570.1666666666, ans=0.0 2024-09-19 07:50:26,179 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.265e+02 2.395e+02 2.492e+02 3.193e+02, threshold=4.790e+02, percent-clipped=0.0 2024-09-19 07:50:50,421 INFO [train.py:1198] (0/2) Epoch 50, batch 500, loss[loss=0.2052, ctc_loss=0.132, cr_loss=0.366, over 20880.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1423, cr_loss=0.3691, over 3756551.80 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 07:51:19,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=888711.8333333334, ans=0.0 2024-09-19 07:51:29,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0 2024-09-19 07:51:54,853 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:51:55,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2024-09-19 07:52:09,561 INFO [train.py:1198] (0/2) Epoch 50, batch 550, loss[loss=0.1999, ctc_loss=0.1296, cr_loss=0.3514, over 20962.00 frames. 
], tot_loss[loss=0.215, ctc_loss=0.1416, cr_loss=0.3672, over 3835097.58 frames. ], batch size: 50, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 07:53:00,853 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.266e+02 2.376e+02 2.535e+02 3.229e+02, threshold=4.752e+02, percent-clipped=0.0
2024-09-19 07:53:24,382 INFO [train.py:1198] (0/2) Epoch 50, batch 600, loss[loss=0.2164, ctc_loss=0.1419, cr_loss=0.3723, over 21072.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.142, cr_loss=0.3681, over 3895526.80 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 07:53:38,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.50 vs. limit=15.0
2024-09-19 07:53:53,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=889023.5, ans=0.0
2024-09-19 07:54:15,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=889051.8333333334, ans=0.125
2024-09-19 07:54:39,926 INFO [train.py:1198] (0/2) Epoch 50, batch 650, loss[loss=0.2056, ctc_loss=0.1352, cr_loss=0.3521, over 20985.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1409, cr_loss=0.3667, over 3955042.27 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 07:54:59,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=889136.8333333334, ans=0.0
2024-09-19 07:55:19,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=889165.1666666666, ans=0.125
2024-09-19 07:55:34,112 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.301e+02 2.436e+02 2.631e+02 6.907e+02, threshold=4.873e+02, percent-clipped=1.0
2024-09-19 07:55:46,708 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 07:55:49,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=889221.8333333334, ans=0.1
2024-09-19 07:55:58,239 INFO [train.py:1198] (0/2) Epoch 50, batch 700, loss[loss=0.2119, ctc_loss=0.1375, cr_loss=0.3722, over 20931.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1415, cr_loss=0.3673, over 3965675.40 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 07:56:01,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=889250.1666666666, ans=0.125
2024-09-19 07:56:32,172 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=15.0
2024-09-19 07:56:42,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=889335.1666666666, ans=0.0
2024-09-19 07:57:15,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=889391.8333333334, ans=0.025
2024-09-19 07:57:16,880 INFO [train.py:1198] (0/2) Epoch 50, batch 750, loss[loss=0.2285, ctc_loss=0.1508, cr_loss=0.3887, over 21026.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1405, cr_loss=0.366, over 4000795.21 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 07:57:24,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=889391.8333333334, ans=0.125
2024-09-19 07:57:26,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=889391.8333333334, ans=0.1
2024-09-19 07:57:36,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=889420.1666666666, ans=0.0
2024-09-19 07:57:48,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=889448.5, ans=0.2
2024-09-19 07:58:02,697 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0
2024-09-19 07:58:08,065 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.043e+02 2.284e+02 2.389e+02 2.497e+02 3.132e+02, threshold=4.777e+02, percent-clipped=0.0
2024-09-19 07:58:32,345 INFO [train.py:1198] (0/2) Epoch 50, batch 800, loss[loss=0.2318, ctc_loss=0.1555, cr_loss=0.3812, over 19444.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1413, cr_loss=0.3672, over 4025025.55 frames. ], batch size: 90, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 07:58:37,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=889533.5, ans=0.0
2024-09-19 07:58:59,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=889561.8333333334, ans=0.025
2024-09-19 07:59:11,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0
2024-09-19 07:59:20,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=889618.5, ans=0.125
2024-09-19 07:59:24,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=889618.5, ans=0.2
2024-09-19 07:59:40,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.06 vs. limit=15.0
2024-09-19 07:59:47,204 INFO [train.py:1198] (0/2) Epoch 50, batch 850, loss[loss=0.2283, ctc_loss=0.1494, cr_loss=0.3944, over 21063.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1416, cr_loss=0.3673, over 4033233.74 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:00:00,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=889703.5, ans=0.025
2024-09-19 08:00:03,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=889703.5, ans=0.0
2024-09-19 08:00:15,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=889731.8333333334, ans=0.07
2024-09-19 08:00:37,481 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.266e+02 2.406e+02 2.507e+02 3.196e+02, threshold=4.812e+02, percent-clipped=0.0
2024-09-19 08:01:01,190 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.59 vs. limit=15.0
2024-09-19 08:01:01,824 INFO [train.py:1198] (0/2) Epoch 50, batch 900, loss[loss=0.1834, ctc_loss=0.119, cr_loss=0.3223, over 20966.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1419, cr_loss=0.368, over 4034426.34 frames. ], batch size: 51, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:01:12,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=889816.8333333334, ans=0.125
2024-09-19 08:01:59,523 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0
2024-09-19 08:02:12,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=889930.1666666666, ans=0.2
2024-09-19 08:02:20,123 INFO [train.py:1198] (0/2) Epoch 50, batch 950, loss[loss=0.1792, ctc_loss=0.1146, cr_loss=0.323, over 19926.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1422, cr_loss=0.3686, over 4046513.78 frames. ], batch size: 44, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:02:25,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=889958.5, ans=0.025
2024-09-19 08:02:50,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=889986.8333333334, ans=0.125
2024-09-19 08:02:55,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=890015.1666666666, ans=0.95
2024-09-19 08:03:14,468 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.262e+02 2.398e+02 2.595e+02 3.730e+02, threshold=4.796e+02, percent-clipped=0.0
2024-09-19 08:03:16,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=890043.5, ans=0.1
2024-09-19 08:03:38,765 INFO [train.py:1198] (0/2) Epoch 50, batch 1000, loss[loss=0.2218, ctc_loss=0.1478, cr_loss=0.3702, over 20636.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1423, cr_loss=0.3692, over 4064867.93 frames. ], batch size: 66, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:03:49,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890100.1666666666, ans=0.1
2024-09-19 08:04:15,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=890156.8333333334, ans=0.0
2024-09-19 08:04:55,197 INFO [train.py:1198] (0/2) Epoch 50, batch 1050, loss[loss=0.217, ctc_loss=0.1436, cr_loss=0.3669, over 20946.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1419, cr_loss=0.368, over 4080926.33 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:04:55,725 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0
2024-09-19 08:05:09,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.40 vs. limit=22.5
2024-09-19 08:05:42,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=890326.8333333334, ans=0.2
2024-09-19 08:05:43,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=890326.8333333334, ans=0.125
2024-09-19 08:05:46,573 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.296e+02 2.435e+02 2.565e+02 3.108e+02, threshold=4.870e+02, percent-clipped=0.0
2024-09-19 08:05:59,598 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:06:11,223 INFO [train.py:1198] (0/2) Epoch 50, batch 1100, loss[loss=0.23, ctc_loss=0.1523, cr_loss=0.3883, over 21057.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1416, cr_loss=0.368, over 4092723.54 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:06:15,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=890383.5, ans=0.0
2024-09-19 08:07:29,751 INFO [train.py:1198] (0/2) Epoch 50, batch 1150, loss[loss=0.1893, ctc_loss=0.1221, cr_loss=0.3358, over 20952.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.141, cr_loss=0.3669, over 4088999.73 frames. ], batch size: 50, lr: 1.65e-03, grad_scale: 64.0
2024-09-19 08:07:36,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=890525.1666666666, ans=0.0
2024-09-19 08:07:44,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=890553.5, ans=0.0
2024-09-19 08:07:50,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=890553.5, ans=0.125
2024-09-19 08:08:19,730 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0
2024-09-19 08:08:20,633 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.263e+02 2.381e+02 2.513e+02 8.725e+02, threshold=4.761e+02, percent-clipped=1.0
2024-09-19 08:08:47,484 INFO [train.py:1198] (0/2) Epoch 50, batch 1200, loss[loss=0.1935, ctc_loss=0.1233, cr_loss=0.3506, over 20984.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1407, cr_loss=0.3661, over 4090573.90 frames. ], batch size: 48, lr: 1.65e-03, grad_scale: 64.0
2024-09-19 08:08:55,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=890666.8333333334, ans=0.2
2024-09-19 08:10:02,648 INFO [train.py:1198] (0/2) Epoch 50, batch 1250, loss[loss=0.2072, ctc_loss=0.1358, cr_loss=0.3567, over 20974.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1415, cr_loss=0.3672, over 4081580.13 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 64.0
2024-09-19 08:10:07,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=890808.5, ans=0.025
2024-09-19 08:10:21,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=890836.8333333334, ans=0.0
2024-09-19 08:10:28,973 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.12 vs. limit=10.0
2024-09-19 08:10:40,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=890865.1666666666, ans=0.1
2024-09-19 08:10:41,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=890865.1666666666, ans=0.0
2024-09-19 08:10:45,390 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=12.0
2024-09-19 08:10:46,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=890893.5, ans=0.0
2024-09-19 08:10:48,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=890893.5, ans=0.0
2024-09-19 08:10:53,681 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.278e+02 2.388e+02 2.563e+02 3.317e+02, threshold=4.777e+02, percent-clipped=0.0
2024-09-19 08:11:17,670 INFO [train.py:1198] (0/2) Epoch 50, batch 1300, loss[loss=0.1932, ctc_loss=0.1241, cr_loss=0.3455, over 20892.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1407, cr_loss=0.3656, over 4090167.52 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 64.0
2024-09-19 08:12:06,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=891035.1666666666, ans=0.125
2024-09-19 08:12:23,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=891063.5, ans=15.0
2024-09-19 08:12:28,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=891063.5, ans=0.125
2024-09-19 08:12:35,611 INFO [train.py:1198] (0/2) Epoch 50, batch 1350, loss[loss=0.2307, ctc_loss=0.152, cr_loss=0.3934, over 19978.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1405, cr_loss=0.3652, over 4097595.04 frames. ], batch size: 80, lr: 1.65e-03, grad_scale: 64.0
2024-09-19 08:12:36,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=891091.8333333334, ans=15.0
2024-09-19 08:12:42,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=891091.8333333334, ans=0.125
2024-09-19 08:12:55,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=891120.1666666666, ans=0.1
2024-09-19 08:13:19,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=891176.8333333334, ans=0.125
2024-09-19 08:13:26,089 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.350e+02 2.446e+02 2.605e+02 3.274e+02, threshold=4.892e+02, percent-clipped=0.0
2024-09-19 08:13:26,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=891176.8333333334, ans=0.025
2024-09-19 08:13:45,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=891205.1666666666, ans=0.025
2024-09-19 08:13:49,970 INFO [train.py:1198] (0/2) Epoch 50, batch 1400, loss[loss=0.2532, ctc_loss=0.1666, cr_loss=0.4327, over 20652.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1406, cr_loss=0.3652, over 4096179.40 frames. ], batch size: 66, lr: 1.65e-03, grad_scale: 64.0
2024-09-19 08:13:50,711 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0
2024-09-19 08:14:07,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=891261.8333333334, ans=0.0
2024-09-19 08:14:14,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=891261.8333333334, ans=0.025
2024-09-19 08:15:09,352 INFO [train.py:1198] (0/2) Epoch 50, batch 1450, loss[loss=0.1896, ctc_loss=0.1242, cr_loss=0.3268, over 20964.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1411, cr_loss=0.366, over 4096169.00 frames. ], batch size: 51, lr: 1.65e-03, grad_scale: 64.0
2024-09-19 08:15:15,846 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0
2024-09-19 08:15:26,111 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:15:40,147 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0
2024-09-19 08:16:00,951 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.268e+02 2.401e+02 2.572e+02 3.033e+02, threshold=4.802e+02, percent-clipped=0.0
2024-09-19 08:16:06,271 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.16 vs. limit=15.0
2024-09-19 08:16:07,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=891460.1666666666, ans=0.0
2024-09-19 08:16:07,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=891460.1666666666, ans=0.125
2024-09-19 08:16:25,148 INFO [train.py:1198] (0/2) Epoch 50, batch 1500, loss[loss=0.2231, ctc_loss=0.1487, cr_loss=0.3723, over 20945.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1406, cr_loss=0.365, over 4107160.53 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 64.0
2024-09-19 08:16:31,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=891516.8333333334, ans=0.125
2024-09-19 08:17:05,037 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0
2024-09-19 08:17:13,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=891601.8333333334, ans=0.2
2024-09-19 08:17:39,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=891658.5, ans=0.125
2024-09-19 08:17:40,101 INFO [train.py:1198] (0/2) Epoch 50, batch 1550, loss[loss=0.2188, ctc_loss=0.1425, cr_loss=0.3816, over 20874.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.141, cr_loss=0.3666, over 4117433.44 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 64.0
2024-09-19 08:17:43,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891658.5, ans=0.1
2024-09-19 08:18:28,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=891743.5, ans=0.5
2024-09-19 08:18:34,206 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.043e+02 2.272e+02 2.394e+02 2.514e+02 2.979e+02, threshold=4.788e+02, percent-clipped=0.0
2024-09-19 08:18:40,606 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:18:45,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=891771.8333333334, ans=0.02
2024-09-19 08:18:58,443 INFO [train.py:1198] (0/2) Epoch 50, batch 1600, loss[loss=0.2153, ctc_loss=0.1405, cr_loss=0.3741, over 21073.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1403, cr_loss=0.366, over 4114268.16 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 64.0
2024-09-19 08:19:01,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=891800.1666666666, ans=0.2
2024-09-19 08:19:06,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0
2024-09-19 08:19:08,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=891800.1666666666, ans=0.125
2024-09-19 08:19:26,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=15.0
2024-09-19 08:19:29,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=891856.8333333334, ans=0.0
2024-09-19 08:19:38,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=891856.8333333334, ans=0.1
2024-09-19 08:19:42,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=891885.1666666666, ans=0.125
2024-09-19 08:19:55,001 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0
2024-09-19 08:20:07,102 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.39 vs. limit=22.5
2024-09-19 08:20:17,017 INFO [train.py:1198] (0/2) Epoch 50, batch 1650, loss[loss=0.177, ctc_loss=0.1118, cr_loss=0.3256, over 20984.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1404, cr_loss=0.3662, over 4116905.53 frames. ], batch size: 51, lr: 1.65e-03, grad_scale: 64.0
2024-09-19 08:20:17,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=891941.8333333334, ans=0.125
2024-09-19 08:20:21,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=891941.8333333334, ans=0.0
2024-09-19 08:21:10,009 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.317e+02 2.511e+02 2.608e+02 4.038e+02, threshold=5.022e+02, percent-clipped=0.0
2024-09-19 08:21:32,353 INFO [train.py:1198] (0/2) Epoch 50, batch 1700, loss[loss=0.2046, ctc_loss=0.1325, cr_loss=0.3603, over 21043.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1409, cr_loss=0.3666, over 4099490.41 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:21:37,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=892083.5, ans=0.07
2024-09-19 08:22:22,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=892168.5, ans=0.0
2024-09-19 08:22:25,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=892168.5, ans=0.125
2024-09-19 08:22:25,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892168.5, ans=0.1
2024-09-19 08:22:41,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=892196.8333333334, ans=0.125
2024-09-19 08:22:48,274 INFO [train.py:1198] (0/2) Epoch 50, batch 1750, loss[loss=0.2322, ctc_loss=0.1563, cr_loss=0.3794, over 19333.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1402, cr_loss=0.3653, over 4104030.83 frames. ], batch size: 90, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:23:36,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=892310.1666666666, ans=0.0
2024-09-19 08:23:40,757 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.279e+02 2.409e+02 2.572e+02 3.164e+02, threshold=4.818e+02, percent-clipped=0.0
2024-09-19 08:24:06,246 INFO [train.py:1198] (0/2) Epoch 50, batch 1800, loss[loss=0.2431, ctc_loss=0.1607, cr_loss=0.4121, over 20362.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1405, cr_loss=0.3662, over 4104406.68 frames. ], batch size: 74, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:24:21,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=892395.1666666666, ans=0.125
2024-09-19 08:25:25,175 INFO [train.py:1198] (0/2) Epoch 50, batch 1850, loss[loss=0.2381, ctc_loss=0.1604, cr_loss=0.3888, over 18241.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.141, cr_loss=0.3672, over 4091784.05 frames. ], batch size: 108, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:25:58,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0
2024-09-19 08:26:01,820 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.61 vs. limit=15.0
2024-09-19 08:26:17,963 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.277e+02 2.407e+02 2.527e+02 3.076e+02, threshold=4.815e+02, percent-clipped=0.0
2024-09-19 08:26:19,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=892593.5, ans=0.125
2024-09-19 08:26:40,727 INFO [train.py:1198] (0/2) Epoch 50, batch 1900, loss[loss=0.1656, ctc_loss=0.1075, cr_loss=0.2907, over 19929.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1402, cr_loss=0.3659, over 4093539.68 frames. ], batch size: 44, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:26:53,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.07 vs. limit=6.0
2024-09-19 08:26:56,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=892678.5, ans=0.025
2024-09-19 08:27:03,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=892678.5, ans=0.0
2024-09-19 08:27:09,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=892706.8333333334, ans=0.125
2024-09-19 08:27:55,925 INFO [train.py:1198] (0/2) Epoch 50, batch 1950, loss[loss=0.2707, ctc_loss=0.1896, cr_loss=0.4057, over 14579.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1411, cr_loss=0.3675, over 4095592.74 frames. ], batch size: 150, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:27:56,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=892791.8333333334, ans=0.0
2024-09-19 08:28:04,030 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0
2024-09-19 08:28:21,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=892820.1666666666, ans=0.07
2024-09-19 08:28:48,249 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.320e+02 2.442e+02 2.576e+02 3.122e+02, threshold=4.884e+02, percent-clipped=0.0
2024-09-19 08:28:53,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=892876.8333333334, ans=0.2
2024-09-19 08:28:57,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=892905.1666666666, ans=0.125
2024-09-19 08:29:10,747 INFO [train.py:1198] (0/2) Epoch 50, batch 2000, loss[loss=0.1987, ctc_loss=0.1288, cr_loss=0.3493, over 20877.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1413, cr_loss=0.3679, over 4099956.44 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:29:28,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=892961.8333333334, ans=0.125
2024-09-19 08:29:35,813 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.63 vs. limit=8.0
2024-09-19 08:30:20,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=893046.8333333334, ans=0.0
2024-09-19 08:30:20,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=893046.8333333334, ans=0.0
2024-09-19 08:30:28,918 INFO [train.py:1198] (0/2) Epoch 50, batch 2050, loss[loss=0.2101, ctc_loss=0.1358, cr_loss=0.3716, over 21021.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.141, cr_loss=0.3668, over 4098446.92 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:30:43,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=893103.5, ans=0.95
2024-09-19 08:30:59,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=893131.8333333334, ans=0.0
2024-09-19 08:31:21,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=893160.1666666666, ans=0.125
2024-09-19 08:31:23,790 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.256e+02 2.370e+02 2.621e+02 3.459e+02, threshold=4.739e+02, percent-clipped=0.0
2024-09-19 08:31:46,585 INFO [train.py:1198] (0/2) Epoch 50, batch 2100, loss[loss=0.2159, ctc_loss=0.1401, cr_loss=0.3793, over 21072.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1408, cr_loss=0.3662, over 4097694.12 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:32:39,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=893301.8333333334, ans=0.1
2024-09-19 08:33:02,274 INFO [train.py:1198] (0/2) Epoch 50, batch 2150, loss[loss=0.1907, ctc_loss=0.1252, cr_loss=0.3273, over 20975.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1405, cr_loss=0.3659, over 4098652.93 frames. ], batch size: 51, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:33:11,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=893358.5, ans=0.125
2024-09-19 08:33:55,033 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.258e+02 2.419e+02 2.581e+02 3.408e+02, threshold=4.838e+02, percent-clipped=0.0
2024-09-19 08:34:02,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=893471.8333333334, ans=0.2
2024-09-19 08:34:05,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=893471.8333333334, ans=0.125
2024-09-19 08:34:10,433 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:34:16,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=893500.1666666666, ans=0.2
2024-09-19 08:34:17,333 INFO [train.py:1198] (0/2) Epoch 50, batch 2200, loss[loss=0.204, ctc_loss=0.1321, cr_loss=0.3594, over 21065.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.141, cr_loss=0.3674, over 4105345.29 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:35:23,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=893613.5, ans=0.125
2024-09-19 08:35:35,078 INFO [train.py:1198] (0/2) Epoch 50, batch 2250, loss[loss=0.2335, ctc_loss=0.1534, cr_loss=0.4006, over 20667.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1424, cr_loss=0.3692, over 4083056.54 frames. ], batch size: 68, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:35:43,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=893641.8333333334, ans=0.035
2024-09-19 08:35:52,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=893670.1666666666, ans=0.05
2024-09-19 08:36:02,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=893670.1666666666, ans=10.0
2024-09-19 08:36:08,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=893698.5, ans=0.035
2024-09-19 08:36:28,214 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.242e+02 2.392e+02 2.522e+02 4.473e+02, threshold=4.784e+02, percent-clipped=0.0
2024-09-19 08:36:54,066 INFO [train.py:1198] (0/2) Epoch 50, batch 2300, loss[loss=0.1988, ctc_loss=0.1268, cr_loss=0.3602, over 20936.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.142, cr_loss=0.3691, over 4091002.08 frames. ], batch size: 49, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:37:42,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=893868.5, ans=0.125
2024-09-19 08:37:44,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=893868.5, ans=0.125
2024-09-19 08:37:48,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=893868.5, ans=0.125
2024-09-19 08:37:51,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=893868.5, ans=0.04949747468305833
2024-09-19 08:38:09,207 INFO [train.py:1198] (0/2) Epoch 50, batch 2350, loss[loss=0.2208, ctc_loss=0.1449, cr_loss=0.3795, over 19485.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1411, cr_loss=0.3673, over 4102864.30 frames. ], batch size: 90, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:38:16,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=893925.1666666666, ans=0.125
2024-09-19 08:38:53,064 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:38:55,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894010.1666666666, ans=0.1
2024-09-19 08:39:01,631 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.969e+02 2.305e+02 2.433e+02 2.556e+02 4.397e+02, threshold=4.867e+02, percent-clipped=0.0
2024-09-19 08:39:16,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=894038.5, ans=0.2
2024-09-19 08:39:24,082 INFO [train.py:1198] (0/2) Epoch 50, batch 2400, loss[loss=0.2202, ctc_loss=0.1443, cr_loss=0.38, over 20880.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1412, cr_loss=0.3674, over 4102770.50 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:40:17,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=894151.8333333334, ans=0.125
2024-09-19 08:40:23,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=894180.1666666666, ans=0.07
2024-09-19 08:40:39,094 INFO [train.py:1198] (0/2) Epoch 50, batch 2450, loss[loss=0.2069, ctc_loss=0.1369, cr_loss=0.3501, over 20727.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1407, cr_loss=0.3662, over 4098267.37 frames. ], batch size: 71, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:40:47,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894208.5, ans=0.1
2024-09-19 08:40:53,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=894236.8333333334, ans=0.0
2024-09-19 08:41:00,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894236.8333333334, ans=0.1
2024-09-19 08:41:02,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=894236.8333333334, ans=0.0
2024-09-19 08:41:31,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=894293.5, ans=0.025
2024-09-19 08:41:34,504 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.107e+02 2.270e+02 2.403e+02 2.563e+02 3.540e+02, threshold=4.805e+02, percent-clipped=0.0
2024-09-19 08:41:36,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=894293.5, ans=0.0
2024-09-19 08:41:46,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.20 vs. limit=10.0
2024-09-19 08:41:57,398 INFO [train.py:1198] (0/2) Epoch 50, batch 2500, loss[loss=0.2197, ctc_loss=0.1436, cr_loss=0.3802, over 20696.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1418, cr_loss=0.3677, over 4074875.64 frames. ], batch size: 66, lr: 1.65e-03, grad_scale: 32.0
2024-09-19 08:42:00,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=894350.1666666666, ans=0.2
2024-09-19 08:42:02,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=894350.1666666666, ans=0.0
2024-09-19 08:42:17,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=894378.5, ans=0.0
2024-09-19 08:42:59,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=894463.5, ans=0.125
2024-09-19 08:43:15,315 INFO [train.py:1198] (0/2) Epoch 50, batch 2550, loss[loss=0.215, ctc_loss=0.1418, cr_loss=0.366, over 20935.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3676, over 4073468.60 frames.
], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:43:21,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894491.8333333334, ans=0.1 2024-09-19 08:43:23,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=894491.8333333334, ans=0.0 2024-09-19 08:43:32,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=894520.1666666666, ans=0.0 2024-09-19 08:43:34,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=894520.1666666666, ans=0.04949747468305833 2024-09-19 08:43:43,641 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=15.0 2024-09-19 08:43:45,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=894548.5, ans=0.2 2024-09-19 08:43:52,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=894548.5, ans=0.025 2024-09-19 08:44:03,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=894576.8333333334, ans=0.025 2024-09-19 08:44:08,113 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.280e+02 2.426e+02 2.621e+02 3.118e+02, threshold=4.853e+02, percent-clipped=0.0 2024-09-19 08:44:10,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=894576.8333333334, ans=0.125 2024-09-19 08:44:14,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.14 vs. 
limit=15.0 2024-09-19 08:44:26,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=894605.1666666666, ans=0.125 2024-09-19 08:44:30,801 INFO [train.py:1198] (0/2) Epoch 50, batch 2600, loss[loss=0.2134, ctc_loss=0.1407, cr_loss=0.3634, over 20866.00 frames. ], tot_loss[loss=0.2154, ctc_loss=0.1419, cr_loss=0.3675, over 4066446.47 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2024-09-19 08:44:49,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=894661.8333333334, ans=0.0 2024-09-19 08:44:49,345 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:44:55,670 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-09-19 08:45:47,239 INFO [train.py:1198] (0/2) Epoch 50, batch 2650, loss[loss=0.1923, ctc_loss=0.1255, cr_loss=0.3342, over 21060.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3668, over 4076830.55 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2024-09-19 08:46:22,092 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:46:41,488 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.287e+02 2.433e+02 2.592e+02 7.189e+02, threshold=4.866e+02, percent-clipped=1.0 2024-09-19 08:46:43,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=894860.1666666666, ans=0.07 2024-09-19 08:46:56,193 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.83 vs. 
limit=15.0 2024-09-19 08:47:06,041 INFO [train.py:1198] (0/2) Epoch 50, batch 2700, loss[loss=0.2017, ctc_loss=0.1312, cr_loss=0.3526, over 20933.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.141, cr_loss=0.3663, over 4090012.19 frames. ], batch size: 49, lr: 1.65e-03, grad_scale: 16.0 2024-09-19 08:47:18,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=894916.8333333334, ans=0.0 2024-09-19 08:47:28,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2024-09-19 08:47:48,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=894973.5, ans=0.0 2024-09-19 08:47:56,233 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-09-19 08:48:24,588 INFO [train.py:1198] (0/2) Epoch 50, batch 2750, loss[loss=0.1918, ctc_loss=0.1251, cr_loss=0.3333, over 20958.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1412, cr_loss=0.3667, over 4092557.53 frames. ], batch size: 51, lr: 1.64e-03, grad_scale: 8.0 2024-09-19 08:49:01,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=895115.1666666666, ans=0.0 2024-09-19 08:49:07,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=895115.1666666666, ans=0.125 2024-09-19 08:49:20,867 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.232e+02 2.400e+02 2.549e+02 4.117e+02, threshold=4.799e+02, percent-clipped=0.0 2024-09-19 08:49:40,434 INFO [train.py:1198] (0/2) Epoch 50, batch 2800, loss[loss=0.2014, ctc_loss=0.1323, cr_loss=0.3456, over 21004.00 frames. 
], tot_loss[loss=0.2145, ctc_loss=0.1411, cr_loss=0.3668, over 4085609.71 frames. ], batch size: 52, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:49:58,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=895228.5, ans=0.125 2024-09-19 08:50:23,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=895256.8333333334, ans=0.125 2024-09-19 08:50:26,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=895285.1666666666, ans=0.95 2024-09-19 08:50:50,040 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/checkpoint-316000.pt 2024-09-19 08:50:52,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=895313.5, ans=0.5 2024-09-19 08:50:56,771 INFO [train.py:1198] (0/2) Epoch 50, batch 2850, loss[loss=0.2524, ctc_loss=0.1782, cr_loss=0.3713, over 13652.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1407, cr_loss=0.3659, over 4085555.21 frames. 
], batch size: 149, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:50:57,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=895341.8333333334, ans=0.025 2024-09-19 08:51:09,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=895341.8333333334, ans=0.09899494936611666 2024-09-19 08:51:43,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=895426.8333333334, ans=0.125 2024-09-19 08:51:52,661 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.294e+02 2.419e+02 2.605e+02 5.521e+02, threshold=4.838e+02, percent-clipped=1.0 2024-09-19 08:52:12,186 INFO [train.py:1198] (0/2) Epoch 50, batch 2900, loss[loss=0.2326, ctc_loss=0.152, cr_loss=0.4032, over 20649.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1407, cr_loss=0.366, over 4095733.69 frames. ], batch size: 66, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:52:27,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=895511.8333333334, ans=0.0 2024-09-19 08:52:29,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=895511.8333333334, ans=0.125 2024-09-19 08:52:37,000 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:52:38,530 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:52:51,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=895540.1666666666, ans=0.125 2024-09-19 08:53:05,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=895568.5, ans=0.125 2024-09-19 
08:53:31,084 INFO [train.py:1198] (0/2) Epoch 50, batch 2950, loss[loss=0.2142, ctc_loss=0.1426, cr_loss=0.358, over 21009.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1417, cr_loss=0.3672, over 4087215.48 frames. ], batch size: 63, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:54:04,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=895681.8333333334, ans=0.0 2024-09-19 08:54:30,021 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.317e+02 2.424e+02 2.670e+02 4.100e+02, threshold=4.848e+02, percent-clipped=0.0 2024-09-19 08:54:49,925 INFO [train.py:1198] (0/2) Epoch 50, batch 3000, loss[loss=0.2434, ctc_loss=0.1613, cr_loss=0.4106, over 21008.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1409, cr_loss=0.366, over 4093687.49 frames. ], batch size: 63, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:54:49,926 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 08:55:08,577 INFO [train.py:1230] (0/2) Epoch 50, validation: loss=0.03862, ctc_loss=0.03862, cr_loss=1.608e-14, over 944034.00 frames. 
2024-09-19 08:55:08,578 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-19 08:55:17,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=895766.8333333334, ans=0.125 2024-09-19 08:55:42,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=895823.5, ans=0.0 2024-09-19 08:55:54,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=895851.8333333334, ans=0.125 2024-09-19 08:56:11,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=895880.1666666666, ans=0.125 2024-09-19 08:56:24,686 INFO [train.py:1198] (0/2) Epoch 50, batch 3050, loss[loss=0.1703, ctc_loss=0.1076, cr_loss=0.3139, over 20936.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1415, cr_loss=0.3677, over 4103394.59 frames. ], batch size: 49, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:56:36,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=895908.5, ans=0.2 2024-09-19 08:56:42,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=895936.8333333334, ans=0.0 2024-09-19 08:57:02,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.02 vs. 
limit=22.5 2024-09-19 08:57:17,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=895993.5, ans=0.125 2024-09-19 08:57:19,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=895993.5, ans=15.0 2024-09-19 08:57:19,765 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.269e+02 2.440e+02 2.595e+02 3.509e+02, threshold=4.880e+02, percent-clipped=0.0 2024-09-19 08:57:39,252 INFO [train.py:1198] (0/2) Epoch 50, batch 3100, loss[loss=0.2316, ctc_loss=0.1537, cr_loss=0.3894, over 21040.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1412, cr_loss=0.3667, over 4103058.82 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:58:09,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=896078.5, ans=0.0 2024-09-19 08:58:15,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=896106.8333333334, ans=0.125 2024-09-19 08:58:33,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=896135.1666666666, ans=0.1 2024-09-19 08:58:51,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=896163.5, ans=0.125 2024-09-19 08:58:57,710 INFO [train.py:1198] (0/2) Epoch 50, batch 3150, loss[loss=0.2078, ctc_loss=0.1376, cr_loss=0.3511, over 20767.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1416, cr_loss=0.3671, over 4086142.59 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2024-09-19 08:59:05,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=896191.8333333334, ans=0.2 2024-09-19 08:59:12,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896220.1666666666, ans=0.1 2024-09-19 08:59:24,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=896220.1666666666, ans=0.0 2024-09-19 08:59:26,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=896248.5, ans=22.5 2024-09-19 08:59:44,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=896276.8333333334, ans=0.125 2024-09-19 08:59:55,564 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.256e+02 2.364e+02 2.511e+02 3.291e+02, threshold=4.728e+02, percent-clipped=0.0 2024-09-19 09:00:14,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=896333.5, ans=0.0 2024-09-19 09:00:15,599 INFO [train.py:1198] (0/2) Epoch 50, batch 3200, loss[loss=0.2478, ctc_loss=0.168, cr_loss=0.3992, over 20958.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1419, cr_loss=0.3672, over 4083149.49 frames. 
], batch size: 64, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:00:15,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=896333.5, ans=0.0 2024-09-19 09:00:26,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=896333.5, ans=0.125 2024-09-19 09:00:51,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=896390.1666666666, ans=0.025 2024-09-19 09:01:05,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=896418.5, ans=0.125 2024-09-19 09:01:30,968 INFO [train.py:1198] (0/2) Epoch 50, batch 3250, loss[loss=0.2296, ctc_loss=0.1538, cr_loss=0.3794, over 20887.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1421, cr_loss=0.3677, over 4071004.80 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:01:35,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=896475.1666666666, ans=0.2 2024-09-19 09:01:41,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=896475.1666666666, ans=0.125 2024-09-19 09:02:27,283 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.902e+02 2.265e+02 2.376e+02 2.604e+02 4.520e+02, threshold=4.751e+02, percent-clipped=0.0 2024-09-19 09:02:46,656 INFO [train.py:1198] (0/2) Epoch 50, batch 3300, loss[loss=0.2194, ctc_loss=0.1453, cr_loss=0.3703, over 20960.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1416, cr_loss=0.3669, over 4073301.76 frames. 
], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:03:06,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896645.1666666666, ans=0.1 2024-09-19 09:03:20,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=896673.5, ans=0.0 2024-09-19 09:03:22,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0 2024-09-19 09:03:30,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=896701.8333333334, ans=0.125 2024-09-19 09:03:32,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=896701.8333333334, ans=0.025 2024-09-19 09:03:35,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0 2024-09-19 09:04:04,569 INFO [train.py:1198] (0/2) Epoch 50, batch 3350, loss[loss=0.2002, ctc_loss=0.1277, cr_loss=0.3625, over 20981.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1416, cr_loss=0.3671, over 4078878.26 frames. 
], batch size: 48, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:04:10,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=896758.5, ans=0.125 2024-09-19 09:04:18,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=896786.8333333334, ans=0.125 2024-09-19 09:04:54,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=896843.5, ans=0.2 2024-09-19 09:05:00,331 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.285e+02 2.411e+02 2.587e+02 3.722e+02, threshold=4.823e+02, percent-clipped=0.0 2024-09-19 09:05:08,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2024-09-19 09:05:10,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=22.5 2024-09-19 09:05:22,671 INFO [train.py:1198] (0/2) Epoch 50, batch 3400, loss[loss=0.2512, ctc_loss=0.1676, cr_loss=0.4183, over 18104.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.141, cr_loss=0.3663, over 4081174.44 frames. 
], batch size: 108, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:05:33,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=896900.1666666666, ans=0.07 2024-09-19 09:05:47,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=896928.5, ans=0.125 2024-09-19 09:06:06,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=896985.1666666666, ans=10.0 2024-09-19 09:06:38,007 INFO [train.py:1198] (0/2) Epoch 50, batch 3450, loss[loss=0.1911, ctc_loss=0.1229, cr_loss=0.3411, over 20947.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1407, cr_loss=0.3651, over 4086877.15 frames. ], batch size: 48, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:06:43,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=22.5 2024-09-19 09:06:53,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=897070.1666666666, ans=0.05 2024-09-19 09:07:26,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.08 vs. limit=10.0 2024-09-19 09:07:33,938 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.318e+02 2.426e+02 2.580e+02 3.269e+02, threshold=4.853e+02, percent-clipped=0.0 2024-09-19 09:07:48,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-09-19 09:07:53,741 INFO [train.py:1198] (0/2) Epoch 50, batch 3500, loss[loss=0.1802, ctc_loss=0.1171, cr_loss=0.3154, over 21046.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1409, cr_loss=0.3659, over 4087928.94 frames. 
], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:08:10,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=897211.8333333334, ans=0.125 2024-09-19 09:08:30,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=897240.1666666666, ans=0.0 2024-09-19 09:09:09,431 INFO [train.py:1198] (0/2) Epoch 50, batch 3550, loss[loss=0.2196, ctc_loss=0.1432, cr_loss=0.3819, over 21003.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1413, cr_loss=0.367, over 4088522.70 frames. ], batch size: 63, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:09:09,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=897325.1666666666, ans=0.2 2024-09-19 09:09:16,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=897325.1666666666, ans=0.2 2024-09-19 09:10:08,215 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.083e+02 2.358e+02 2.484e+02 2.630e+02 2.984e+02, threshold=4.968e+02, percent-clipped=0.0 2024-09-19 09:10:28,084 INFO [train.py:1198] (0/2) Epoch 50, batch 3600, loss[loss=0.2242, ctc_loss=0.1482, cr_loss=0.3801, over 20790.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.142, cr_loss=0.3685, over 4095824.72 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:10:55,057 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.94 vs. limit=12.0 2024-09-19 09:11:13,125 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. 
limit=6.0 2024-09-19 09:11:18,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=897551.8333333334, ans=0.125 2024-09-19 09:11:44,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=897580.1666666666, ans=0.0 2024-09-19 09:11:46,718 INFO [train.py:1198] (0/2) Epoch 50, batch 3650, loss[loss=0.1906, ctc_loss=0.1231, cr_loss=0.3371, over 20967.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1418, cr_loss=0.3687, over 4085739.89 frames. ], batch size: 49, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:12:09,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=897636.8333333334, ans=0.1 2024-09-19 09:12:12,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=897636.8333333334, ans=0.0 2024-09-19 09:12:22,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=897665.1666666666, ans=0.0 2024-09-19 09:12:25,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=897665.1666666666, ans=0.125 2024-09-19 09:12:27,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=897665.1666666666, ans=0.0 2024-09-19 09:12:42,046 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.342e+02 2.472e+02 2.648e+02 3.415e+02, threshold=4.943e+02, percent-clipped=0.0 2024-09-19 09:12:57,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=897721.8333333334, ans=0.2 2024-09-19 09:13:01,486 INFO [train.py:1198] (0/2) Epoch 50, batch 3700, loss[loss=0.2372, ctc_loss=0.1573, cr_loss=0.3994, over 20974.00 frames. 
], tot_loss[loss=0.2155, ctc_loss=0.1417, cr_loss=0.3686, over 4097801.11 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 09:13:06,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=897750.1666666666, ans=0.125 2024-09-19 09:13:23,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.35 vs. limit=15.0 2024-09-19 09:13:35,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=897806.8333333334, ans=0.125 2024-09-19 09:14:09,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=897863.5, ans=0.125 2024-09-19 09:14:15,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=897891.8333333334, ans=0.1 2024-09-19 09:14:16,274 INFO [train.py:1198] (0/2) Epoch 50, batch 3750, loss[loss=0.2227, ctc_loss=0.1495, cr_loss=0.3658, over 20666.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1421, cr_loss=0.3688, over 4088836.33 frames. 
], batch size: 66, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:14:43,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=897920.1666666666, ans=0.125
2024-09-19 09:15:12,089 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.252e+02 2.391e+02 2.576e+02 2.941e+02, threshold=4.783e+02, percent-clipped=0.0
2024-09-19 09:15:21,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=898005.1666666666, ans=0.125
2024-09-19 09:15:21,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=898005.1666666666, ans=0.2
2024-09-19 09:15:23,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0
2024-09-19 09:15:25,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=898005.1666666666, ans=0.5
2024-09-19 09:15:29,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=898005.1666666666, ans=0.125
2024-09-19 09:15:34,809 INFO [train.py:1198] (0/2) Epoch 50, batch 3800, loss[loss=0.2205, ctc_loss=0.144, cr_loss=0.3825, over 20827.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1415, cr_loss=0.3678, over 4101440.89 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:16:10,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=898090.1666666666, ans=0.125
2024-09-19 09:16:20,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=898118.5, ans=0.125
2024-09-19 09:16:42,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=898146.8333333334, ans=0.0
2024-09-19 09:16:44,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=898146.8333333334, ans=0.125
2024-09-19 09:16:53,014 INFO [train.py:1198] (0/2) Epoch 50, batch 3850, loss[loss=0.1744, ctc_loss=0.1131, cr_loss=0.3061, over 20959.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.141, cr_loss=0.3675, over 4100314.39 frames. ], batch size: 51, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:17:18,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=898203.5, ans=0.125
2024-09-19 09:17:36,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=898260.1666666666, ans=0.125
2024-09-19 09:17:49,936 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.321e+02 2.455e+02 2.629e+02 5.773e+02, threshold=4.910e+02, percent-clipped=2.0
2024-09-19 09:18:07,849 INFO [train.py:1198] (0/2) Epoch 50, batch 3900, loss[loss=0.2239, ctc_loss=0.1519, cr_loss=0.3602, over 19555.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1414, cr_loss=0.3687, over 4089188.94 frames. ], batch size: 90, lr: 1.64e-03, grad_scale: 16.0
2024-09-19 09:18:58,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898401.8333333334, ans=0.1
2024-09-19 09:19:17,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.13 vs. limit=5.0
2024-09-19 09:19:23,353 INFO [train.py:1198] (0/2) Epoch 50, batch 3950, loss[loss=0.2388, ctc_loss=0.1582, cr_loss=0.403, over 19901.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.142, cr_loss=0.3693, over 4082608.05 frames. ], batch size: 80, lr: 1.64e-03, grad_scale: 16.0
2024-09-19 09:19:32,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=898458.5, ans=0.1
2024-09-19 09:19:48,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=898486.8333333334, ans=0.125
2024-09-19 09:20:19,947 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:20:20,963 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.303e+02 2.461e+02 2.591e+02 8.269e+02, threshold=4.922e+02, percent-clipped=1.0
2024-09-19 09:20:38,990 INFO [train.py:1198] (0/2) Epoch 50, batch 4000, loss[loss=0.2343, ctc_loss=0.159, cr_loss=0.3767, over 20674.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1419, cr_loss=0.3692, over 4090344.70 frames. ], batch size: 68, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:21:03,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.52 vs. limit=15.0
2024-09-19 09:21:07,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=898628.5, ans=0.2
2024-09-19 09:21:37,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=898685.1666666666, ans=0.0
2024-09-19 09:21:52,845 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0
2024-09-19 09:21:56,602 INFO [train.py:1198] (0/2) Epoch 50, batch 4050, loss[loss=0.2252, ctc_loss=0.1504, cr_loss=0.3743, over 20980.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1425, cr_loss=0.37, over 4092684.97 frames. ], batch size: 64, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:22:12,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=898770.1666666666, ans=0.125
2024-09-19 09:22:50,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=898826.8333333334, ans=0.025
2024-09-19 09:22:56,452 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.304e+02 2.442e+02 2.641e+02 3.605e+02, threshold=4.884e+02, percent-clipped=0.0
2024-09-19 09:22:58,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=898855.1666666666, ans=0.0
2024-09-19 09:22:58,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=898855.1666666666, ans=0.125
2024-09-19 09:23:14,300 INFO [train.py:1198] (0/2) Epoch 50, batch 4100, loss[loss=0.2287, ctc_loss=0.1521, cr_loss=0.3834, over 21008.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1419, cr_loss=0.3685, over 4092915.75 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:23:43,398 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=22.5
2024-09-19 09:24:04,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=898968.5, ans=0.0
2024-09-19 09:24:10,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=898968.5, ans=0.0
2024-09-19 09:24:29,388 INFO [train.py:1198] (0/2) Epoch 50, batch 4150, loss[loss=0.2178, ctc_loss=0.1448, cr_loss=0.3652, over 21078.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1413, cr_loss=0.3682, over 4097807.69 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:25:07,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=12.0
2024-09-19 09:25:24,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=22.5
2024-09-19 09:25:26,980 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.117e+02 2.319e+02 2.478e+02 2.657e+02 3.033e+02, threshold=4.956e+02, percent-clipped=0.0
2024-09-19 09:25:44,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=899166.8333333334, ans=0.125
2024-09-19 09:25:45,408 INFO [train.py:1198] (0/2) Epoch 50, batch 4200, loss[loss=0.2283, ctc_loss=0.1511, cr_loss=0.3859, over 20720.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.141, cr_loss=0.3674, over 4108591.00 frames. ], batch size: 71, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:26:14,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=899223.5, ans=0.125
2024-09-19 09:26:29,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=899251.8333333334, ans=0.125
2024-09-19 09:27:03,579 INFO [train.py:1198] (0/2) Epoch 50, batch 4250, loss[loss=0.234, ctc_loss=0.1578, cr_loss=0.3814, over 20677.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1419, cr_loss=0.3687, over 4088778.72 frames. ], batch size: 71, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:27:12,065 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.15 vs. limit=15.0
2024-09-19 09:27:20,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=899336.8333333334, ans=15.0
2024-09-19 09:27:26,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=899336.8333333334, ans=0.0
2024-09-19 09:27:40,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=899365.1666666666, ans=0.125
2024-09-19 09:27:53,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=22.5
2024-09-19 09:28:03,570 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.275e+02 2.415e+02 2.601e+02 5.675e+02, threshold=4.830e+02, percent-clipped=1.0
2024-09-19 09:28:04,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=899393.5, ans=15.0
2024-09-19 09:28:08,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0
2024-09-19 09:28:21,412 INFO [train.py:1198] (0/2) Epoch 50, batch 4300, loss[loss=0.2101, ctc_loss=0.1385, cr_loss=0.358, over 20882.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1415, cr_loss=0.3681, over 4086079.55 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:28:32,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=899450.1666666666, ans=0.05
2024-09-19 09:29:28,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=899563.5, ans=0.125
2024-09-19 09:29:37,070 INFO [train.py:1198] (0/2) Epoch 50, batch 4350, loss[loss=0.2212, ctc_loss=0.1435, cr_loss=0.3886, over 20975.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1409, cr_loss=0.367, over 4103331.80 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:30:13,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=899648.5, ans=0.0
2024-09-19 09:30:19,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=899648.5, ans=0.125
2024-09-19 09:30:19,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=899648.5, ans=0.125
2024-09-19 09:30:34,254 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.298e+02 2.428e+02 2.673e+02 1.217e+03, threshold=4.857e+02, percent-clipped=1.0
2024-09-19 09:30:35,124 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0
2024-09-19 09:30:52,303 INFO [train.py:1198] (0/2) Epoch 50, batch 4400, loss[loss=0.2317, ctc_loss=0.1564, cr_loss=0.3767, over 18147.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1409, cr_loss=0.367, over 4094687.87 frames. ], batch size: 108, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:31:13,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0
2024-09-19 09:31:34,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=899790.1666666666, ans=0.125
2024-09-19 09:31:51,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=899846.8333333334, ans=0.2
2024-09-19 09:32:02,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=899846.8333333334, ans=0.1
2024-09-19 09:32:10,725 INFO [train.py:1198] (0/2) Epoch 50, batch 4450, loss[loss=0.2222, ctc_loss=0.1459, cr_loss=0.3815, over 20194.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1407, cr_loss=0.3663, over 4092489.82 frames. ], batch size: 80, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:32:45,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=899931.8333333334, ans=0.1
2024-09-19 09:32:47,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=899931.8333333334, ans=0.0
2024-09-19 09:32:53,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=899931.8333333334, ans=0.0
2024-09-19 09:33:08,025 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.325e+02 2.445e+02 2.628e+02 3.331e+02, threshold=4.890e+02, percent-clipped=0.0
2024-09-19 09:33:15,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=899988.5, ans=0.125
2024-09-19 09:33:21,381 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.41 vs. limit=5.0
2024-09-19 09:33:26,031 INFO [train.py:1198] (0/2) Epoch 50, batch 4500, loss[loss=0.2265, ctc_loss=0.1503, cr_loss=0.3814, over 20700.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1405, cr_loss=0.3661, over 4094551.07 frames. ], batch size: 68, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:34:03,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.94 vs. limit=22.5
2024-09-19 09:34:36,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=900130.1666666666, ans=0.0
2024-09-19 09:34:44,518 INFO [train.py:1198] (0/2) Epoch 50, batch 4550, loss[loss=0.2038, ctc_loss=0.1371, cr_loss=0.3333, over 20977.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1409, cr_loss=0.3668, over 4089997.07 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:34:58,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=900186.8333333334, ans=0.2
2024-09-19 09:35:02,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=900186.8333333334, ans=0.125
2024-09-19 09:35:20,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=900215.1666666666, ans=0.1
2024-09-19 09:35:41,990 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.282e+02 2.424e+02 2.573e+02 3.172e+02, threshold=4.848e+02, percent-clipped=0.0
2024-09-19 09:35:50,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0
2024-09-19 09:36:00,028 INFO [train.py:1198] (0/2) Epoch 50, batch 4600, loss[loss=0.1977, ctc_loss=0.1293, cr_loss=0.3418, over 20880.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1408, cr_loss=0.3666, over 4085411.75 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:36:07,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=900300.1666666666, ans=0.125
2024-09-19 09:36:21,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=900328.5, ans=0.125
2024-09-19 09:36:40,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=900356.8333333334, ans=0.0
2024-09-19 09:36:40,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=900356.8333333334, ans=0.04949747468305833
2024-09-19 09:36:44,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=900385.1666666666, ans=0.0
2024-09-19 09:36:54,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=900385.1666666666, ans=0.125
2024-09-19 09:37:13,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=900441.8333333334, ans=0.125
2024-09-19 09:37:15,052 INFO [train.py:1198] (0/2) Epoch 50, batch 4650, loss[loss=0.2281, ctc_loss=0.1481, cr_loss=0.4002, over 20969.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1405, cr_loss=0.366, over 4089468.68 frames. ], batch size: 64, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:37:38,009 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:37:45,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=900498.5, ans=0.0
2024-09-19 09:37:56,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=900498.5, ans=0.0
2024-09-19 09:38:02,507 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0
2024-09-19 09:38:03,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=900526.8333333334, ans=0.125
2024-09-19 09:38:03,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=900526.8333333334, ans=0.125
2024-09-19 09:38:15,115 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.087e+02 2.298e+02 2.468e+02 2.596e+02 3.526e+02, threshold=4.936e+02, percent-clipped=0.0
2024-09-19 09:38:24,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=900555.1666666666, ans=0.2
2024-09-19 09:38:25,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=900555.1666666666, ans=0.1
2024-09-19 09:38:33,153 INFO [train.py:1198] (0/2) Epoch 50, batch 4700, loss[loss=0.2135, ctc_loss=0.1388, cr_loss=0.3738, over 21053.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1417, cr_loss=0.3681, over 4080967.71 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:38:42,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=900583.5, ans=0.125
2024-09-19 09:38:42,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=900583.5, ans=0.2
2024-09-19 09:39:29,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=900668.5, ans=0.125
2024-09-19 09:39:33,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=900668.5, ans=0.125
2024-09-19 09:39:51,357 INFO [train.py:1198] (0/2) Epoch 50, batch 4750, loss[loss=0.1894, ctc_loss=0.123, cr_loss=0.3321, over 20967.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1415, cr_loss=0.3684, over 4091186.50 frames. ], batch size: 48, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:39:59,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=900725.1666666666, ans=0.125
2024-09-19 09:40:48,233 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.273e+02 2.438e+02 2.557e+02 4.070e+02, threshold=4.876e+02, percent-clipped=0.0
2024-09-19 09:41:06,152 INFO [train.py:1198] (0/2) Epoch 50, batch 4800, loss[loss=0.2127, ctc_loss=0.1405, cr_loss=0.361, over 20973.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1416, cr_loss=0.3684, over 4089733.42 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:41:19,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=900895.1666666666, ans=0.125
2024-09-19 09:41:23,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=900895.1666666666, ans=15.0
2024-09-19 09:41:33,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=900895.1666666666, ans=0.1
2024-09-19 09:42:15,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=900980.1666666666, ans=0.2
2024-09-19 09:42:21,500 INFO [train.py:1198] (0/2) Epoch 50, batch 4850, loss[loss=0.2343, ctc_loss=0.1583, cr_loss=0.3798, over 18101.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1417, cr_loss=0.3683, over 4092216.88 frames. ], batch size: 108, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:42:45,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=901036.8333333334, ans=0.125
2024-09-19 09:42:50,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=901065.1666666666, ans=15.0
2024-09-19 09:42:55,052 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=22.5
2024-09-19 09:43:18,230 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.316e+02 2.418e+02 2.607e+02 6.487e+02, threshold=4.836e+02, percent-clipped=1.0
2024-09-19 09:43:22,486 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.41 vs. limit=15.0
2024-09-19 09:43:33,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=901121.8333333334, ans=0.125
2024-09-19 09:43:39,341 INFO [train.py:1198] (0/2) Epoch 50, batch 4900, loss[loss=0.207, ctc_loss=0.1339, cr_loss=0.3653, over 20997.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1421, cr_loss=0.369, over 4087126.35 frames. ], batch size: 48, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:43:57,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=901178.5, ans=0.0
2024-09-19 09:44:12,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=901206.8333333334, ans=0.2
2024-09-19 09:44:31,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=901235.1666666666, ans=0.125
2024-09-19 09:44:37,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=901263.5, ans=0.015
2024-09-19 09:44:42,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=901263.5, ans=0.09899494936611666
2024-09-19 09:44:46,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=901263.5, ans=0.125
2024-09-19 09:44:53,774 INFO [train.py:1198] (0/2) Epoch 50, batch 4950, loss[loss=0.2196, ctc_loss=0.1455, cr_loss=0.3709, over 20883.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1418, cr_loss=0.3691, over 4093085.69 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:44:56,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=901291.8333333334, ans=0.125
2024-09-19 09:45:13,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=901320.1666666666, ans=0.125
2024-09-19 09:45:21,138 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0
2024-09-19 09:45:49,932 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.307e+02 2.452e+02 2.660e+02 3.351e+02, threshold=4.903e+02, percent-clipped=0.0
2024-09-19 09:45:56,677 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=12.0
2024-09-19 09:45:59,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=901405.1666666666, ans=0.1
2024-09-19 09:46:10,557 INFO [train.py:1198] (0/2) Epoch 50, batch 5000, loss[loss=0.2323, ctc_loss=0.15, cr_loss=0.4116, over 20856.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1414, cr_loss=0.3683, over 4091604.67 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:46:51,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=901490.1666666666, ans=0.1
2024-09-19 09:47:20,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=901546.8333333334, ans=0.125
2024-09-19 09:47:24,874 INFO [train.py:1198] (0/2) Epoch 50, batch 5050, loss[loss=0.2174, ctc_loss=0.143, cr_loss=0.3719, over 20974.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1417, cr_loss=0.3685, over 4092165.61 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:47:29,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=901575.1666666666, ans=0.0
2024-09-19 09:48:12,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=901660.1666666666, ans=0.2
2024-09-19 09:48:21,179 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.229e+02 2.381e+02 2.526e+02 8.900e+02, threshold=4.762e+02, percent-clipped=1.0
2024-09-19 09:48:34,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=901688.5, ans=0.0
2024-09-19 09:48:38,944 INFO [train.py:1198] (0/2) Epoch 50, batch 5100, loss[loss=0.2357, ctc_loss=0.1547, cr_loss=0.4051, over 20831.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1419, cr_loss=0.369, over 4101434.47 frames. ], batch size: 65, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:49:24,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0
2024-09-19 09:49:25,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=12.0
2024-09-19 09:49:41,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=901830.1666666666, ans=0.125
2024-09-19 09:49:47,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=901830.1666666666, ans=0.2
2024-09-19 09:49:49,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=901830.1666666666, ans=0.1
2024-09-19 09:49:53,448 INFO [train.py:1198] (0/2) Epoch 50, batch 5150, loss[loss=0.2128, ctc_loss=0.1388, cr_loss=0.3702, over 20974.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1412, cr_loss=0.3672, over 4081792.04 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:50:10,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=901886.8333333334, ans=0.0
2024-09-19 09:50:17,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=901886.8333333334, ans=0.0
2024-09-19 09:50:39,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=901943.5, ans=0.125
2024-09-19 09:50:49,770 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.300e+02 2.425e+02 2.557e+02 3.658e+02, threshold=4.850e+02, percent-clipped=0.0
2024-09-19 09:51:07,809 INFO [train.py:1198] (0/2) Epoch 50, batch 5200, loss[loss=0.2361, ctc_loss=0.1568, cr_loss=0.3964, over 21033.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1416, cr_loss=0.3672, over 4081597.00 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:51:34,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=902028.5, ans=0.2
2024-09-19 09:52:21,875 INFO [train.py:1198] (0/2) Epoch 50, batch 5250, loss[loss=0.2376, ctc_loss=0.1588, cr_loss=0.3937, over 17970.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1413, cr_loss=0.3669, over 4074218.04 frames. ], batch size: 108, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:52:26,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=902141.8333333334, ans=0.2
2024-09-19 09:52:56,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=902198.5, ans=0.0
2024-09-19 09:53:02,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=902198.5, ans=0.125
2024-09-19 09:53:18,641 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.309e+02 2.443e+02 2.645e+02 7.412e+02, threshold=4.887e+02, percent-clipped=1.0
2024-09-19 09:53:30,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=902255.1666666666, ans=0.0
2024-09-19 09:53:38,805 INFO [train.py:1198] (0/2) Epoch 50, batch 5300, loss[loss=0.2364, ctc_loss=0.1529, cr_loss=0.4171, over 20765.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1422, cr_loss=0.3682, over 4059591.97 frames. ], batch size: 71, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 09:53:55,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=902311.8333333334, ans=0.2
2024-09-19 09:54:00,352 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0
2024-09-19 09:54:08,699 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:54:30,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=902368.5, ans=0.125
2024-09-19 09:54:41,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=902396.8333333334, ans=0.2
2024-09-19 09:54:52,532 INFO [train.py:1198] (0/2) Epoch 50, batch 5350, loss[loss=0.2154, ctc_loss=0.1409, cr_loss=0.3724, over 20350.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.3671, over 4067611.23 frames. ], batch size: 74, lr: 1.64e-03, grad_scale: 16.0
2024-09-19 09:54:57,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=902425.1666666666, ans=0.125
2024-09-19 09:55:00,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=902425.1666666666, ans=0.0
2024-09-19 09:55:22,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=902481.8333333334, ans=0.125
2024-09-19 09:55:40,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=902510.1666666666, ans=0.0
2024-09-19 09:55:53,026 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.306e+02 2.484e+02 2.674e+02 7.892e+02, threshold=4.969e+02, percent-clipped=1.0
2024-09-19 09:56:02,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=902538.5, ans=0.125
2024-09-19 09:56:09,488 INFO [train.py:1198] (0/2) Epoch 50, batch 5400, loss[loss=0.2354, ctc_loss=0.1568, cr_loss=0.3932, over 20974.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.142, cr_loss=0.3678, over 4077162.47 frames. ], batch size: 64, lr: 1.64e-03, grad_scale: 16.0
2024-09-19 09:56:41,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=902623.5, ans=0.0
2024-09-19 09:56:48,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=902623.5, ans=0.05
2024-09-19 09:56:52,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=902651.8333333334, ans=0.125
2024-09-19 09:57:12,046 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 09:57:23,867 INFO [train.py:1198] (0/2) Epoch 50, batch 5450, loss[loss=0.1732, ctc_loss=0.11, cr_loss=0.3163, over 21003.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1417, cr_loss=0.3674, over 4071123.05 frames. ], batch size: 52, lr: 1.64e-03, grad_scale: 16.0
2024-09-19 09:57:43,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=902736.8333333334, ans=0.125
2024-09-19 09:57:50,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=902736.8333333334, ans=0.125
2024-09-19 09:58:21,417 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.299e+02 2.453e+02 2.614e+02 4.135e+02, threshold=4.905e+02, percent-clipped=0.0
2024-09-19 09:58:27,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=22.5
2024-09-19 09:58:31,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=902821.8333333334, ans=0.125
2024-09-19 09:58:37,506 INFO [train.py:1198] (0/2) Epoch 50, batch 5500, loss[loss=0.1767, ctc_loss=0.1157, cr_loss=0.3049, over 20964.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1422, cr_loss=0.3681, over 4077285.56 frames. ], batch size: 52, lr: 1.64e-03, grad_scale: 16.0
2024-09-19 09:58:53,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=902878.5, ans=0.1
2024-09-19 09:59:20,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=902935.1666666666, ans=0.1
2024-09-19 09:59:20,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=902935.1666666666, ans=0.2
2024-09-19 09:59:50,908 INFO [train.py:1198] (0/2) Epoch 50, batch 5550, loss[loss=0.2547, ctc_loss=0.1698, cr_loss=0.4246, over 19554.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1421, cr_loss=0.3679, over 4076442.52 frames. ], batch size: 90, lr: 1.64e-03, grad_scale: 16.0
2024-09-19 10:00:09,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=12.0
2024-09-19 10:00:18,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=903020.1666666666, ans=0.0
2024-09-19 10:00:20,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.85 vs. limit=15.0
2024-09-19 10:00:31,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=903048.5, ans=0.125
2024-09-19 10:00:40,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=903076.8333333334, ans=0.125
2024-09-19 10:00:48,994 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.247e+02 2.367e+02 2.567e+02 4.623e+02, threshold=4.734e+02, percent-clipped=0.0
2024-09-19 10:01:05,413 INFO [train.py:1198] (0/2) Epoch 50, batch 5600, loss[loss=0.2076, ctc_loss=0.1364, cr_loss=0.3558, over 20947.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1417, cr_loss=0.3667, over 4081349.26 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0
2024-09-19 10:01:21,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=903161.8333333334, ans=10.0
2024-09-19 10:02:10,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=903246.8333333334, ans=0.125
2024-09-19 10:02:22,290 INFO [train.py:1198] (0/2) Epoch 50, batch 5650, loss[loss=0.2445, ctc_loss=0.1647, cr_loss=0.399, over 20038.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1409, cr_loss=0.3662, over 4098099.93 frames.
], batch size: 80, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:02:32,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=903275.1666666666, ans=0.125 2024-09-19 10:02:50,504 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:02:50,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=903331.8333333334, ans=0.125 2024-09-19 10:03:19,585 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.290e+02 2.430e+02 2.621e+02 3.528e+02, threshold=4.860e+02, percent-clipped=0.0 2024-09-19 10:03:21,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0 2024-09-19 10:03:35,879 INFO [train.py:1198] (0/2) Epoch 50, batch 5700, loss[loss=0.2401, ctc_loss=0.1605, cr_loss=0.3982, over 20866.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1415, cr_loss=0.3663, over 4089183.04 frames. 
], batch size: 65, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:03:39,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=903416.8333333334, ans=0.125 2024-09-19 10:03:39,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=903416.8333333334, ans=0.0 2024-09-19 10:04:02,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=903445.1666666666, ans=0.125 2024-09-19 10:04:06,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=903473.5, ans=0.1 2024-09-19 10:04:25,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=903501.8333333334, ans=0.125 2024-09-19 10:04:31,900 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.50 vs. limit=15.0 2024-09-19 10:04:52,317 INFO [train.py:1198] (0/2) Epoch 50, batch 5750, loss[loss=0.1737, ctc_loss=0.1099, cr_loss=0.3192, over 20983.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1418, cr_loss=0.3665, over 4061652.40 frames. 
], batch size: 48, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:04:52,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=903558.5, ans=0.1 2024-09-19 10:04:57,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=903558.5, ans=0.125 2024-09-19 10:05:13,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=903586.8333333334, ans=0.07 2024-09-19 10:05:42,123 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=15.0 2024-09-19 10:05:50,002 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.344e+02 2.489e+02 2.688e+02 3.298e+02, threshold=4.978e+02, percent-clipped=0.0 2024-09-19 10:06:06,419 INFO [train.py:1198] (0/2) Epoch 50, batch 5800, loss[loss=0.2233, ctc_loss=0.1478, cr_loss=0.3776, over 21079.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1422, cr_loss=0.3675, over 4064855.89 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:06:15,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=903700.1666666666, ans=0.0 2024-09-19 10:06:21,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=903728.5, ans=0.0 2024-09-19 10:06:29,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.09 vs. limit=15.0 2024-09-19 10:06:31,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.07 vs. 
limit=15.0 2024-09-19 10:07:02,257 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0 2024-09-19 10:07:21,032 INFO [train.py:1198] (0/2) Epoch 50, batch 5850, loss[loss=0.165, ctc_loss=0.1061, cr_loss=0.2947, over 19945.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1409, cr_loss=0.3653, over 4068448.24 frames. ], batch size: 44, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:07:48,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=903870.1666666666, ans=0.125 2024-09-19 10:07:54,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=903898.5, ans=0.125 2024-09-19 10:08:19,204 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.289e+02 2.410e+02 2.545e+02 7.787e+02, threshold=4.821e+02, percent-clipped=1.0 2024-09-19 10:08:21,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903955.1666666666, ans=0.1 2024-09-19 10:08:35,795 INFO [train.py:1198] (0/2) Epoch 50, batch 5900, loss[loss=0.1792, ctc_loss=0.116, cr_loss=0.3157, over 19837.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.141, cr_loss=0.366, over 4079653.50 frames. ], batch size: 44, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:08:37,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.12 vs. 
limit=12.0 2024-09-19 10:09:08,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=904040.1666666666, ans=0.1 2024-09-19 10:09:36,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=904096.8333333334, ans=0.09899494936611666 2024-09-19 10:09:37,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=904096.8333333334, ans=0.125 2024-09-19 10:09:41,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2024-09-19 10:09:49,633 INFO [train.py:1198] (0/2) Epoch 50, batch 5950, loss[loss=0.2532, ctc_loss=0.1702, cr_loss=0.4152, over 20948.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1413, cr_loss=0.3667, over 4084148.04 frames. ], batch size: 64, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:10:10,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=904153.5, ans=0.0 2024-09-19 10:10:13,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=904153.5, ans=0.125 2024-09-19 10:10:43,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=904210.1666666666, ans=0.0 2024-09-19 10:10:50,243 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.257e+02 2.429e+02 2.560e+02 4.338e+02, threshold=4.859e+02, percent-clipped=0.0 2024-09-19 10:11:06,522 INFO [train.py:1198] (0/2) Epoch 50, batch 6000, loss[loss=0.2375, ctc_loss=0.1612, cr_loss=0.3813, over 18447.00 frames. ], tot_loss[loss=0.2149, ctc_loss=0.1416, cr_loss=0.3668, over 4083127.27 frames. 
], batch size: 108, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:11:06,523 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 10:11:15,078 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.3332, 6.0070, 5.7933, 5.4402], device='cuda:0') 2024-09-19 10:11:16,857 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3097, 4.7864, 4.8273, 5.1231], device='cuda:0') 2024-09-19 10:11:24,437 INFO [train.py:1230] (0/2) Epoch 50, validation: loss=0.03896, ctc_loss=0.03896, cr_loss=1.621e-14, over 944034.00 frames. 2024-09-19 10:11:24,438 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 20875MB 2024-09-19 10:11:41,225 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:11:54,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=904323.5, ans=0.2 2024-09-19 10:12:41,361 INFO [train.py:1198] (0/2) Epoch 50, batch 6050, loss[loss=0.1924, ctc_loss=0.1268, cr_loss=0.328, over 20798.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1417, cr_loss=0.367, over 4090207.14 frames. 
], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:13:05,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=904436.8333333334, ans=0.125 2024-09-19 10:13:08,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=904436.8333333334, ans=0.0 2024-09-19 10:13:17,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=904465.1666666666, ans=0.025 2024-09-19 10:13:36,147 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2024-09-19 10:13:39,933 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.302e+02 2.437e+02 2.622e+02 3.791e+02, threshold=4.874e+02, percent-clipped=0.0 2024-09-19 10:13:46,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=904521.8333333334, ans=0.125 2024-09-19 10:13:56,340 INFO [train.py:1198] (0/2) Epoch 50, batch 6100, loss[loss=0.2034, ctc_loss=0.1318, cr_loss=0.3582, over 20989.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1411, cr_loss=0.3663, over 4092154.44 frames. 
], batch size: 61, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:13:58,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=904550.1666666666, ans=0.125 2024-09-19 10:14:11,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=904578.5, ans=0.125 2024-09-19 10:14:19,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=904578.5, ans=0.1 2024-09-19 10:14:36,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=904606.8333333334, ans=0.2 2024-09-19 10:14:43,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=904635.1666666666, ans=0.0 2024-09-19 10:14:47,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=904635.1666666666, ans=0.125 2024-09-19 10:14:51,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=904635.1666666666, ans=0.025 2024-09-19 10:14:54,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=904663.5, ans=0.2 2024-09-19 10:15:10,190 INFO [train.py:1198] (0/2) Epoch 50, batch 6150, loss[loss=0.2119, ctc_loss=0.1366, cr_loss=0.3765, over 21051.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1419, cr_loss=0.3678, over 4080973.71 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:15:14,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=904691.8333333334, ans=0.2 2024-09-19 10:15:37,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=12.0 2024-09-19 10:15:54,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=904776.8333333334, ans=0.0 2024-09-19 10:16:00,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=904776.8333333334, ans=0.2 2024-09-19 10:16:03,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=904776.8333333334, ans=0.2 2024-09-19 10:16:07,514 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.050e+02 2.304e+02 2.446e+02 2.627e+02 3.437e+02, threshold=4.892e+02, percent-clipped=0.0 2024-09-19 10:16:23,190 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2024-09-19 10:16:23,986 INFO [train.py:1198] (0/2) Epoch 50, batch 6200, loss[loss=0.2667, ctc_loss=0.1813, cr_loss=0.4272, over 18316.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1421, cr_loss=0.3685, over 4073516.47 frames. 
], batch size: 108, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:16:30,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=904833.5, ans=0.125 2024-09-19 10:16:44,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=904861.8333333334, ans=0.125 2024-09-19 10:17:37,732 INFO [train.py:1198] (0/2) Epoch 50, batch 6250, loss[loss=0.2422, ctc_loss=0.1594, cr_loss=0.414, over 20768.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1431, cr_loss=0.3702, over 4058113.78 frames. ], batch size: 71, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:17:59,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=905003.5, ans=0.2 2024-09-19 10:18:23,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=905060.1666666666, ans=0.0 2024-09-19 10:18:33,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=905060.1666666666, ans=0.2 2024-09-19 10:18:34,990 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.339e+02 2.541e+02 2.732e+02 3.809e+02, threshold=5.081e+02, percent-clipped=0.0 2024-09-19 10:18:50,849 INFO [train.py:1198] (0/2) Epoch 50, batch 6300, loss[loss=0.2246, ctc_loss=0.1492, cr_loss=0.3771, over 20626.00 frames. ], tot_loss[loss=0.2188, ctc_loss=0.1444, cr_loss=0.3718, over 4016089.53 frames. 
], batch size: 71, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:19:05,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=905145.1666666666, ans=0.1 2024-09-19 10:19:17,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=905145.1666666666, ans=0.125 2024-09-19 10:19:48,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2024-09-19 10:19:55,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=905230.1666666666, ans=0.125 2024-09-19 10:20:02,507 INFO [train.py:1198] (0/2) Epoch 50, batch 6350, loss[loss=0.2613, ctc_loss=0.179, cr_loss=0.4118, over 14654.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1478, cr_loss=0.3732, over 3829847.06 frames. ], batch size: 149, lr: 1.64e-03, grad_scale: 32.0 2024-09-19 10:20:15,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=905286.8333333334, ans=0.0 2024-09-19 10:20:28,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=905286.8333333334, ans=0.125 2024-09-19 10:20:58,677 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.054e+02 2.707e+02 2.928e+02 3.162e+02 4.182e+02, threshold=5.857e+02, percent-clipped=0.0 2024-09-19 10:21:00,087 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-small-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1/epoch-50.pt 2024-09-19 10:21:01,697 INFO [train.py:1496] (0/2) Done!